(Back up) Floflo.moe - A WK-friendly website for reading

Raionus · June 7, 2018, 2:30am

Furigana is meant for human consumption, not machine consumption. This is because (in books) furigana is written so that the pronunciation for each character is centered above each individual kanji, rather than over the entire word. The result is that a word like 奥底 reads like this to a machine:
奥おく底そこ which is gonna parse as 奥・おく・底・そこ

As opposed to the way we’d see it like: 奥底・おくそこ

Yeah, ripperoni. You needed a space after the 2. Like “2. (はた)”

I’ll try and change that soon. There’s no practical reason why it should distinguish between “2. ” and “2.”

Jeehut · June 7, 2018, 4:09am

Wow, the idea for this site is amazing!
And the most compelling thing for me is: It includes my favorite book Harry Potter!
This is actually the book my goal is to be able to read and understand as quickly as possible. And this site will probably help me out with that a lot, thank you for that!

That said, there are still quite a few things I don’t understand, and I might run into some problems when I try to use it. Specifically:

1. In the preferences it tells me my WK API key will only work as long as I don’t delete my cookies. But my browser is configured to reset all cookies each time I start it up. Does this mean I will have to enter my WK API key every day when I want to do my reviews?
2. If I understood the values in the following image correctly, the book Harry Potter contains 7006 different words, and with my configuration to only show words that appear at least 10 times, 943 words meet that minimum frequency. Of those I seem to know (943 - 663 =) 280 already through WaniKani. That would be only about 30% of those frequent words and not even 4% of all words. Did I understand this correctly?

3. I know the book is rated “hard”, but it’s the only book that I’m motivated to read (and I have already bought it; it is lying around, waiting to be opened). Since my current level on WK is only 11, the question for me is: When will I be able to enjoy reading the book without looking up 80% of the words on a page? Will I need to learn all those unknown words with frequency >= 10 first? Or to put it differently: What percentage of the total words in the entire book is covered by these 943 words? It must be more than 13.5%, since 943 out of 7006 unique words is already about 13.5% and each of those words appears at least 10 times according to my settings. It would be much more helpful to know what percentage of the total words (not unique words, repetitions included) I would be able to read once I know all words with frequency >= 10. In addition, it would of course also be very useful to know what percentage of total words (not unique words) I’d be able to read at my current WK level. That should also be more than 4%, since the words I have learned so far are probably among the more frequent ones.
4. I’m pretty sure it would be more motivating if I could see all the statistics available for the entire book at a finer granularity, say for every single chapter. Then I could concentrate on the vocabulary in chapter 1 first and, once I have learned it, open my book and actually read chapter 1. Next I would start on the vocabulary for chapter 2, and so on. That would be awesome!
5. Am I missing something here, or is there really no way to mark a book as “I’m trying to read this” and add all of its vocabulary to my lessons pile? Of course, combined with point #4 above, adding one chapter at a time rather than the entire book would be more useful. But at least an “Add all” button would be nice so I don’t have to click every single word.
6. I know one can probably do this manually somehow, but I think it would be useful if it was integrated by design: Wouldn’t it make sense to combine point #5 (marking a book as “Reading goal”) with a step-by-step automatic decrease of the minimum frequency? What I mean is: At the beginning the frequency is set to 100 automatically (which gives me 30 unknown vocab as of now) and my goal is to master that. Let’s call this Level 1. Then it’s set to 50 (gives me 81 unknown), which would be Level 2, and so on: Level 3 at 30 (182 unknown), Level 4 at 20 (302 unknown), Level 5 at 10 (663 unknown), Level 7 at 5 (1403 unknown), Level 8 at 3 (2312 unknown), Level 9 at 2 (3389 unknown) and finally Level 10 at 1 (6288 unknown). The site could set my initial level to the first one where I have at least 50 unknown items. Then you could give those levels names which state how hard or easy it is to read the book once that level is reached. For example, Level 1 would be called “Impossible”, Level 3 may be “Painful”, Level 5 “Manageable”, Level 7 “Pleasant”, Level 8 “Fluent”, Level 9 “Native” and Level 10 “Linguist”. I think such a feature would be amazing, what do you think?
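
To make the level idea a bit more concrete, here is a rough Python sketch of how the starting level could be picked. The unknown-word counts are the ones quoted above; the function names and the 50-item cutoff parameter are made up for illustration and are not anything the site actually implements.

```python
# Unknown-word counts per minimum-frequency setting, copied from the post above.
UNKNOWN_AT_MIN_FREQUENCY = {
    100: 30, 50: 81, 30: 182, 20: 302, 10: 663,
    5: 1403, 3: 2312, 2: 3389, 1: 6288,
}


def initial_min_frequency(unknown_by_freq, min_unknown=50):
    """Return the highest frequency setting that still yields at least
    `min_unknown` unknown words (hypothetical rule from the suggestion)."""
    for freq in sorted(unknown_by_freq, reverse=True):
        if unknown_by_freq[freq] >= min_unknown:
            return freq
    # Fewer than `min_unknown` unknown words in total: use the lowest setting.
    return min(unknown_by_freq)


def new_words_per_step(unknown_by_freq):
    """Return how many additional unknown words each step down the ladder adds."""
    steps = {}
    previous = 0
    for freq in sorted(unknown_by_freq, reverse=True):
        steps[freq] = unknown_by_freq[freq] - previous
        previous = unknown_by_freq[freq]
    return steps


if __name__ == "__main__":
    print(initial_min_frequency(UNKNOWN_AT_MIN_FREQUENCY))  # 50
    print(new_words_per_step(UNKNOWN_AT_MIN_FREQUENCY))
```

With these numbers the starting point would be the frequency-50 step (81 unknown words), and the step sizes grow quickly: going from frequency 2 to frequency 1 alone adds 2899 words.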

Sorry, this has gotten a little long, but I hope it’s useful feedback and that you will be able to address some of the points by the time I’m at level 20-ish and actually start using the site. Currently, my initial level would be close to “Impossible”, but I’d rather start somewhere between “Painful” and “Manageable”. So I’m gonna wait a bit.

Naphthalene · June 7, 2018, 4:10am

Weird… what about words where the pronunciation is independent of the individual kanji? Like 躊躇う

Raionus · June 7, 2018, 4:17am

That centers around the whole word

Naphthalene · June 7, 2018, 4:18am

By the way, some of the easy texts are freely available online. You can start with those. Obaasan to kuroneko is relatively short as well

Naphthalene · June 7, 2018, 4:21am

Okay, sorry I was getting sidetracked.
What I wanted to ask is if it appears at all in the parsed file (with whatever markup) or if it’s removed prior to parsing.

Jeehut · June 7, 2018, 4:23am

Thanks for the pointer! I’m already using SatoriReader though, which is a paid service and therefore quite a bit more feature-rich than this site, so I’d rather stick to that for short texts. They don’t have Harry Potter over there though, and it’s probably never gonna be a thing. That’s why I’m really looking forward to FloFlo.

Raionus · June 7, 2018, 4:23am

I remove it myself so that it doesn’t interfere with the parse. Even if it wasn’t formatted weirdly like that, parsers would still treat it as a separate word because they just go through the file linearly.
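
For anyone curious what that kind of cleanup can look like, here is a minimal Python sketch. The actual file format and removal script are not shown in this thread, so this assumes Aozora Bunko-style ruby markup, where each reading is wrapped in 《…》 after the text it annotates and an optional ｜ marks where the annotated run starts. The function name is made up.

```python
import re

# Assumption: readings look like 奥《おく》底《そこ》 (Aozora Bunko-style ruby).
# The markup used in the real source files is not stated in the thread.
RUBY_READING = re.compile(r"《[^》]*》")


def strip_furigana(text: str) -> str:
    """Remove ruby readings and ruby-start markers so that only the base
    text reaches the word parser."""
    without_readings = RUBY_READING.sub("", text)
    return without_readings.replace("｜", "")


if __name__ == "__main__":
    print(strip_furigana("奥《おく》底《そこ》"))  # 奥底
    print(strip_furigana("｜躊躇《ためら》う"))    # 躊躇う
```

Stripping the readings first means the parser sees 奥底 as one unbroken run instead of 奥・おく・底・そこ.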

Raionus · June 7, 2018, 4:34am

Your API key won’t be saved but your level will. If you’re not saving cookies, it means that you just need to manually update your level on Floflo whenever you level up in WaniKani.

Unfortunately, a lot of the actual SRS preferences are kept at a cookie level as well. I personally like the default settings but that’s just something to keep in mind.

You are correct. It’s a difficult book. Most kids’ novels stick around 3000-4000 unique words.

Probably never, to be blunt. The reality most people find with reading is that it’s difficult to do for pleasure until you hit a level where you stop having to have a dictionary on hand. Additionally, most people learn better from easier books. Therefore they just read easy books that they’re not actually that interested in since they learn faster that way and it’s a freaking task regardless.

If you want an actual statistic, about 40% of the unique words in a book have a frequency of 1. If you want to never look up stuff in the book you’ll have to learn most of the 7000 or whatever words which will obviously take forever.

I don’t keep labels at the moment but words are in order of appearance so you’ll end up doing that anyway. I don’t sort by frequency because it’s not useful for helping people read.

There’s a lot of reasons why. The most relevant one for you, as a beginner though, is that once you add an item (as a lesson or otherwise) to your account it no longer appears on any other book’s vocabulary list on the site.

That means if you were to mass add 6000 words from Harry Potter and then visited another book, it would not show those 6000 words. You would then no longer be getting a useful vocabulary list for that book, which would undermine the goal of the website.

There’s a program called cb’s text analysis tool that is capable of doing that. Obviously, it doesn’t come complete with an SRS but anyway, learning by frequency does not help you read a book. I’ve tested it thoroughly and it just doesn’t happen.

My site tracks all the unique words in a book and puts them in the order they appear in the book. This means that you learn whatever words you need right when you need it and not sooner. It also means you can achieve 100% reading accuracy for those parts (assuming your grammar is up to par).
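
As a rough illustration of the behaviour described here (not the site’s actual code, which isn’t shown in this thread), a vocabulary list in order of first appearance that skips words already in the account could be sketched in Python like this. The function name and the sample words are made up.

```python
def vocabulary_list(tokens, words_in_account):
    """Return the unique words of a book in order of first appearance,
    leaving out any word that is already in the user's account."""
    seen = set(words_in_account)
    ordered = []
    for word in tokens:
        if word not in seen:
            seen.add(word)
            ordered.append(word)
    return ordered


if __name__ == "__main__":
    book_a = ["魔法", "学校", "魔法", "杖", "学校", "箒"]
    book_b = ["学校", "杖", "猫"]

    account = {"学校"}
    print(vocabulary_list(book_a, account))  # ['魔法', '杖', '箒']

    # Mass-adding every word from book A also hides the shared words
    # from book B's list, leaving only 猫.
    account |= set(book_a)
    print(vocabulary_list(book_b, account))  # ['猫']
```

The second call shows the side effect mentioned above: anything mass-added from one book silently disappears from every other book’s list.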

Anyway, sorry if I missed anything. If you have any more questions feel free to ask.

Jeehut · June 7, 2018, 4:39am

Thanks for the quick and thorough answer.
I can understand your points and will have to trust your experience on the last ones.

Just one thing you might have missed:
I still think it would be useful to know the statistics of point #3 with the total percentage of words (not unique ones) I already know and optionally also those I would know when the current chosen frequency was mastered. Should be easy to calculate given the data you already have.

And yes, I do have a CS major by the way. Just in case you were curious.

Raionus · June 7, 2018, 4:46am

I ran the numbers, so unless I messed up this should be somewhat accurate. I don’t count particles and some other stuff that I won’t get into.

Total words: about 32.5k
Total words known at frequency 10 or greater: 21k

So there’ll be 11.5k ‘words’ you don’t know at that point or something like that
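
For reference, the arithmetic behind those figures can be sketched in Python as below. The two totals are the rounded numbers quoted above; the helper function and its sample data are hypothetical and only show how the same ratio could be computed from per-word counts.

```python
from collections import Counter


def token_coverage(word_counts, known_words):
    """Share of all word occurrences (repetitions included) that are known."""
    total = sum(word_counts.values())
    if total == 0:
        return 0.0
    known = sum(count for word, count in word_counts.items() if word in known_words)
    return known / total


if __name__ == "__main__":
    # Rounded figures from the post above.
    total_words = 32_500
    known_words_at_freq_10 = 21_000

    unknown = total_words - known_words_at_freq_10
    print(unknown)                                         # 11500
    print(round(known_words_at_freq_10 / total_words, 3))  # 0.646
    print(round(unknown / total_words, 3))                 # 0.354

    # Toy example for the helper: one frequent word, two rare ones.
    counts = Counter({"魔法": 8, "杖": 1, "箒": 1})
    print(token_coverage(counts, {"魔法"}))                # 0.8
```

So roughly 65% of the running text would be covered, leaving a bit more than one word in three unknown.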

Jeehut · June 7, 2018, 4:50am

What I actually meant was to show those statistics on the site as a new feature. But thanks for looking them up for me! This means I would still have to look up every third word, right? Interesting. I guess this is still “Painful” and not “Manageable”, as suggested in the level names above. ^^

Raionus · June 7, 2018, 4:55am

Well, it’s a nice statistic when you’re feeling curious, but it doesn’t have much bearing on how well you’ll perform on the book, which is why I just looked it up for you instead. I’ll just tell you from experience that you should be aiming for all the frequency 2 words if you want to read without a dictionary.

Jeehut · June 7, 2018, 4:55am

By the way, what I mean by a “pleasant reading experience” is still from the point of view of a learner: I would still expect to look things up quite often, but be able to finish a page in less than 10 minutes or so. Say, if I only have to look up 3-5 words per page, that would be a “pleasant” reading experience for Harry Potter for me. I just looked it up: with about 460 pages, this would result in about 100 days until I have read the entire book with 30-minute sessions every day.
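
A quick back-of-the-envelope check of that schedule, as a Python sketch. The page count and session length come from the post; the minutes-per-page values are assumptions chosen to show the range, and the function name is made up.

```python
import math


def days_to_finish(pages, minutes_per_page, minutes_per_day):
    """Number of daily sessions needed to read `pages` pages, rounded up."""
    return math.ceil(pages * minutes_per_page / minutes_per_day)


if __name__ == "__main__":
    # 460 pages and 30-minute sessions are from the post above.
    print(days_to_finish(460, 10, 30))   # 154 (at a full 10 minutes per page)
    print(days_to_finish(460, 6.5, 30))  # 100 (assumed pace of 6.5 minutes per page)
```

In other words, the 100-day figure corresponds to an average of about six and a half minutes per page.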

Jeehut · June 7, 2018, 4:57am

As I just wrote, reading without a dictionary is not my goal with Harry Potter. At least not yet. And I don’t think it’s the goal of everyone using your site, so I still think the statistics would be a useful addition.

Raionus · June 7, 2018, 5:00am

They most definitely can if they’re picking books of a proper difficulty level.

Anyway, the data just doesn’t exist to answer questions of this kind of specificity. My honest opinion is just to aim for something that doesn’t have 7,000 words in it.

Jeehut · June 7, 2018, 5:05am

I do – on SatoriReader. Maybe I’ll have a look again when I feel like SatoriReader doesn’t have enough content. But for now I can’t keep up with them, so there’s still lots of content to explore.

Well then, in that case, why am I still writing here? It seems you think none of my suggestions are useful. I just tried to give feedback; in the end it’s your choice to ignore it, though. I’m going back to my review pile …

flippantry · June 7, 2018, 5:40am

I haven’t kept up with this thread as much as I’d like to, but I think if I understand Dschee’s point correctly, I do think the percentage of words already known would be useful as it would help users understand where their difficulty level is.

For example, if I only know 50% of the vocabulary in a book, it would probably just be too tedious for my amount of patience to pick up, but if I could start with a book in which I know 90% of the vocabulary, it could be a great place for me to start so that I may work on improving my understanding of the grammar and pragmatics of Japanese, and also gain more practice reading quickly and testing my recall of the vocabulary I do (or should) know, while being able to control the number of words I would need or like to add to the lesson queue.

Especially as someone who is interested, but is trying to balance life and WaniKani already, I would love to be able to easily locate practice material that falls in a specific range for myself.

As an aside, I was wondering if you were interested in finding anyone to help work on this project, either with UI or front-end programming assistance. So that you don’t judge me by the terrible mockup I did earlier, I can provide sample works. I’ve been primarily studying and working with AngularJS recently, which I think could help you better organize and update your views. Not looking for any form of compensation, I just enjoy working on meaningful projects and don’t have much of an imagination for these ideas myself, nor the social skills to present any of them publicly.

I do think you have a fantastic idea, but I also think the UI is holding it back. Not just graphically, but from a usability standpoint that most users won’t be able to offer concise feedback on.

Raionus · June 7, 2018, 5:49am

I actually quite liked your mockup, and I haven’t forgotten about it. Here, I’ll send you a PM, though I’m not sure if anything will come out of it.

Radish8 · June 7, 2018, 7:34am

Just wanted to confirm as someone who’s been using it for a few weeks - you definitely don’t want to add all the words from a book at once.

It feels like you want to find the book you’re interested in, add all the words and then learn them bit by bit before tackling the book. But if you did that you’d be missing the enormous value of the vocabulary list itself. Having a list of words sorted in order that they appear in the book makes for an incredible lookup service!

You can’t feasibly learn all frequency 2 / 3 words or whatever in a book before you start reading it at all, or at least it would be much harder because it would be so long before you ever saw any of them in context. You want to learn a chunk of them, read a bit while being able to look up and add frequency 1 words, and then pre-learn the next chunk.

This breaks up your reading and vocab learning experiences, lets you see words in context fairly soon after learning them, and only add frequency 1 words after you’ve already seen them in context.

Also, my vocabulary might be terrible in comparison to yours, but I’ve found so far that just over half of the new kanji I encounter are from levels 25 to 34, and half from 35 - 60. I’m finding this doable but it feels to me like the site is probably ideal once you’re roughly level 40, because then new kanji lookup will be significantly reduced. It just depends what you’re happy to spend your time on though.