Basically, I’m imagining something where you click a button or use a hotkey and a number of words (which you can modify in settings) are obscured and replaced with kana so you can see them in context and try to write them.
This would be invaluable to me and I would probably use it all the time. It’s incredibly labor intensive to create flashcards with sentences, and since you have to create them yourself, they don’t have any element of surprise. Problems in kanji drill books are usually fairly short and contrived. They serve a purpose, but I want to be able to do this on anything, from news sites to blogs to Twitter.
I’ve never created a script or extension or anything. And I’m not asking for someone to do it for me, but I know a lot of people here have experience with these kinds of things, so it would be helpful if you could point me in the right direction, etc.
(PS if something like this exists and I’ve just completely missed it, that would obviously be helpful too, haha)
I don’t know how I’d go about automatically converting kanji to kana, but if you can prepare the content beforehand, you could have the same sentence twice: once kana-only, and once with the kanji, obscured by default.
I think the main difficulty would be determining what set of characters combine to make words and what the expected readings for those words are. It seems like you’d have to have an entire dictionary and parser included in the extension/script to achieve what you want.
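To give a feel for why the dictionary matters, here is a toy TypeScript sketch of greedy longest-match lookup. The `toyReadings` table and the `toKanaGreedy` function are made up for illustration. A real tool would need a full dictionary and a proper morphological analyzer, and this approach has no way to choose between several readings of the same word.

```typescript
// Hypothetical mini-dictionary: surface form -> kana reading.
const toyReadings: { [word: string]: string } = {
  "日本語": "にほんご",
  "日本": "にほん",
  "読む": "よむ",
  "本": "ほん",
};

// Greedy longest-match: at each position, try the longest possible
// dictionary entry first and fall back to shorter ones.
function toKanaGreedy(text: string, dict: { [word: string]: string }): string {
  // Find the longest key so we know how far ahead to look.
  let maxLen = 1;
  for (const key in dict) {
    if (Object.prototype.hasOwnProperty.call(dict, key) && key.length > maxLen) {
      maxLen = key.length;
    }
  }

  let out = "";
  let i = 0;
  while (i < text.length) {
    let matched = false;
    for (let len = Math.min(maxLen, text.length - i); len > 0; len--) {
      const candidate = text.substring(i, i + len);
      if (Object.prototype.hasOwnProperty.call(dict, candidate)) {
        out += dict[candidate];
        i += len;
        matched = true;
        break;
      }
    }
    if (!matched) {
      // Not in the dictionary: copy the character through unchanged.
      out += text.charAt(i);
      i += 1;
    }
  }
  return out;
}
```

With this table, `toKanaGreedy("日本語を読む", toyReadings)` gives `にほんごをよむ`, but anything missing from the table passes through untouched, which is exactly where the "entire dictionary" requirement comes from.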
There are extensions that show you the possible meanings and readings for the word you highlight. Those don’t try to guess the right reading and just give you all possibilities. Either way, the problem is that a new extension would be starting from scratch, unless one of those extensions (e.g. Yomichan) is open source and can be forked as a starting point. If there is an extension closer to what you described that can be forked, even better.
Yeah, I know about Yomichan and stuff. I’m thinking of stuff like this (just a quick google result, there might be others).
It seems like it might be possible to just have something like that on and then just block out some kanji randomly. (For me, I wouldn’t even mind just blocking out all kanji, but other people might want to be more selective)
It’s indeed not what I said I was looking for, but it might work for me personally. It wouldn’t have any of the features you’d want in a full-fledged extension for people to make use of, but it would be good enough for what I want to do. Thanks!
I think it’s probably quite difficult to fully implement. The actual substitution of the kanji by kana is probably the easy part. If you’re OK with something like a simple userscript, you could in theory just walk the body of the page looking for text nodes, take their text out, figure out the kana, and then set each original text node’s contents to the new kana version. Whether you substitute everything or only words from a specific dictionary doesn’t really matter from a technical standpoint; that’s just a case of figuring out what to leave in. It might mess up the page layout a bit, as the kana version will probably take up a lot more space than the kanji version.
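The text-node part of that could look roughly like the TypeScript sketch below. `NodeLike`, `collectTextNodes` and `rewriteTextNodes` are names invented for this example. The interface only lists the handful of DOM properties the walker touches, so the same code can be tried outside a browser with plain objects. The `convert` callback is where the hard part (working out the kana) would have to go.

```typescript
// Minimal structural view of a DOM node. Real DOM nodes have all of these
// properties, so document.body can be passed in directly in a browser.
interface NodeLike {
  nodeType: number;
  nodeName: string;
  nodeValue: string | null;
  childNodes: ArrayLike<NodeLike>;
}

const TEXT_NODE_TYPE = 3; // same value as Node.TEXT_NODE

// Elements whose text should never be rewritten.
const SKIPPED_TAGS: { [tag: string]: boolean } = { SCRIPT: true, STYLE: true };

// Depth-first walk that gathers every text node under root.
function collectTextNodes(root: NodeLike, found: NodeLike[] = []): NodeLike[] {
  if (SKIPPED_TAGS[root.nodeName] === true) {
    return found;
  }
  if (root.nodeType === TEXT_NODE_TYPE) {
    found.push(root);
  }
  for (let i = 0; i < root.childNodes.length; i++) {
    collectTextNodes(root.childNodes[i], found);
  }
  return found;
}

// Runs convert over each text node and writes the result back in place.
// Returns how many nodes actually changed.
function rewriteTextNodes(root: NodeLike, convert: (text: string) => string): number {
  const nodes = collectTextNodes(root);
  let changed = 0;
  for (let i = 0; i < nodes.length; i++) {
    const before = nodes[i].nodeValue === null ? "" : (nodes[i].nodeValue as string);
    const after = convert(before);
    if (after !== before) {
      nodes[i].nodeValue = after;
      changed++;
    }
  }
  return changed;
}
```

In a userscript this would be called as something like `rewriteTextNodes(document.body, myConverter)`, where `myConverter` is whatever kanji-to-kana function you manage to put together. The nodes are collected first and modified afterwards so the walk never runs over a tree that is changing underneath it.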
The hardest part of the process is actually just the language itself. Figuring out the readings from a pure text string is quite hard, especially since Japanese text usually has no spaces to mark word boundaries. You’d essentially have to parse the entire text or sentence, and try to figure out the words used based on the grammar present, which is quite difficult to do reliably. What makes it even worse is that Japanese websites are notoriously more difficult to extract a full coherent text from than most English websites. Sites using furigana will often have their text internally broken up from an HTML standpoint (e.g. 読む will usually be encoded as <ruby>読<rt>よ</rt></ruby>む), meaning even parsing out a full word might be hard. To solve this you could probably handle ruby and rt tags as special cases and replace each ruby element with its rt content, although it still makes parsing the text a lot more difficult. A lot of Japanese websites I’ve come across also have a tendency to use images for banners or other site headings, which would be nearly impossible to parse normally, although this would not be too different from the problems most other Japanese language support plugins currently have.
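As a rough illustration of the ruby special case, the TypeScript sketch below collapses each ruby element in an HTML string down to the reading found in its rt tags. `rubyToReading` is a made-up helper, and running regular expressions over HTML is fragile (nested or malformed markup will trip it up), so in a real extension you would more likely work on the parsed DOM. It is only meant to show the idea.

```typescript
// Replaces every <ruby>...</ruby> element in an HTML fragment with the
// concatenated contents of its <rt> tags (the furigana reading).
function rubyToReading(html: string): string {
  const rubyPattern = /<ruby(?:\s[^>]*)?>([\s\S]*?)<\/ruby>/gi;
  return html.replace(rubyPattern, function (_match: string, inner: string) {
    const rtPattern = /<rt(?:\s[^>]*)?>([\s\S]*?)<\/rt>/gi;
    let reading = "";
    let m: RegExpExecArray | null = rtPattern.exec(inner);
    while (m !== null) {
      reading += m[1];
      m = rtPattern.exec(inner);
    }
    // No <rt> inside: fall back to the base text with any tags stripped.
    return reading !== "" ? reading : inner.replace(/<[^>]+>/g, "");
  });
}
```

For example, `rubyToReading("<ruby>読<rt>よ</rt></ruby>む")` returns `よむ`, which at least gives a parser one unbroken string to work with.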
Your best bet would probably be what @seanblue described earlier, finding an open-source plugin that does something similar, and seeing if you can at least fork the parsing part. If you have something that can parse a webpage, simply adding something to block out a set of kanji will probably not be too hard. I searched around a bit and found https://github.com/kuanyui/Furiganaize, which seems to be an open-source furigana plugin for Firefox. The text_to_furigana_dom_parse.js file seems to handle the actual kanji substitution, and you could probably use it as a starting point, although from what I can tell it does still lack a lot of features the larger furigana plugins seem to have, and the code documentation isn’t that great. The code is small enough that you could probably write a toy example of the plugin you want relatively easily.
So yeah, doing a thing where just some of the words are blocked out will probably be tough, but I personally don’t need that. If I could do it, then more people might be interested in using it, but I would be fine with just strings of all kana. In that case, it would just be a matter of making anything that isn’t hiragana or katakana the same color as the background / physically removing it, whatever the case may be.
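The "physically removing it" variant is simple enough to sketch in TypeScript. `blankOutKanji` and the character ranges are choices made for this example: it treats the main CJK Unified Ideographs block, Extension A and the iteration mark 々 as kanji, and leaves everything else (kana, punctuation, Latin text) alone rather than hiding every non-kana character.

```typescript
// One or more consecutive kanji: Extension A, the main CJK block, and 々.
const KANJI_RUN = /[\u3400-\u4DBF\u4E00-\u9FFF\u3005]+/g;

// Replaces each kanji with a placeholder character, or removes kanji
// entirely when the placeholder is the empty string.
function blankOutKanji(text: string, placeholder: string = "〇"): string {
  return text.replace(KANJI_RUN, function (run: string) {
    let blanks = "";
    for (let i = 0; i < run.length; i++) {
      blanks += placeholder;
    }
    return blanks;
  });
}
```

So `blankOutKanji("日本語を読む")` gives `〇〇〇を〇む`, and passing `""` as the placeholder drops the kanji completely. Keeping one placeholder per kanji at least hints at how many characters you are expected to write.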
Something like that could probably be written quite fast. In that case you could probably get away with just walking the DOM tree of the body of the webpage and picking out the text nodes. If you encounter any kanji (or any kanji not in a list, in case you want to at least leave some kanji in), you could wrap them in a <span> with a dedicated class, and inject a <style> tag at the end of the <head> section with a rule for that class. That rule would make any kanji fully transparent unless you hover over them. Injecting a large number of spans into the webpage will probably be incredibly inefficient, but it should work in theory at least.
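A rough TypeScript sketch of the span-and-style idea, for anyone who wants to try it. The class name `hidden-kanji` and the helpers `escapeForHtml` and `wrapKanji` are placeholders invented here, not taken from any existing extension, and the kanji ranges are the same simplification as before (main CJK block, Extension A and 々).

```typescript
const HIDDEN_CLASS = "hidden-kanji"; // hypothetical class name

// Stylesheet text: wrapped kanji are transparent until hovered.
const HIDDEN_KANJI_CSS =
  "." + HIDDEN_CLASS + " { color: transparent; }\n" +
  "." + HIDDEN_CLASS + ":hover { color: inherit; }";

// Escapes the characters that would otherwise be read as markup.
function escapeForHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Turns the text of one text node into HTML where each kanji is wrapped
// in a span, except for kanji listed in keep.
function wrapKanji(text: string, keep: string = ""): string {
  const singleKanji = /[\u3400-\u4DBF\u4E00-\u9FFF\u3005]/g;
  // Escape first: the escapes are pure ASCII, so they never match the
  // kanji pattern, and the spans added afterwards stay as real markup.
  return escapeForHtml(text).replace(singleKanji, function (ch: string) {
    if (keep.indexOf(ch) !== -1) {
      return ch;
    }
    return '<span class="' + HIDDEN_CLASS + '">' + ch + "</span>";
  });
}
```

In the page itself you would create a `style` element with `document.createElement("style")`, set its `textContent` to `HIDDEN_KANJI_CSS`, and append it to `document.head`. Each text node containing kanji would then be replaced by an element whose `innerHTML` is the output of `wrapKanji`. Wrapping whole runs of kanji in one span instead of one span per character would cut down the number of injected elements, at the cost of a slightly more fiddly keep-list check.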
Have you considered using text to speech functionality? It’s not quite the same as your original request, but if you’re okay with entire strings of kana I thought maybe you’d be okay if it was listening-based as well.
There were extensions like the Wanikani Highlighter (level dependent) and Genki/Wanikani Kanji Highlighter (level independent) which could highlight all kanji on a page. However, they have not been updated to V2, and I don’t know if any of the authors are active here. Nonetheless, if they did work, I think one could just unpack the extension and change the highlighter color to solid black or something, and maybe a furigana extension could do the rest of the work.