How would one go about making a browser tool that obscures words on a page for writing practice?

Basically, I’m imagining something where you click a button or use a hotkey, and a number of words (adjustable in the settings) are obscured and replaced with kana, so you can see them in context and try to write them.

This would be invaluable to me, and I would probably use it all the time. It’s incredibly labor-intensive to create sentence flashcards, and because you have to create them yourself, they don’t have any element of surprise. The problems in kanji drill books are usually fairly short and contrived. They serve a purpose, but I want to be able to do this on anything, from news sites to blogs to Twitter.

I’ve never created a script, an extension, or anything like that. I’m not asking for someone to do it for me, but I know a lot of people here have experience with these kinds of things, so it would be helpful if you could point me in the right direction.

Thanks.

(P.S. If something like this already exists and I’ve just completely missed it, that would obviously be helpful too, haha.)


Hmm…

Obscuring is easy enough (see [Userscript]: Hide Context Sentence Translation for a very well-done and useful example from @rfindley).

I don’t know how I’d go about automatically converting kanji to kana, but if you can prepare the content beforehand, you could include the same sentence twice: once kana-only and once with the kanji, obscured by default.
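For the prepare-it-beforehand route, a minimal sketch could generate HTML like this. It assumes you already have both the kana and the kanji version of each sentence; `makePracticeCard` and `escapeHtml` are hypothetical helper names, and the standard `<details>` element keeps the kanji version collapsed until you click it.

```javascript
// Escape the characters that are special in HTML text and attributes.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;") // must run first so later escapes are not doubled
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build one practice card: the kana version is always visible, and the
// kanji version sits inside a <details> element, which browsers render
// collapsed until the reader clicks the summary line.
function makePracticeCard(kanaSentence, kanjiSentence) {
  return [
    '<div class="practice-card">',
    `  <p lang="ja">${escapeHtml(kanaSentence)}</p>`,
    "  <details>",
    "    <summary>Show kanji</summary>",
    `    <p lang="ja">${escapeHtml(kanjiSentence)}</p>`,
    "  </details>",
    "</div>",
  ].join("\n");
}

const card = makePracticeCard("かんじ を かく", "漢字を書く");
console.log(card);
```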


I think the main difficulty would be determining what set of characters combine to make words and what the expected readings for those words are. It seems like you’d have to have an entire dictionary and parser included in the extension/script to achieve what you want.


Well, aren’t there already extensions that will put furigana on everything for you? They might make mistakes every now and then, but the stakes are low here.

There are extensions that show you the possible meanings and readings for a word you highlight. Those don’t try to guess the right reading; they just give you all the possibilities. Either way, the problem is that a new extension would be starting from scratch, unless one of those extensions (e.g. Yomichan) is open source and can be forked as a starting point. If there is an extension closer to what you described that can be forked, even better.

Yeah, I know about Yomichan and the like. I’m thinking of something like this (just a quick Google result; there might be others).

It seems like it might be possible to have something like that turned on and then block out some kanji at random. (Personally, I wouldn’t even mind blocking out all the kanji, but other people might want to be more selective.)


If that furigana extension applies consistent markup to the kanji elements, you could use some pretty simple JavaScript to hide all of the kanji elements (or a random selection) until they’re clicked.
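A minimal sketch of that idea, assuming the furigana extension emits standard `<ruby>漢字<rt>かんじ</rt></ruby>` markup (check what your extension actually produces). `pickRandom` and `hideRandomRuby` are hypothetical names, and 10 is an arbitrary number of words to hide. The base text is made transparent rather than `visibility: hidden`, because transparent text still receives clicks while hidden elements do not.

```javascript
// Pick `count` distinct items at random (partial Fisher-Yates shuffle).
// `random` defaults to Math.random but can be swapped for a seeded RNG.
function pickRandom(items, count, random = Math.random) {
  const pool = Array.from(items); // copy, so the input is never reordered
  const n = Math.min(Math.max(count, 0), pool.length);
  for (let i = 0; i < n; i++) {
    // Swap a random not-yet-chosen element into position i.
    const j = i + Math.floor(random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, n);
}

// Hides the base kanji of `count` random <ruby> elements while keeping
// their <rt> readings visible; clicking a hidden word reveals it again.
function hideRandomRuby(doc, count) {
  const chosen = pickRandom(doc.querySelectorAll("ruby"), count);
  for (const ruby of chosen) {
    // Pin the readings to the ruby's current text colour before hiding it.
    const visibleColor = doc.defaultView.getComputedStyle(ruby).color;
    const readings = ruby.querySelectorAll("rt");
    readings.forEach((rt) => { rt.style.color = visibleColor; });
    ruby.style.color = "transparent";
    ruby.style.cursor = "pointer";
    ruby.addEventListener(
      "click",
      () => {
        // Clear the inline styles so the page's own styling applies again.
        ruby.style.color = "";
        ruby.style.cursor = "";
        readings.forEach((rt) => { rt.style.color = ""; });
      },
      { once: true }
    );
  }
  return chosen.length;
}

// Only run when loaded in a browser page (e.g. from a userscript).
if (typeof document !== "undefined") {
  hideRandomRuby(document, 10);
}
```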


I just dumped
核兵器の保有や使用などを禁止する「核兵器禁止条約」を批准した国と地域が50か国に達し、国際条約として発効することが確実となりました。
into ichi.moe with kana selected, and got

かくへいき の ほゆう や しよう など を きんし する「 かくへいき きんしじょうやく」 を ひじゅん した くに と ちいき が ごじゅっかこく に たっし、 こくさいじょうやく として はっこう する こと が かくじつ と なりました。

but it turns everything into kana, which is not quite what you are looking for.


It’s indeed not what I said I was looking for, but it might work for me personally. It wouldn’t have any of the features you’d want in a full-fledged extension for people to make use of, but it would be good enough for what I want to do. Thanks!


I have found it to be pretty accurate with kana, but you will have to find out how well it does for your needs.


I think it’s probably quite difficult to implement fully. The actual substitution of kanji with kana is probably the easy part. If you’re OK with a simple userscript, you could in theory scan the body of the page for text nodes, take the text out of each one, figure out the kana, and then set the original text node’s contents to the kana version. Whether you substitute everything or only words from a specific list doesn’t really matter from a technical standpoint; that’s just a matter of deciding what to leave in. It might mess up the page layout a bit, since the kana version will probably take up a lot more space than the kanji version.
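A minimal sketch of the scan-and-substitute approach, assuming a hypothetical `toKana(text)` function that does the hard part (the dictionary and parser). It only swaps the contents of existing text nodes, so the page's elements stay where they are; `forEachTextNode`, `convertPage` and the skip list are illustrative choices, not an established API.

```javascript
// Elements whose text should never be rewritten.
const SKIP_ELEMENTS = new Set(["SCRIPT", "STYLE", "NOSCRIPT", "TEXTAREA"]);

// Recursively visit every text node under `node`. Only relies on
// nodeType, nodeName and childNodes, which real DOM nodes provide.
function forEachTextNode(node, visit) {
  if (node.nodeType === 3) { // Node.TEXT_NODE
    visit(node);
    return;
  }
  if (SKIP_ELEMENTS.has(node.nodeName)) return;
  for (const child of Array.from(node.childNodes || [])) {
    forEachTextNode(child, visit);
  }
}

// Replace each text node's contents with whatever `toKana` returns and
// report how many nodes actually changed. `toKana` is a placeholder:
// the reading lookup (dictionary plus parser) lives inside it.
function convertPage(root, toKana) {
  let changed = 0;
  forEachTextNode(root, (textNode) => {
    const converted = toKana(textNode.nodeValue);
    if (converted !== textNode.nodeValue) {
      textNode.nodeValue = converted;
      changed++;
    }
  });
  return changed;
}
```

In a userscript you would call something like `convertPage(document.body, toKana)` once the page has loaded.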

The hardest part of the process is actually the language itself. Figuring out the readings from a plain text string is quite hard, especially since Japanese text generally doesn’t use spaces to mark word boundaries. You’d essentially have to parse the entire text or sentence and work out the words from the grammar present, which is difficult to do reliably. What makes it even worse is that it’s notoriously harder to extract a full, coherent text from Japanese websites than from most English ones. Sites using furigana often break their text up internally at the HTML level (e.g. 読(よ)む will usually be encoded as <ruby>読<rt>よ</rt></ruby>む), so even parsing out a single full word can be hard. To handle this, you could probably treat ruby and rt tags as special cases and replace them with their rt content, although that still makes parsing the text a lot more difficult. Many Japanese websites I’ve come across also use images for banners and other site headings, which would be nearly impossible to parse normally, although this isn’t too different from the problems most other Japanese language support plugins currently have.
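Treating ruby as a special case might look roughly like this minimal sketch. It assumes standard `<ruby>`/`<rt>` markup; `rubyReading` and `flattenRuby` are hypothetical names. `Node.normalize()` is the standard DOM method that merges the adjacent text nodes left behind, which is what lets a parser see 読(よ)む as the single string よむ.

```javascript
// Read the furigana of a <ruby> element: the text of its direct <rt>
// children. <rp> fallback parentheses and the base text are ignored.
function rubyReading(ruby) {
  return Array.from(ruby.childNodes)
    .filter((child) => child.nodeName === "RT")
    .map((rt) => rt.textContent)
    .join("");
}

// Replace every <ruby> under `root` with a plain text node holding its
// reading (falling back to its full text if it has no <rt>), then merge
// the now-adjacent text nodes into one.
function flattenRuby(root) {
  const doc = root.ownerDocument || root; // root may be the document itself
  for (const ruby of Array.from(root.querySelectorAll("ruby"))) {
    const reading = rubyReading(ruby) || ruby.textContent;
    ruby.replaceWith(doc.createTextNode(reading));
  }
  root.normalize(); // joins adjacent text nodes and drops empty ones
}
```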

Your best bet would probably be what @seanblue described earlier: find an open-source plugin that does something similar and see if you can at least fork the parsing part. If you have something that can parse a webpage, adding a way to block out a set of kanji will probably not be too hard. I searched around a bit and found https://github.com/kuanyui/Furiganaize, which seems to be an open-source furigana plugin for Firefox. The text_to_furigana_dom_parse.js file seems to handle the actual kanji substitution, and you could probably use it as a starting point, although from what I can tell it still lacks a lot of the features the larger furigana plugins have, and the code documentation isn’t great. The code is small enough that you could probably write a toy version of the plugin you want relatively easily.

Thanks for your thoughts.

So yeah, a version where only some of the words are blocked out will probably be tough, but I personally don’t need that. If I could do it, more people might be interested in using it, but I would be fine with strings of all kana. In that case, it would just be a matter of making anything that isn’t hiragana or katakana the same color as the background, or physically removing it, whichever works.
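A minimal sketch of the "physically remove it" option is to swap each kanji for a placeholder in every text node. This assumes "kanji" means characters in the Unicode Han script (matched with the `\p{Script=Han}` regex property) and deliberately leaves punctuation and Latin text alone, which is a little narrower than "anything that isn't hiragana or katakana"; `maskKanji` and the □ placeholder are hypothetical choices. On top of a furigana extension's output, the readings should stay visible, since they are already kana.

```javascript
// Matches one kanji; \p{Script=Han} also covers the iteration mark 々.
const KANJI = /\p{Script=Han}/gu;

// Replace every kanji in `text` with `placeholder`, one per character, so
// the number of characters to write stays visible.
// maskKanji("漢字を書く") returns "□□を□く".
function maskKanji(text, placeholder = "□") {
  return text.replace(KANJI, placeholder);
}

// In a page (e.g. a userscript), mask every text node under <body>.
if (typeof document !== "undefined" && document.body) {
  const walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);
  for (let node = walker.nextNode(); node; node = walker.nextNode()) {
    node.nodeValue = maskKanji(node.nodeValue); // no structural change
  }
}
```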

I’m not sure if you’re looking to do live conversion of text or offline preparation beforehand.

If the latter, it’s easy to create two versions of each sentence. One with kanji and no furigana, the other with furigana but hidden by default.


Something like that could probably be written quite quickly. In that case, you could probably get away with walking the DOM tree of the page’s body and picking out the text nodes. If you encounter any kanji (or any kanji not on a list, in case you want to leave some kanji visible), you could just wrap them in something like:

<span class="hide_kanji">漢字</span>

And inject a <style> tag at the end of the <head> section containing something similar to:

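/* Kanji wrapped in .hide_kanji are drawn fully transparent... */
.hide_kanji {
    color: rgba(0, 0, 0, 0);
}

/* ...and take the parent element's text color again while hovered. */
.hide_kanji:hover {
    color: inherit;
}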

This would make any wrapped kanji fully transparent unless you hover over them. Injecting a large number of spans into the page will probably be very inefficient, but it should work, in theory at least.
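Putting the span wrapping and the injected stylesheet together, a userscript might look roughly like the sketch below. It rests on a few assumptions: "kanji" means Unicode Han characters, the keep list `new Set(["日", "本"])` is a made-up example of kanji to leave visible, and all function names are hypothetical. It uses `color: transparent`, which has the same effect as the fully transparent color in the CSS above.

```javascript
// A character is hidden if it is a kanji that is not on the keep list.
function shouldHide(ch, keep) {
  return /\p{Script=Han}/u.test(ch) && !keep.has(ch);
}

// Split text into runs of hidden and visible characters, e.g.
// "漢字を書く" -> 漢字 (hide) | を | 書 (hide) | く
function splitRuns(text, keep = new Set()) {
  const runs = [];
  for (const ch of text) { // for...of iterates by code point
    const hide = shouldHide(ch, keep);
    const last = runs[runs.length - 1];
    if (last && last.hide === hide) last.text += ch;
    else runs.push({ text: ch, hide });
  }
  return runs;
}

const SKIP = new Set(["SCRIPT", "STYLE", "NOSCRIPT", "TEXTAREA"]);

// Wrap each hidden run in <span class="hide_kanji">.
function wrapKanji(root, keep = new Set()) {
  const doc = root.ownerDocument;
  const walker = doc.createTreeWalker(root, NodeFilter.SHOW_TEXT);
  const textNodes = [];
  for (let n = walker.nextNode(); n; n = walker.nextNode()) textNodes.push(n);

  for (const node of textNodes) {
    const parent = node.parentNode;
    if (!parent || SKIP.has(parent.nodeName)) continue;
    if (parent.classList && parent.classList.contains("hide_kanji")) continue;
    const runs = splitRuns(node.nodeValue, keep);
    if (!runs.some((run) => run.hide)) continue;

    const fragment = doc.createDocumentFragment();
    for (const run of runs) {
      if (run.hide) {
        const span = doc.createElement("span");
        span.className = "hide_kanji";
        span.textContent = run.text;
        fragment.appendChild(span);
      } else {
        fragment.appendChild(doc.createTextNode(run.text));
      }
    }
    parent.replaceChild(fragment, node);
  }
}

// Append the hiding rules to the end of <head>.
function injectStyle(doc) {
  const style = doc.createElement("style");
  style.textContent =
    ".hide_kanji { color: transparent; }\n" +
    ".hide_kanji:hover { color: inherit; }";
  doc.head.appendChild(style);
}

if (typeof document !== "undefined" && document.body) {
  injectStyle(document);
  wrapKanji(document.body, new Set(["日", "本"])); // hypothetical keep list
}
```

Collecting the text nodes before modifying anything avoids changing the tree while the TreeWalker is still iterating over it.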


Have you considered using text to speech functionality? It’s not quite the same as your original request, but if you’re okay with entire strings of kana I thought maybe you’d be okay if it was listening-based as well.
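If you go the listening route, browsers ship a built-in text-to-speech API (the Web Speech API's `speechSynthesis`), so an external service isn't strictly needed. This is a minimal sketch, assuming a Japanese voice is installed on the system; the Alt+S hotkey and the helper names are hypothetical.

```javascript
// Prefer a Japanese voice if the browser/OS provides one.
function pickJapaneseVoice(voices) {
  return voices.find((v) => v.lang && v.lang.toLowerCase().startsWith("ja")) || null;
}

// Read `text` aloud with the Web Speech API.
function speakJapanese(text, rate = 0.9) {
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.lang = "ja-JP";
  utterance.rate = rate; // slightly slower than the default of 1
  const voice = pickJapaneseVoice(window.speechSynthesis.getVoices());
  if (voice) utterance.voice = voice;
  window.speechSynthesis.speak(utterance);
}

// Example: read the current selection aloud when Alt+S is pressed.
// event.code is layout-independent, unlike event.key with Alt held.
if (typeof window !== "undefined" && "speechSynthesis" in window) {
  document.addEventListener("keydown", (event) => {
    if (event.altKey && event.code === "KeyS") {
      const text = String(window.getSelection());
      if (text.trim()) speakJapanese(text);
    }
  });
}
```

In some browsers `getVoices()` returns an empty list until the `voiceschanged` event has fired; in that case this falls back to whatever voice the browser chooses for `lang = "ja-JP"`.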


That’s a good idea too.


If you end up making something with TTS, I know a trick for getting free access to Google Translate’s API.

There were extensions like the Wanikani Highlighter (level-dependent) and the Genki/Wanikani Kanji Highlighter (level-independent) that could highlight all kanji on a page; however, they haven’t been updated to V2, and I don’t know if any of the authors are active here. Nonetheless, if one of them did work, I think you could unpack the extension, change the highlight color to solid black or something similar, and maybe let a furigana extension do the rest of the work.


To get the furigana, you can use Kuromoji, a Japanese morphological analyzer. There is also a JavaScript port that you could use directly in the browser: https://github.com/takuyaa/kuromoji.js
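A minimal sketch of turning kuromoji.js output into kana, assuming the library has been loaded with a `<script>` tag (which exposes a global `kuromoji`) and that its dictionary files are served from a path you choose. Tokens from its bundled IPADIC dictionary carry `surface_form` and a katakana `reading`; the helper names and the `/path/to/dict/` path are placeholders.

```javascript
// Convert katakana to hiragana by shifting code points (ァ..ヶ -> ぁ..ゖ).
// The long vowel mark ー is outside that range and is left as-is.
function katakanaToHiragana(text) {
  return text.replace(/[\u30A1-\u30F6]/g, (ch) =>
    String.fromCharCode(ch.charCodeAt(0) - 0x60)
  );
}

// Turn kuromoji tokens into a kana string. Tokens without kanji keep
// their surface form (so katakana loanwords stay katakana); unknown
// words may lack a reading, so they also fall back to the surface form.
function tokensToKana(tokens) {
  return tokens
    .map((t) => {
      if (!/\p{Script=Han}/u.test(t.surface_form)) return t.surface_form;
      if (!t.reading || t.reading === "*") return t.surface_form;
      return katakanaToHiragana(t.reading);
    })
    .join("");
}

// Wrap kuromoji's callback-style builder in a Promise.
function buildTokenizer(dicPath) {
  return new Promise((resolve, reject) => {
    kuromoji.builder({ dicPath }).build((err, tokenizer) => {
      if (err) reject(err);
      else resolve(tokenizer);
    });
  });
}

// Usage (browser):
// buildTokenizer("/path/to/dict/").then((tokenizer) => {
//   console.log(tokensToKana(tokenizer.tokenize("漢字を書く")));
// });
```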

