Deathnetworks said... Did the audio url (http://assets.languagepod101.com/dictionary/japanese/audiomp3.php?kanji=%kanji%&kana=%kana%) change?
Apparently not, I can still get the audio clips in my browser. Can someone else confirm that audio still works in Houhou when configured with this URL? (I'm at work at the moment but will check ASAP.)
Doublevil said... Can someone else confirm that audio still works in Houhou when configured with this URL?
Works fine for me on Windows 10 and that URL.
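For anyone else who wants to sanity-check that audio source before restarting anything, here's a quick sketch in plain Python (nothing Houhou-specific; 食べる/たべる is just an arbitrary test word, and the manual placeholder substitution below only mimics what Houhou presumably does with %kanji% and %kana%):

```python
# Quick check that the LanguagePod101 audio URL template still resolves.
import urllib.parse
import urllib.request

TEMPLATE = ("http://assets.languagepod101.com/dictionary/japanese/audiomp3.php"
            "?kanji=%kanji%&kana=%kana%")

def audio_url(kanji: str, kana: str) -> str:
    # Substitute the %kanji% / %kana% placeholders by hand, URL-encoding the Japanese.
    return (TEMPLATE
            .replace("%kanji%", urllib.parse.quote(kanji))
            .replace("%kana%", urllib.parse.quote(kana)))

url = audio_url("食べる", "たべる")  # arbitrary test word
with urllib.request.urlopen(url) as resp:
    print(url)
    print("HTTP", resp.status, "-", len(resp.read()), "bytes received")
```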
I'll turn it off and back on again.
That did the job. Note to self… actually close down Houhou now and again.
This looks fantastic, I can't wait to try it.
ocac said... @Doublevil
Howdy! Way back when, we had a wee chat, and I learned that your 本 commonality marker (so useful!) came from your own study of books and not from any other source.
https://www.wanikani.com/chat/kanji-and-japanese/8689
I'm looking through the massive TXT of the frequencies that you shared (never realised you had shared it until I went looking for said file for a project of my own - thank you!). 'Tis wonderful!
One thing, though: what lexical tool did you use? In particular, what dictionary was it parsed/de-conjugated with? I was guessing MeCab, as that's the most common, but I've recently started using UniDic, so I see there are other viable options. JWParser, even.
I'm looking to parse more text and match it to these frequencies to judge the most useful (common) words, so I'd like to copy the type of parsing for compatibility's sake.
EDIT: Actually, looking at the format of the entries, I think I would have to ask you for a new version to use it.
The setup is [Number]|[Item]|[Reading], with no line breaks (at least as it renders for me), so there's no spacing between the entries, i.e. [N]|[I]|[R][N]|[I]|[R][N]|[I]|[R]...
Would you be able to change the format to CSV/TSV, or some other line-broken (or easily line-broken) format? I would really appreciate it.
Doublevil said...
Hey ocac! The tool I used is ChaSen.
I'm not only sharing the file on WaniKani, I'm also sharing it with the rest of the world as it is included in Houhou's source code on GitHub.
There are Windows-format line breaks (\r\n). The format is:
<number of occurrences>|<kanji reading>|<kana reading>
You should be able to use it as a |-separated CSV file already. If somehow you can't, I'm willing to transform it for you; it's easy enough. The upload would be by far the longest part.
If you just want to read it, you should be able to turn windows-style line breaks into your system's line breaks with any modern text editor. I'm using Notepad++.
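In case it helps anyone else who grabs the frequency file, here's a minimal sketch of reading it exactly as described above. The filename is an assumption on my part (use whatever the file is called in the Houhou repository or wherever you saved it):

```python
# Parse the pipe-separated frequency list:
#   <number of occurrences>|<kanji reading>|<kana reading>
# one entry per Windows-style (\r\n) line.
import csv

entries = []
with open("word_frequencies.txt", encoding="utf-8", newline="") as f:
    # newline="" lets the csv module handle the \r\n line endings itself.
    for row in csv.reader(f, delimiter="|"):
        if len(row) != 3:
            continue  # skip any malformed or empty lines
        count, kanji, kana = row
        entries.append((int(count), kanji, kana))

# Example use: print the ten most frequent words.
entries.sort(reverse=True)
for count, kanji, kana in entries[:10]:
    print(count, kanji, kana)
```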
ocac said...
You are great. If what I'm working on looks to be useful to others in a similar way, I'll try to get it to them, yourself included. It has certainly benefited greatly from the tools and work of others.
I uninstalled Notepad++ a long time ago for some bizarre, vengeful reason. Yeah, it handled the file size (which Notepad barely could) and the line breaks (which Notepad couldn't handle at all). Thank you for prompting me to patch up my relationship with this useful tool. ;-)
I realise I confused the parser tool with the dictionary - MeCab being the former, UniDic the latter. I have yet to investigate ChaSen as a tool (another one @_@), but simply knowing of it is great. It seems it can be used with a variety of dictionaries. The lesser Windows version (the only one available to me in my present setup) defaults to "ipadic", which I think is the same one used in cb's JTAT. However, ChaSen seems to work with UniDic and JAIST dictionaries as well?
Do you know what dictionary you used for your frequency study? Even if I'm using a different tool/version, matching the dictionary is probably the most important thing.
PS: I'm now thinking that ChaSen could be great, since cb's JTAT, the tool I'm presently parsing with, lacks some control. If I need help with ChaSen, might I get in touch?
Doublevil said...
Ah, good question. I can't remember the dictionary I used. I can tell you that I ran ChaSen in a Linux virtual machine, with some parameters to adjust the output to what I wanted, but that's about all. I'll be able to investigate some more this weekend.
If you want to get in touch, send me a mail at hello@houhou-srs.com. And if you want to chat on Skype, LINE, etc., include your IDs in the mail. :)
ocac said...
Sending!
Can't download, it says that the file is corrupted.
juanpra said... Can't download, it says that the file is corrupted :(
Try again. I just installed 1.2 on two computers yesterday and the download worked fine on both of them.
juanpra said... Can't download, it says that the file is corrupted :(
Try this link:
https://www.dropbox.com/s/sbc97yp2surfpkj/Houhou_Setup_1.2.exe?dl=0
Hmm… SRS Import from WK doesn't seem to capture vocab from levels 51-60. Kanji is okay, though.
rfindley said... Hmm... SRS Import from WK doesn't seem to capture vocab from levels 51-60. Kanji is okay, though.
Yeah, it was done before the addition of levels 51-60. The code needs an update only because WaniKani's API will kind of crash if you request all vocab at once, so you have to get it in multiple parts by explicitly specifying the levels you want.
Ahh, thanks.
BTW... I discovered through experimentation that 5 levels at a time seems to be pretty optimal for fetching vocab. In the Ultimate Timeline script, I run 5-at-a-time queries in parallel, and it usually ends up getting everything in 5-10 seconds. Not sure why the backend seems to like that better.
Ah thanks, that's helpful. I don't know why it crashes either, but Tofugu didn't seem to care the last time I contacted them about the issue.
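For anyone who wants to script this themselves, here's a rough sketch of the batching idea discussed above. The endpoint shape (the old /api/user/<key>/vocabulary/<levels> form with a comma-separated level list) and the response layout are assumptions on my part, so check the current WaniKani API documentation before relying on them:

```python
# Fetch WaniKani vocabulary a few levels at a time instead of all at once,
# since requesting everything in a single call tends to fail (see above).
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

API_KEY = "your-api-key-here"  # placeholder
BASE = "https://www.wanikani.com/api/user/{key}/vocabulary/{levels}"  # assumed endpoint shape
BATCH_SIZE = 5  # 5 levels per request, per rfindley's observation

def fetch_levels(levels):
    url = BASE.format(key=API_KEY, levels=",".join(str(n) for n in levels))
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)

# Levels 1-60 split into batches of 5.
batches = [list(range(start, min(start + BATCH_SIZE, 61)))
           for start in range(1, 61, BATCH_SIZE)]

# Run the batches in parallel, like the Ultimate Timeline script does.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fetch_levels, batches))

vocab = []
for payload in results:
    # "requested_information" is an assumed key; adapt to the real response.
    vocab.extend(payload.get("requested_information", []))
print(len(vocab), "vocab items fetched")
```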
Is there any way to reset Houhou back to zero?
I imported my WaniKani data but I accidentally duplicated everything, and I don't wanna be erasing words one by one.
Never mind, I fixed it.
Thank you so much for making this great application. I would like to second the suggestion to add a lessons queue. What I currently do is write down all the unknown vocab I encounter while reading, then later add it to Houhou all at once. That means something like 50-100 new words at a time, so the next review session ends up being a ton of stuff I don't know, which is really intimidating the first time around. My current workaround is to suspend all the newly added items and only unsuspend about 10 at a time, but I feel like there could be a better solution. Maybe I'm just doing something wrong, idk.
Also, and this is maybe crazy, it would be great if there were a checkbox when you add a vocab item that says "I don't know this/these kanji". If you check that checkbox, Houhou would add all the kanji used in that vocab word to your review queue. Then, once those kanji get past a certain SRS level, the vocab word gets released into your review queue (unsuspended), kind of like WaniKani.
Alternatively, there could be a setting where you mark which kanji you already know, and whenever you add a vocab word containing a kanji you don't know, it automatically does the above.
Lastly, there seems to be a bug with my audio. It randomly says the wrong words when I do reviews. It will be words that I am reviewing, just not the one I just typed in. I turned it off because it was bugging me.
Thanks again, I love this so much more than Anki…
Edit - also, is there a way to view only suspended items in the SRS items list?
Reading the WaniKani radicals in Houhou: take, for example, "animal" and "snake" for the kanji 犯. Firstly, "cobra" seems to have been renamed to "snake" as the primary name on WaniKani, so it does not appear in the radical search unless you search for the old name. Secondly, when selecting these two radicals, which WK deems a match for this kanji, the kanji will not appear. It seems that the kanji themselves continue to use the Kangxi(?)/Jim Breen breakdown rather than the WK one. I understand this would be a time-consuming thing to implement, but it would be worthwhile, I think. It would also be nice if (besides editing the XML file) you could rename radicals.