Well, hacking implies that I know how to do that sort of thing… I’ve been following tutorials and copying code from Stack Overflow…
I thought it would be awesome to use Pokemon Mystery Dungeon DX on the Switch to practice some Japanese reading. However, the game has quite a lot of text compared to children’s novels and other lower-level reading material, so it takes a more advanced vocabulary to read.
Basically, I suck at playing the game because I can’t read/understand a lot of the vocabulary used…
But I had an idea! What if I hack my Switch, dump the game, scrape the dialogue, run the dialogue through a morphological analyzer, and compile a list of vocabulary and kanji to study before playing the game?! Easy peasy! Well, sort of…
I won’t bore you with the details, but I’d like to share a list of vocabulary and kanji I’ve extracted! So here’s what I have so far! It’s ~700 kanji and ~3000 vocabulary words from just the dialogue of PMD DX. I believe the kanji list is mostly clean, but the vocabulary list is mixed with a lot of onomatopoeia and items that are really more grammar than vocabulary. I haven’t taken a good look yet, but I expect around 20% of the vocabulary to be extra stuff that isn’t necessary to study. Take a look at what I’ve got here:
The next step is to pair each vocabulary word and kanji with its Japanese reading and English meaning. Unless I find a better method, I’ll probably use Jisho to do most of this automatically, then manually hunt down definitions and readings for those pesky words Jisho doesn’t find. Then I’ll be able to create an Anki deck or something of that sort to study from!
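For the Anki part, here’s a rough sketch of what that last step might look like. It isn’t a finished script: it assumes the readings and meanings have already been looked up somehow, and it just formats them as tab-separated text, which Anki can import. The function name `to_anki_tsv` and the sample entries are made up for illustration and aren’t pulled from the game.

```python
import csv
import io


def to_anki_tsv(entries):
    """Format (word, reading, meaning) tuples as tab-separated text.

    Each row becomes one note with three fields when imported into Anki.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for word, reading, meaning in entries:
        writer.writerow([word, reading, meaning])
    return buffer.getvalue()


# Illustrative entries only (hypothetical, not extracted from the game)
entries = [
    ("冒険", "ぼうけん", "adventure"),
    ("危険", "きけん", "danger"),
]

tsv_text = to_anki_tsv(entries)
print(tsv_text)
```

Writing `tsv_text` to a UTF-8 file would give something to import as a text file, with the three columns mapped to whatever fields the note type uses.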
Let me know what you think! Oh, and here’s another gif. After getting through so many reviews, you deserve it! You didn’t skip your reviews to check the forums like I did, right?
Hmm… That’s an excellent question! I guess I could do a more detailed write-up/tutorial if more people indicate that they’re interested? Otherwise, we could just message on Discord (my tag is Xeyler#0892) or something of that nature for more personal help.
Keep in mind that it’s not an exact science yet: I can extract kanji easily, but extracting vocabulary is a little more difficult and error-prone. I wrote pretty much all the scripts I used in Python, which means they should work on Windows, but the process is kind of biased towards Linux at the moment. There isn’t one catch-all method of dumping dialogue from Switch games, so it’s not feasible to do en masse. And most importantly, I haven’t even tried studying with this material yet, so it’s difficult to say how successful this attempt to scrape and process text has been.
Most of these issues can be fixed with a little time, impatience, and profanity, though. I just have to spend some time with it! I’ll try to stay motivated here!
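To give a taste of the “easy” kanji part, here’s a minimal sketch of the idea rather than the exact script I used. It assumes the dialogue is already dumped as plain text lines, and it treats “kanji” as anything in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), so rarer characters outside that block would be missed. The name `count_kanji` and the sample lines are placeholders, not real game dialogue.

```python
import re
from collections import Counter

# Assumption: kanji = basic CJK Unified Ideographs block (U+4E00 to U+9FFF).
# Kana, punctuation, and Latin text fall outside this range and are skipped.
KANJI_PATTERN = re.compile(r"[\u4e00-\u9fff]")


def count_kanji(lines):
    """Return a Counter mapping each kanji to how often it appears."""
    counts = Counter()
    for line in lines:
        counts.update(KANJI_PATTERN.findall(line))
    return counts


# Hypothetical sample lines, not actual dialogue from the game
sample = ["冒険に出かけよう!", "この冒険は危険だ。"]

kanji_counts = count_kanji(sample)
for kanji, n in kanji_counts.most_common():
    print(kanji, n)
```

Vocabulary is the harder half because Japanese has no spaces between words, which is why that step needs a morphological analyzer instead of a simple pattern like this.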
Anyways, thanks for asking! Most people fear cringe code and spooky scripts enough to not even mention it.