While studying kanji, and Japanese in general, I’ve been tinkering with code to create a tool that will help me learn. What I have in mind isn’t, as far as I can tell, similar to anything I have found on the Internet. At the moment, it mostly involves automatically analysing large data sets to extract the information and patterns I need.
As sources for this project, I’ve so far been using Kanjium, KanjiVG, JMdict and Tatoeba. These resources provide data sets under Creative Commons or similar licenses; specifically, they may be used as a source in a commercial project. Given the work I am planning to put into the project, it would be nice if it could perhaps someday pay the bills.
However, Wanikani forum residents often describe these resources as flawed: tainted with incorrect information and containing sentences no Japanese speaker would utter. Japanese-to-Japanese resources are held up as the shining example by contrast, presumably because they are put together and vetted by professionals. In general, Japanese-to-Japanese resources are preferred in the later stages of learning, in order to cut ties with the language we are learning from and to learn the real dependencies and connections between words and characters in Japanese. I would therefore prefer to use Japanese-to-Japanese resources, like Weblio, Kanjipedia and presumably many others.
So far, however, my Japanese skills aren’t good enough to figure out whether any of these resources offers a data set that I could use freely in a commercial enterprise. And there may be better resources around that, with my limited ability to read Japanese, I am missing entirely.
So, a question (which might be useful to other people as well, if they want to find such sources): does anybody know of Japanese-to-Japanese data sets available under a license that allows using them as a source in commercial applications or websites?
Or ones available for licensing for a fee low enough that it wouldn’t make a commercial application unfeasible? And ones that don’t require a personal visit to the institution in question, or the ability to communicate with them in fluent Japanese?