Here comes a new challenge... err!


#1

みなさん、こんにちは!

Recently, Hacker News featured a new language learning tool named Clozemaster. It looks interesting and kind of cool, but I’d love to know your opinion, because this Sacred Community is as animated, enthusiastic, passionate and psyched as it is hard-hearted, merciless, ruthless and severe when it comes to language learning tools. I crave your opinion, my respected fellow Crabigator followers.

For your convenience, here are the creator’s own words, copied from the post he wrote to promote his brainchild:


Hi Hacker News! My name’s Mike. I’m the creator of Clozemaster.

Clozemaster is language learning through mass exposure to vocabulary in context. The goal is to fill in the missing word in a given sentence. The missing word is the most difficult word in the sentence according to a frequency list for that language, and the sentences are from the awesome dataset at Tatoeba.

I started the site just under a year ago to answer the question “what should I do after finishing Duolingo?”. Since then it’s grown to support over 50 languages, mobile apps, and thousands of users. It’s useful for learners at almost any level, from beginner to advanced, and makes a great complement to Duolingo, textbooks, classrooms, etc. to practice vocab in context.

You can play the Fluency Fast Track which gives you a sentence for each unique missing word in order of increasing difficulty, jump in to sentences grouped by frequency from the 100 most common words to the 10,000 most common, or just play random sentences. There’s also “cloze-listening” - hear the sentence first, then see it and fill in the missing word.

Thought Hacker News might find Clozemaster interesting and hopefully useful! It’s still very much a work in progress - I have a bunch more features planned and I’m working to improve it all the time. I’m also open to any feedback and happy to answer any questions!


Since it’s a new language learning tool squeezing into the wild world of language learning tools (market, habitat, contest, or whatever), I strongly feel it’s a Here Comes A New Challenger moment, like when playing Ultra Street Fighter II Turbo. What are your impressions? Do you think it’s a strong contender as a language learning tool? Is it worth giving it a try? Would you consider using it? Does the fill-in-the-gaps approach really help?

As for me… I’m fascinated by its kind of roguelike nature, as I read in the About section:


Where are the sentences from?
All sentences and translations are from tatoeba’s massive and awesome dataset, released under a CC-BY License.

How are the blanks in the sentences selected?
The cloze deletion to test, or the blank in the sentence, is the least common word in the sentence within the 10,000 (or as many as possible) most common words in the language. In other words, for a given sentence all the words in the sentence are checked against the top 10,000 words in a frequency list for that language. The least common word is then used as the cloze test. In this way the vocab learned via clozemaster is the most difficult of the most common. If a quality frequency list doesn’t exist for a given language, one is either generated from the sentences themselves or random cloze words are selected.

Where are the frequency lists from?
The frequency lists used are from https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists, except for: The frequency list for Japanese was generated from JLPT vocabulary lists on http://www.tanos.co.uk/jlpt/. All of the vocabulary is sorted by JLPT level, and higher precedence is given to kanji.

The frequency lists have been modified to remove common names and pronouns.
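
For the curious, the blank-selection rule described in that excerpt can be sketched in a few lines of Python. This is only my reading of the About text, not Clozemaster's actual code: the function names `pick_cloze` and `make_cloze`, the `top_n` parameter, and the tiny frequency list are all made up for illustration.

```python
def pick_cloze(sentence_words, frequency_list, top_n=10_000):
    """Return the least common word of the sentence that still appears
    in the first top_n entries of frequency_list, or None if none does.

    frequency_list is assumed to be ordered from most to least common.
    """
    # Map each listed word to its rank (0 = most common).
    rank = {word: i for i, word in enumerate(frequency_list[:top_n])}
    # Words missing from the list (e.g. removed names) are never candidates.
    candidates = [w for w in sentence_words if w in rank]
    if not candidates:
        return None
    # The highest rank number is the least common word in the sentence.
    return max(candidates, key=lambda w: rank[w])


def make_cloze(sentence_words, frequency_list, blank="____"):
    """Blank out the first occurrence of the chosen word.

    Returns (words_with_blank, answer); answer is None if nothing qualified.
    """
    target = pick_cloze(sentence_words, frequency_list)
    if target is None:
        return list(sentence_words), None
    blanked = list(sentence_words)
    blanked[blanked.index(target)] = blank
    return blanked, target


# Hypothetical mini frequency list, most common word first.
frequency = ["the", "is", "cat", "on", "mat", "sleeping"]
words = ["tom", "the", "cat", "is", "sleeping", "on", "the", "mat"]
blanked, answer = make_cloze(words, frequency)
print(" ".join(blanked), "->", answer)
```

Here "tom" is skipped because it is not in the list at all, which mirrors the note about names and pronouns being removed from the frequency lists.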


Don’t get me wrong: I do love the beauty of the manually crafted WaniKani and the priceless imprint of the genius behind it, but I also have faith in clever algorithms and auto-generated content based on random stuff from god knows which source.

Please, share your opinion!

ありがとうございます!


#2

Cloze activities are a tried and true, time honored tool among language teachers to teach, and more often assess, vocabulary use in context. Clozemaster’s explanation pretty much covers it. That said, don’t rely on the multiple choice too much. It could be good for getting your grounding, but the text input is the truer test of what you’ve learned. After spending a few minutes with the text input, I can see some scope for error on the part of the test if the person who wrote the “correct” answer didn’t add all possible ways of answering (synonyms, alternate conjugations, or vocab that can be written with different kanji). Looks like a neat little tool, though. Thanks for sharing!

EDIT: After spending a few more minutes on the text input, yeah, it’s pretty finicky. Counted me wrong for using 切符 because it wanted チケット, also counted 本当に as incorrect for “Really?” because it wanted 本当, whereas in actual use either one is acceptable. So, Clozemaster could get annoying…
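
To make the complaint concrete: one way a tool could be less finicky is to keep a set of accepted answers per blank instead of a single string. Below is a minimal sketch of that idea, assuming a hypothetical `is_correct` helper and hand-written lists of alternatives. Nothing here reflects how Clozemaster actually grades answers.

```python
import unicodedata


def normalize(answer: str) -> str:
    """Apply Unicode NFKC normalization and trim surrounding whitespace."""
    return unicodedata.normalize("NFKC", answer).strip()


def is_correct(user_answer: str, accepted_answers) -> bool:
    """Return True if the user's answer matches any accepted alternative."""
    accepted = {normalize(a) for a in accepted_answers}
    return normalize(user_answer) in accepted


# Hypothetical alternatives for the two cases mentioned above.
ticket_answers = ["チケット", "切符"]
really_answers = ["本当", "本当に"]

print(is_correct("切符", ticket_answers))
print(is_correct("本当に ", really_answers))
print(is_correct("切手", ticket_answers))
```

The hard part is not the check itself but collecting the alternatives, since someone still has to decide which synonyms and spellings count for each sentence.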


#3
daveyyoung said... (...) After spending a few minutes with the text input, I can see some scope for error on the part of the test if the person who wrote the "correct" answer didn't add all possible ways of answering (synonyms, alternate conjugations, or vocab that can be written with different kanji). (...)
Yep! That's absolutely true! 

Thanks for your insight, Davey Youngさん!

#4
daveyyoung said... EDIT: After spending a few more minutes on the text input, yeah, it's pretty finicky. Counted me wrong for using 切符 because it wanted チケット, also counted 本当に as incorrect for "Really?" because it wanted 本当, whereas in actual use either one is acceptable. So, Clozemaster could get annoying...
:O

I see! Thanks for testing that. Maybe it would help to collect as many impressions, suggested improvements and bugs as possible and forward them all to the tool's creator.

#5

Haha, I just got the exact same Qs as daveyyoung and…got them right due to reading his post. However, I got others wrong for the same reasons he mentioned. Very difficult to get this right since a lot of words don’t translate perfectly from Jp - Eng so very often you’re going to have many correct options and it’s just a lottery as to whether your choice matches theirs.


#6

Well, I’ve only used it for 3 rounds so far, but it has piqued my interest and I’ll continue to use and explore it. The kanji-only input is slightly off-putting. At WK lvl 10 it’s not all that bad, unless I move faster in Clozemaster than I do in WK I suppose, but in general it is a significant hindrance to those with limited or no kanji knowledge. Too bad it’s not like KaniWani, where you can type kana and get it marked correct as the kana gets converted to kanji.

Now, I do understand that I have the Microsoft IME at my disposal, so pressing space bar often fixes my problems, but it’s not that satisfactory of a solution. If I know the kanji, that’s one thing, but otherwise I’m left guessing how many times to mash ‘space’ and then an unfamiliar kanji might start to be learned the old fashioned rote way. This might also be weird when you learn that kanji in WK if its mnemonic ties in with that of other kanji/vocab.

Overall, I’ll keep using it for the time being, but the range of options/input and reliance on Kanji aren’t all that appealing.


#7

Where are the sentences from?
All sentences and translations are from tatoeba’s massive and awesome dataset, released under a CC-BY License.

uwaaa〜 I don’t mean to be offensive, but Tatoeba contains weird, unnatural Japanese sentences〜 be cautious with any project that uses Tatoeba data〜 I wouldn’t really encourage people to use any project that serves Tatoeba example sentences〜 here is an example where you can find weird sentences from Tatoeba’s example sentences


#8
minamixdrops said... Where are the sentences from?
All sentences and translations are from tatoeba's massive and awesome dataset, released under a CC-BY License.

uwaaa〜 I don't mean to be offensive, but Tatoeba contains weird, unnatural Japanese sentences〜 be cautious with any project that uses Tatoeba data〜 I wouldn't really encourage people to use any project that serves Tatoeba example sentences〜 here is an example where you can find weird sentences from Tatoeba's example sentences
Yeah, this is another problem. It's a shame because, as far as I'm aware, 90+% of their sentences are perfectly correct, but there are just enough with errors or unnatural usage that you have to be very careful you don't end up learning the wrong thing. Easiest way to see what it's like is to skim through their sentences in your language. For example, by searching random English sentences I very quickly got "Please call me whenever it is convenient to you." Not wildly wrong, and you'll still be understood if you say it - but not correct either.

#9

Visceralさん! You’ve decided to adopt the tool pretty fast! I hope it goes well as time passes. Maybe it’s kind of difficult to answer correctly if a specific word, written exactly as expected, is needed. Maybe it’s better to go for the multiple choice first and later try the same questions without hints. I’m not sure…


#10
minamixdrops said... Where are the sentences from?
All sentences and translations are from tatoeba's massive and awesome dataset, released under a CC-BY License.

uwaaa〜 I don't mean to be offensive, but Tatoeba contains weird, unnatural Japanese sentences〜 be cautious with any project that uses Tatoeba data〜 I wouldn't really encourage people to use any project that serves Tatoeba example sentences〜 here is an example where you can find weird sentences from Tatoeba's example sentences
riccyjay said...Yeah, this is another problem. It's a shame because, as far as I'm aware, 90+% of their sentences are perfectly correct, but there are just enough with errors or unnatural usage that you have to be very careful you don't end up learning the wrong thing. Easiest way to see what it's like is to skim through their sentences in your language. For example, by searching random English sentences I very quickly got "Please call me whenever it is convenient to you." Not wildly wrong, and you'll still be understood if you say it - but not correct either.
I wasn't aware of this! As far as I know, tatoeba is the large database of sentences used by Tangorin. When I'm asked to write a composition as homework, I often rely on Tangorin for sample sentences for words I'm not sure how to use properly. But now I'm too scared to continue doing so, even though it has worked well for me until now!

I checked the About page for Tangorin where all 3rd-party libraries are listed and it says the following about tatoeba:

---
Tatoeba: The project behind the Examples dictionary. A large database of example sentences translated into several languages. Based on example sentences compiled by Professor Yasuhito Tanaka and his students at Hyogo University, known as The Tanaka Corpus, the project is currently maintained by Tatoeba and released under a CC-BY licence. 
---

It looks like Tanaka-sensei's mischievous students made some big mistakes... Thank you for pointing that out!

#11

There’s more about the weird origin of the tatoeba sample sentences. I’ve checked this and it says the following:


Professor Tanaka’s students were given the task of collecting 300 sentence pairs each. After several years, 212,000 sentence pairs had been collected.

From inspection, it appears that many of the sentence pairs have been derived from textbooks, e.g. books used by Japanese students of English. Some are lines of songs, others are from popular books and Biblical passages.

The original collection contained large numbers of errors, both in the Japanese and English. Many of the errors were in spelling and transcription, although in a significant number of cases the Japanese and English contained grammatical, syntactic, etc. errors, or the translations did not match at all.


There’s more to be read about this, but your experience and this excerpt are enough for me to stop considering any project relying on this corpus. I’m not sure what the best possible approach is, but this one doesn’t look good enough. Also, deriving sentence pairs from some weird sources reminds me of the infamous Google Translate.


#12

Yeah, when I first read “the sentences are from the awesome dataset at Tatoeba,” I immediately became wary.


#13

Hmm, after reading your discussion I’m kinda torn on Clozemaster… it’s so convenient and it’s (dubiously) good reading practice. Maybe I’ll stick with it for a month or two to get more comfortable with different sentence patterns. For Esperanto it’s pretty good!