[Userscript] Voice Input using Web Speech API

gijsc1 · August 27, 2023, 10:40am

Not that I noticed. It just happens once every 20-ish reviews. I have yet to discover any kind of pattern, except that once it has decided it won’t work, it really won’t work, no matter how many times you say something for that word, until you go to the next question and everything is fine again.

I can’t remember this happening for any of the Japanese questions, so it may or may not be something specific to the English ones.
I am also stubbornly using the doublecheck lightning mode instead of the one that comes with this script, as it works much faster. I can imagine that is causing some race condition as well somewhere.

okonomichiyaki · August 28, 2023, 1:51am

This issue should be fixed now in the new version, 0.3: github link

I’ve tested with several vocabs with numbers from levels 1-2 and they work. Please let me know if you find cases where it doesn’t work

There may still be some issues matching English numerals (e.g. “10 days” vs “ten days”), as the speech engine is not consistent, but these can be worked around with a user synonym in WK. I will eventually add code for this to work without a synonym.

nmunshi · August 29, 2023, 2:28am

Hey, thanks for updating it so quickly!

I am trying to test it, but right now I can barely get it to work at all. I am using Pixel Buds as my input, so that is a change from my usual laptop onboard microphone. But it shouldn’t be this bad.

The screenshot is an example. I’ve said ください 10+ times, with many different pitch accents, but it always registers as うださい. Is there anything I should do to calibrate my microphone better, or is this a new bug?

udasai

EDIT: I am now noticing this is a bug: the first part of the input always gets cut off in Japanese sections. English parts are OK though.

EDIT2: Switched to external webcam mic and it’s working much better now. Issue is when using earbuds. Numbered days working much better now as well!

EDIT3: After 100 reviews, I believe v0.02 was much better. Even on the webcam mic, a lot of stuff was missed and often there was no input at all or the input was entered, but lightning mode wasn’t working (this is with correct answers too mind you). Here’s another screenshot of what I saw most of the time when the input failed me.

no input

Dunst · August 29, 2023, 9:47pm

Hi,

Absolutely love this thing. I have developed increasingly worse hand pain over the years, and being able to do my reviews with speech instead of typing is an absolute life saver. Not only can i continue my reviews even on days where i can barely touch a keyboard, i can even afford to put more words into apprentice at once, as i don’t have to fear having to type too much anymore. Plus i learned something about my awful pronunciation while at it (apparently my T-s were so appalling that the engine kept turning them into チ instead, like they were so bad that they didn’t even deserve the hiragana).

With that in mind, i’ve put this baby through some pretty heavy duty work the past couple weeks (i had to get rid of an 800 review pile from a full week of abstinence, which kept generating multiple hundred reviews a day for a couple weeks), so i might have some more detailed feedback to share.

Most words with numbers don’t work, likely because the speech-to-text generates numerals (20日 instead of 二十日). English has the same problem, but is easier to work around because you can just spell words out.
The whole homonym detection is pretty awesome, but it tends to break down every now and then. Most often this happens for numbers (e.g. i haven’t been able to enter 指示 yet, as the engine always turns it into 7時), but i’ve had it happen for non-number vocab before (e.g. 上巻, when the engine replied 上官, perhaps because 上官 is not in WaniKani). I’m guessing this could be alleviated by saying longer sentences that happen to end with the word (as i tend to do in English), but unfortunately for me, i don’t speak Japanese very well.
I know you can’t change it because the speech-to-text only gives you the kanji, but words with multiple pronunciations usually end up with a different one than the one you spoke (e.g. あした → 明日 gets inputted as みょうにち). Not sure what to do here either.
Every now and then, the auto-next won’t work. From my gut feeling, it seems to be somewhat related to the (random?) pause between entering a correct word and going to the next slide on the last word, as well as the timing with which the answer on this vocab gets added. I’ve almost developed a gut feeling for it at this point. In such a state, the console will keep logging comparisons against the current word. Saying the correct word a second time will advance to the next vocab and trigger the shaking animation on it, as if it tried to click next on that one as well. Saying “next” will also work. This happens especially often on the final word of a lesson (i’d almost say it’s 50/50 there). This happens to me multiple times each review, so if you have a hunch what could be going on, i can put a breakpoint on a line or two and see if i can find something out next time.
Even more rarely (maybe once every 100 or 200 vocabs), the auto-next will trigger so fast that the voice won’t even play. In these cases, there will be an error in the console log that reads:

Console error log

quiz_audio_controller-b99bc373bbdd70e5b9b2dd58598ca674060815cf094d5a50d16cac5f0068dc6a.js:1 AbortError The play() request was interrupted by a call to pause(). https://goo.gl/LdLk22
(anonymous) @ quiz_audio_controller-b99bc373bbdd70e5b9b2dd58598ca674060815cf094d5a50d16cac5f0068dc6a.js:1
nrWrapper @ review:7
Promise.then (async)
a.then @ review:7
play @ quiz_audio_controller-b99bc373bbdd70e5b9b2dd58598ca674060815cf094d5a50d16cac5f0068dc6a.js:1
didAnswerQuestion @ quiz_audio_controller-b99bc373bbdd70e5b9b2dd58598ca674060815cf094d5a50d16cac5f0068dc6a.js:1
nrWrapper @ review:7
submitAnswer @ queue-81b34c697fbadacc6471265390e753ac83575cb3c5d31e7689b098d75e52286d.js:1
submitAnswer @ quiz_queue_controller-70f86db7df271466ab70fc10a809a52e1d32bf939a94ea0eef8fbae476e48a0e.js:1
submitAnswer @ quiz_input_controller-358eeded710de05648dd74a58bb71c2fd81625d3ddc3be4de384e74207926c1b.js:1
invokeWithEvent @ stimulus.min-b3cf884595a44edb07aa9d2c524872c34e589a5df767d5f247f8a3746f21a50e.js:1
handleEvent @ stimulus.min-b3cf884595a44edb07aa9d2c524872c34e589a5df767d5f247f8a3746f21a50e.js:1
handleEvent @ stimulus.min-b3cf884595a44edb07aa9d2c524872c34e589a5df767d5f247f8a3746f21a50e.js:1
object @ review:7
nrWrapper @ review:7
handleKeyDown @ quiz_input_controller-358eeded710de05648dd74a58bb71c2fd81625d3ddc3be4de384e74207926c1b.js:1
nrWrapper @ review:7

Every now and then, the Japanese recognition will break down for a while. In this state, it will be absolute dogwater at understanding just about anything, likes to add random にs and をs between words, and frequently produces empty strings even with no further voice input. Since the transcript sentences are already screwed up (and since this pretty much never happens for English), this is most definitely a problem on Google’s part; but it usually catches itself on the next word, so resetting it seems to be doing something. By far the best way i’ve found to work with it in this state is to repeatedly say もしもし until it shows up consistently in the transcript, and then immediately follow up with the word with zero pause in between. For some reason, it will usually detect the word when you do that. I’m guessing whatever Google uses to do context recognition shit the bed in these cases, and the whole もしもし spam puts it into a more neutral position or something.
Words containing both kanji and katakana seem to be impossible to match. For example, the speech-to-text generates 可燃ごみ for the vocab 可燃ゴミ, which is not accepted.
After doing ~100 vocabs in a single review (somewhere between 80 and 120 i’d say, exact number seems to vary), the plugin will start to fail for new words that did not show up in this review yet. In the old version before the microphone emoji, it would say something like “!Not in dictionary: 住人!” (warning: i have poor memory, might not be what it actually said) instead of the transcript whenever you said something. In the new version, it will show the correct transcription, but then not actually input it. At this point, you have to refresh the page for it to work again (which loses progress), or click wrap up and type the rest, so i got into the habit of clicking the wrap up button after about 50 lessons, just to be safe. The console will also not log its usual tries, though since 0.3 it will output an error message:

Error message

Uncaught TypeError: a is not iterable
    at i (userscript.html?name=wanikani-voice-input.user.js&id=e04551aa-5777-40a0-a0b0-7ded107b8688:2:2951)
    at userscript.html?name=wanikani-voice-input.user.js&id=e04551aa-5777-40a0-a0b0-7ded107b8688:2:23155
    at MutationObserver.h (userscript.html?name=wanikani-voice-input.user.js&id=e04551aa-5777-40a0-a0b0-7ded107b8688:2:23235)

Again, thank you so much for this script and the work you put into it. I really can’t overstate how much this is helping me.

gijsc1 · August 30, 2023, 8:23am

These are very recognizable for me as well.

okonomichiyaki · September 1, 2023, 12:52am

Thank you very much for the detailed feedback, sorry for the delay but I only just saw this

I think there is some benefit to speaking practice, but I’ll just caution that this tool is not really intended to be used to test accurate pronunciation. I don’t think the engine is sensitive enough especially for nuances of pitch accent. Just a warning to not over interpret

Have you tried the latest version? If this one is still not working for you, please let me know. I will do another pass to test various vocabulary with numbers, but if there are specific ones that do not work with the new version, knowing them will save me debugging time.

edit: I looked at 二十日 specifically. this is what you should see if it matches:
20日 (はつか)
if you see this alone:
20日
and never see the parenthesized reading, I wonder if you’re on the latest?
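For readers following along, here is a minimal sketch of one way a transcript such as 20日 could be turned back into 二十日 before a dictionary lookup. It is not taken from the userscript; `toKanjiNumber` and `digitsToKanji` are hypothetical names, and only integers from 0 to 9999 are handled.

```typescript
// Hypothetical helpers, not taken from the userscript: rewrite runs of Arabic
// digits as kanji numerals so a transcript like "20日" can be looked up as "二十日".
const KANJI_DIGITS = ["", "一", "二", "三", "四", "五", "六", "七", "八", "九"];
const KANJI_UNITS = ["", "十", "百", "千"]; // ones, tens, hundreds, thousands

function toKanjiNumber(n: number): string {
  if (!Number.isInteger(n) || n < 0 || n > 9999) {
    throw new RangeError("only integers from 0 to 9999 are supported");
  }
  if (n === 0) return "〇";
  const digits = String(n).split("").map(Number);
  let out = "";
  digits.forEach((d, i) => {
    if (d === 0) return; // zero places are skipped entirely
    const unit = KANJI_UNITS[digits.length - 1 - i];
    // 一 is dropped in front of 十, 百 and 千 (10 becomes 十, not 一十).
    const digit = d === 1 && unit !== "" ? "" : KANJI_DIGITS[d];
    out += digit + unit;
  });
  return out;
}

function digitsToKanji(text: string): string {
  // Runs longer than four digits are left untouched in this sketch.
  return text.replace(/\d+/g, (run) =>
    run.length <= 4 ? toKanjiNumber(Number(run)) : run
  );
}
```

With this, `digitsToKanji("20日")` gives 二十日 and `digitsToKanji("30代")` gives 三十代, which could then be matched against the card or looked up for readings such as はつか.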

That’s interesting because I also can’t get the engine to recognize 指示 but in my case it hears “CG”

This is actually because 上官 doesn’t appear in the dictionary bundled with the plugin either. I have plans to build a fallback solution (look up using Jisho.org) eventually, but it may take time. In the meantime I may be able to add (optional) visual feedback if there’s no dictionary entry to lookup readings.

Yes unfortunately this is a fundamental limitation of the Web Speech API, at least when using Google’s engine. My code is not provided with anything more than「明日」literally without any indication of what might have been actually spoken.
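To illustrate the limitation, here is a rough sketch of how a kanji-only transcript might be matched against a card by looking up every known reading and keeping one the card accepts. The dictionary contents, the accepted-readings list, and the names `ReadingDictionary` and `pickReading` are all assumptions made for this example; the format of the script's bundled dictionary is not shown in this thread.

```typescript
// Hypothetical data: the real script bundles its own dictionary, whose format
// is not shown in this thread.
type ReadingDictionary = Map<string, string[]>;

const sampleDictionary: ReadingDictionary = new Map<string, string[]>([
  ["明日", ["あした", "あす", "みょうにち"]],
  ["上巻", ["じょうかん"]],
  ["上官", ["じょうかん"]],
]);

// Returns a reading of `transcript` that the current card accepts, or null
// when the transcript is not in the dictionary or none of its readings fit.
function pickReading(
  transcript: string,
  acceptedReadings: string[],
  dictionary: ReadingDictionary
): string | null {
  const candidates = dictionary.get(transcript) ?? [];
  const match = candidates.find((reading) => acceptedReadings.includes(reading));
  return match ?? null;
}
```

Under these assumptions, a card that accepts あした would receive あした rather than みょうにち when the engine returns 明日, and a homophone such as 上官 could still satisfy a card for 上巻, as long as 上官 has a dictionary entry.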

Yes there are some bugs with the lightning mode. I will add configurations for the timings which may help find a balance between too fast and too slow. In the meantime you can try Double check lightning which I think has a configurable delay.

Yes there is definitely some kind of bug with the last card, I’ve experienced it also but it’s difficult to reproduce.

This may be helped with configurable timing, but I’ll also look to see if my code can hook into waiting for the recording to play

I can add a button to click to force a reset, which might be less annoying than having to use workarounds. The transcript should now be showing you exactly what my code receives, so if it really looks like junk there may not be much I can do but reset

This is both a bug, and another word that does not appear in the bundled dictionary, causing the issue. I can fix it with the next update
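As an illustration of the kanji/katakana case, one possible workaround is to fold katakana to hiragana on both sides before comparing, so that 可燃ごみ and 可燃ゴミ are treated as equal. This is only a sketch with hypothetical function names, not the fix that went into the script.

```typescript
// Hypothetical helper: folds katakana to hiragana so that "可燃ゴミ" and
// "可燃ごみ" compare equal. Kanji and other characters pass through unchanged.
function katakanaToHiragana(text: string): string {
  // Katakana ァ..ヶ (U+30A1..U+30F6) sit exactly 0x60 above hiragana ぁ..ゖ.
  return text.replace(/[\u30A1-\u30F6]/g, (ch) =>
    String.fromCharCode(ch.charCodeAt(0) - 0x60)
  );
}

function sameIgnoringKanaType(a: string, b: string): boolean {
  return katakanaToHiragana(a) === katakanaToHiragana(b);
}
```

The long vowel mark ー lies outside the converted range and is left as-is, which may or may not be what a real matcher wants.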

Another bug, but I think I know what’s going on here now that you mention the old error.

I’m really glad to hear that. I also suffer from pain from typing, which is why I built this. Thank you again for the detailed feedback

okonomichiyaki · September 1, 2023, 12:56am

do you mean to say that the old version was working better for you? the latest is 0.3

Dunst · September 1, 2023, 9:58pm

No sweat. The last thing i want is to make you feel pressured. I’m just happy for any improvements that might materialise.

According to the comment in the Tampermonkey script, it’s // @version 0.3. Though i never manually updated it, it kind of did that by itself, so i’m not sure when 0.2 changed to 0.3.
I’m pretty sure i’ve seen those parentheses before on occasion; however, the last number word i had with a confirmed 0.3 was 三十代 (the brand-spanking-new one), and i couldn’t enter it with voice commands. I’ll pay closer attention next time a number word comes up.

I downloaded Double Check, but couldn’t seem to find the delay option; it always went to the next item instantly. I tried to edit the script to add a fixed delay, and that kinda worked, but now the total counter for the remaining words is screwed up. It didn’t get stuck anymore though, so i guess it kinda works.

okonomichiyaki · September 1, 2023, 10:03pm

thanks, this helps narrow the issue down. my change for numbers in version 0.3 was too simplistic, I can improve it

sorry about that I may have misremembered

okonomichiyaki · September 3, 2023, 1:31pm

Here’s a new version (0.3.1) with a couple of bugfixes: github link

This includes fixes for:

better handling of numerals (eg 三十代)
better handling of kanji / katakana mix (eg 可燃ゴミ)
fix internal dictionary which may improve some matching of readings

I was able to reproduce the bug where longer sessions break, but it’s going to take a little more time to fix.

nmunshi · September 6, 2023, 4:51am

I think 0.3 is working well enough actually. I really love this add-on and use it everyday. Thanks for making it.

nmunshi · September 8, 2023, 3:32am

A common issue is numeric values being inputted when a number card comes up.

Example:

Card: 五百 (asks for vocab meaning - purple card)

I say: 500

inputs: ごひゃく

Stalls

This is with lightning mode on and latest version.

okonomichiyaki · September 9, 2023, 2:47pm

I think this is a bug I have run into with other cards, and I have a fix planned. But just to be clear: WK is asking for the meaning here in English, and you are saying “five hundred” in English, but the script is inputting Japanese, right? If so, it’s the same issue and I will fix it in the next update.

nmunshi · September 12, 2023, 2:10am

Yes, that is correct.

Thank you.

okonomichiyaki · September 24, 2023, 11:22pm

Here’s another new version (0.3.2) with some more minor bugfixes: github link

This includes fixes for:

script tries to input Japanese readings when WK is asking for meanings (for example 500 and inputting ごひゃく when “five hundred” is spoken)
script gets stuck on the very last card of a review session
improve some handling of numbers (for example convert “10 days” to “Ten days”). there are cases where this doesn’t work well (for example “one’s 30s”) so user synonyms may still be needed for some cards
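The numeral conversion mentioned in the last item can be pictured roughly as follows. This is a sketch under stated assumptions, not the code shipped in 0.3.2: `numberToWords` and `spellOutNumbers` are hypothetical names, only standalone numbers from 0 to 999 are handled, and the output is lowercase.

```typescript
// Hypothetical helpers: spell out standalone numerals in an English transcript.
const ONES = [
  "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
  "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
  "seventeen", "eighteen", "nineteen",
];
const TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"];

// Expects an integer from 0 to 999.
function numberToWords(n: number): string {
  if (n < 20) return ONES[n];
  if (n < 100) {
    const tens = TENS[Math.floor(n / 10)];
    const ones = n % 10;
    return ones === 0 ? tens : `${tens}-${ONES[ones]}`;
  }
  const head = `${ONES[Math.floor(n / 100)]} hundred`;
  const rest = n % 100;
  return rest === 0 ? head : `${head} ${numberToWords(rest)}`;
}

function spellOutNumbers(transcript: string): string {
  // \b on both sides means only standalone numbers are replaced, so a token
  // like "30s" is deliberately left alone.
  return transcript.replace(/\b\d{1,3}\b/g, (m) => numberToWords(Number(m)));
}
```

Here `spellOutNumbers("10 days")` gives "ten days", while "one’s 30s" passes through unchanged, which is the kind of case where a user synonym would still be needed.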

I had hoped to work on some of the other issues soon, especially the longer review sessions (>100 items) bug, however I haven’t had time. I still plan to try to work around that problem eventually

nmunshi · September 26, 2023, 1:34am

Thanks for the update! BTW, the link is routing me to v0.3.1, is it just me?

okonomichiyaki · September 26, 2023, 3:07am

sorry my mistake, here is the correct link.

nmunshi · September 29, 2023, 3:48am

Blazing fast now!

cmoncrab · October 4, 2023, 12:52am

@okonomichiyaki Thanks so much for the effort you’re putting into this! I’ve installed everything as listed in the instructions (on Chrome) but it’s not showing the yellow microphone input text. It works for one or two questions and then stops. Any ideas?

okonomichiyaki · October 4, 2023, 1:50pm

do you mean it never shows the live transcript? does it submit answers once or twice? or does it not even submit answers?

do you have any other extensions or WK scripts installed? can you try with only Open Framework and the voice input script, and also try with only the voice input script?

if you can check the console log for errors that might help. instructions here
