Hi,
When studying (doing lessons), I try to read all the example patterns of use and example sentences aloud (turns out it GREATLY improves retention :D).
However, while it may sound OK-ish to me, I’m looking for something that would either record what I say and compare it with the example, or transcribe back what I said so I could check whether at least “the computer” understood.
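For the "transcribe back and compare" part, here is a minimal sketch of what the comparison step could look like once some speech-to-text tool has produced a transcript. It only uses the Python standard library; the function names (`normalize`, `similarity`) are made up for illustration and are not part of any WK script or of Handy. It also assumes the transcript uses the same kanji/kana spelling as the example sentence, which a real STT tool will not always do.

```python
import difflib
import unicodedata


def normalize(text: str) -> str:
    """Strip whitespace and punctuation so only the spoken content is compared."""
    # NFKC folds full-width/half-width variants into one form.
    text = unicodedata.normalize("NFKC", text)
    return "".join(
        ch
        for ch in text
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


def similarity(expected: str, transcript: str) -> float:
    """Return a 0.0-1.0 score for how closely the transcript matches the example."""
    matcher = difflib.SequenceMatcher(None, normalize(expected), normalize(transcript))
    return matcher.ratio()


if __name__ == "__main__":
    example = "猫が好きです。"   # sentence shown in the lesson
    heard = "猫が好きです"       # hypothetical output from an STT tool
    print(f"match score: {similarity(example, heard):.2f}")
```

A score close to 1.0 would mean "the computer" heard roughly what the example says; where to put the pass/fail threshold is something you would have to tune by ear.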
I don’t know exactly how it is implemented, but there is already a voice input script for doing regular reviews. I haven’t used it in a while, but it used to work great for anything longer than one or two syllables. You could look at the implementation or ask the creator for advice.
For a more low-tech option, if you just want to check whether your output is parsable without doing anything with the result, you could try the voice input mode of Google Translate.
Windows, and I assume most other operating systems as well, also comes with a built-in speech-to-text application which you could use to transcribe what you are saying into any text box.
Yes, this is the script I mentioned, but it looks like it only works with the main item.
Yes, this is basically what I’m looking for, and it’s what the aforementioned Handy does.
But I would love to have something built-in (a script in WK) or readily available, like Handy, which runs constantly in the background: I press a quick shortcut and it outputs what I said. (Yes, I dictated most of this reply with Handy.)
Yeah, I was doing that with Microsoft Translate, but it’s a bit cumbersome and disrupts my study flow a bit (also, I try to avoid having a phone near me to avoid distractions).
I’m on macOS and its speech-to-text is absurdly terrible (hence the external tool).
Despite being on MDN, it’s not supported by Firefox.
So yeah, to narrow it down: something multiplatform (with macOS support), working on a computer or in a browser, but not relying on the SpeechRecognition API…
Maybe an LLM that I could ask to be added to Handy?
I had made an STT deck for sentence shadowing, figured my iPhone was a decent threshold for clean pronunciation, and used it for text input (though they were common phrases). The only issue I’ve heard about from native speakers is that it tends to break when they have thick dialects. My understanding is that STT has some context dependency and many WK sentences are intentionally absurd, though one would think common collocations should be OK (never tried it myself). It might be a good experiment, if you know a native speaker, to have them run STT on the WK sentences and see what they get.