There also appears to be YouGlish, but I am not sure whether it has an API that can be used from a UserScript. Also, its content is neither anime nor movies, but real speech (though perhaps mostly monologue).
Also, unlike ImmersionKit:
Sound files cannot be extracted, for example to use in Anki.
Rewinding is easy, and the audio is continuous rather than chopped into segments.