ChristopherFritz's Study Log

The author of the code was able to fix the issue.

My manga reading time is about to reduce while I do minor cleanup of auto-generated subtitles (minor text fixes, timing fixes, etc.):

The portions I’ve checked thus far have been 100% accurate, and I didn’t have to spend $3,000 on a GPU. Took about 45 mins to run one episode through whisper.cpp.

6 Likes

My daily manga reading continues to be minimal as I spend time doing random posts for my site (recent postings being on 水の泡になる, 油を売る, and one I had been wondering about for some time now, イヤリング vs ピアス), as well as the occasional subtitle cleanup.

On the subtitle front, I’ve found Whisper to be about 95 to 99% accurate.

It’s interesting to see the difference between transcriptions using the large and medium models, including variation in:

  • timing
  • word recognition
  • where lines are split
  • use of kanji versus hiragana versus katakana
  • spacing and punctuation (or lack thereof)

Here are a few examples. The yellow subtitle on top is from the medium model, and the white subtitle on the bottom is from the large model when running the video through Whisper:

ノミ? のみ? Now I kind of wonder if there’s a common preference for one or the other. When fixing up the subtitle for a line like this, I’ll probably end up spending time trying to find an answer for that… (I haven’t yet checked the line for accuracy.)

Listening to the audio, I can’t tell for sure which is correct. Sounds like 「力を持ってる」. But either way, the line sounds like nonsense to me.

The English line is “It’s the dungeon for you, boy”, and the official Japanese subtitle file (which is meant to convey the English script, not act as closed captions for the Japanese script) gives me, 「牢(ろう)にぶち込(こ)んでやる」.

Now that I know 牢(ろう) is a word, 「力(ちから)を」 is clearly actually 「地下牢(ちかろう)」. Subtitle file duly updated.

Suddenly, the line actually makes sense: 「さあ、地下牢(ちかろう)()ってるぞ小僧(こぞう)」

It’s quite a bit of time and effort to spend on one line out of over 1,000. It’s not like I’ve ever encountered 地下牢 before [1] and I’m sure it doesn’t come up in anything I plan to read[2], but I don’t mind immersing myself in Japanese a bit extra like this.

Errors-to-fix aside, I hope Whisper perfectly transcribed the cave’s lines, because I can’t make out a word it’s saying otherwise.

(Here’s the audio.)


  1. image ↩︎

  2. image ↩︎

5 Likes

It works with the medium and large models, and decently fast as well. For 24 minutes of audio, the medium model took about 30 minutes and the large model about 50 minutes.

However, timing is messed up. lol, it is fixed very quickly, like within an hour.

It will probably be tomorrow that I see results of both models, though.

I used WSL2, and created a bash script from Windows to run it. It seems like trouble to make it build natively for Windows.

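#!/usr/bin/env bash

workdir="$HOME/Projects/whisper.cpp"
# Transcribe Japanese audio with the medium model on 12 threads, writing both VTT and SRT output.
"$workdir/main" -l ja -m "$workdir/models/ggml-medium.bin" -t 12 -ovtt -osrt "$@"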
2 Likes

With that time, are you manually fixing it using a subtitle editor? That’s what I’m doing with Aegisub.

For the anime Petite Princess Yucie, I now estimate I’m seeing about 95 to 98% accuracy using the large model. (The more subtitles I review, the higher the rate of errors I’m seeing.)

For the Japanese dub of Disney’s Aladdin, the error rate is a bit higher, so maybe 85 to 90% accuracy? I can’t really say if medium or large has done better for me here.

I really hope to be able to compile a Windows build because there are a few Windows users I know of who are having issues with running the Python version of Whisper. I guess if building it for Windows was easy, someone else would already have put Windows builds online!

1 Like

The repo maintainer fixed the repo himself, so git pull and make.

I had been using Aegisub as well, because that’s the program I knew of from long ago.

More recently, I turned to SubtitleEdit, because it can edit VTT directly, which is the direct output of openai/whisper. (Aegisub seems unable to.)

Aegisub has better UI, probably…

However, whisper.cpp can output SRT directly, so that might not matter.

VTT may be easier to edit directly in a text editor, though. (Line numbers are optional in this format.)

I didn’t fix the output or do any manual re-timing this time. Actually, I am at least tempted to, because some of the lines are too long.

2 Likes

One “problem” with whisper.cpp, SRT, and Aegisub for me is that whisper.cpp produces SRT lines like this:

00:00:44.000 --> 00:00:49.000

However, Aegisub expects a comma, not a period.

00:00:44,000 --> 00:00:49,000

It’s an easy fix, though. I just open the SRT in a text editor and use Find/Replace to change all . to , and then Aegisub can load it just fine.
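If the subtitle text itself contains periods, a blanket Find/Replace would change those too. Here is a minimal Python sketch that only touches the timing lines. It assumes the timing lines look exactly like the example above (two-digit hours, minutes, and seconds, then three-digit milliseconds); the function name is made up for illustration.

```python
import re

# Matches a cue timing line such as "00:00:44.000 --> 00:00:49.000".
# Assumption: timestamps always use the HH:MM:SS.mmm layout shown above.
TIMING_LINE = re.compile(
    r"^(\d{2}:\d{2}:\d{2})[.,](\d{3}) --> (\d{2}:\d{2}:\d{2})[.,](\d{3})\s*$"
)


def fix_srt_timestamps(text: str) -> str:
    """Use a comma before the milliseconds on timing lines; leave other lines alone."""
    fixed_lines = []
    for line in text.splitlines():
        match = TIMING_LINE.match(line)
        if match:
            start, start_ms, end, end_ms = match.groups()
            line = f"{start},{start_ms} --> {end},{end_ms}"
        fixed_lines.append(line)
    return "\n".join(fixed_lines) + "\n"


# The dialogue line keeps its periods; only the timing line changes.
sample = "1\n00:00:44.000 --> 00:00:49.000\nVer. 2.0 is out.\n"
fixed = fix_srt_timestamps(sample)
print(fixed)
```

To use it on a real file, read the SRT as UTF-8 text, pass it through fix_srt_timestamps, and write the result back out.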

Then from Aegisub I save it as Advanced SubStation Alpha format.

2 Likes

I originally misunderstood what you meant about fixing in one hour, so my reply was probably confusing. I understand now!

It’s interesting how some things I run through, the timing is very good. And other things I run through, it comes out off by a lot. Well, it’s still better than nothing!

1 Like

whisper.cpp produces the wrong format, so it’s easier to change the code in main.cpp (and more robust than a simple text replace too).

Probably at line 265 of /main.cpp, in the function output_srt, change the line to

fout << to_timestamp(t0).replace(8, 1, ",") << " --> " << to_timestamp(t1).replace(8, 1, ",") << "\n";

Also, Aegisub is good for re-timing, once you get used to the keyboard shortcuts and the audio spectrogram.

3 Likes

It’s so easy, I didn’t even have to do anything. I didn’t even have to post it as an issue!

2 Likes

Saturday morning: “My copy of the new Pokemon games won’t arrive until tonight, so I’ll start coding a new project.”

Saturday evening: “Pokemon games arrived. Time to play them.”

Saturday night: “I really want to get back to that coding… I’ll continue Pokemon Sunday.”

A couple of hours past midnight: “This is a good place to stop with the code. Time for sleep.”

Sunday morning: “Back to where I left off with the code.”

Sunday evening (almost): “I’ve made really good progress with this code.”


As much as I like what I was able to get together with my Google Sheets document that builds my book club threads for me, it has always had major issues that cannot be worked around. And those issues created enough friction that I stopped using it after completing the club for the first volume of Shadows House.

Recently, I put together a web page that lets me populate some data, and then it saves it to a file for me to put on my website.

Saturday I pondered, “Why not use some of the same concepts for my book club builder?”

Things get technical from here. Now’s the time to click away to something else to read. Don’t say I didn’t warn you.

I could:

  • store the data as a JSON file
  • load it into a web page
  • click a button to copy the markdown text for posting a new thread or weekly post
  • if needed, modify values on the web page and save them
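The steps in that list can be sketched roughly as follows. This is written in Python only to illustrate the idea, not the JavaScript web page itself, and every field name, the sample data, and the template text are hypothetical since the real JSON layout isn't reproduced here.

```python
import json
from string import Template

# Hypothetical book club data; the real file's fields are not shown in the post.
club_json = """
{
  "series": "Example Series",
  "volume": 3,
  "weeks": [
    {"number": 1, "chapters": "1-2", "start_date": "2022-12-03"},
    {"number": 2, "chapters": "3-4", "start_date": "2022-12-10"}
  ]
}
"""

# Hypothetical weekly-post template with $placeholders for the values above.
WEEK_TEMPLATE = Template(
    "## $series Volume $volume, Week $number\n"
    "\n"
    "| Start date | Chapters |\n"
    "|---|---|\n"
    "| $start_date | $chapters |\n"
)


def build_week_post(club: dict, week_number: int) -> str:
    """Fill the weekly template with series-level and week-level values."""
    for week in club["weeks"]:
        if week["number"] == week_number:
            return WEEK_TEMPLATE.substitute(
                series=club["series"], volume=club["volume"], **week
            )
    raise ValueError(f"no week {week_number} in the data")


club = json.loads(club_json)           # load the stored data
post = build_week_post(club, 2)        # markdown ready to copy into a post
print(post)

club["weeks"][1]["chapters"] = "3-5"   # modify a value...
saved = json.dumps(club, ensure_ascii=False, indent=2)  # ...and serialize it again
```

Passing ensure_ascii=False keeps any Japanese text readable in the saved file instead of turning it into escape sequences.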

Sure, I could be reading manga or playing Pokemon, but when there’s a coding project that will better streamline things for me…


Thus my coding began.

First, I took one of the book clubs I’m running and put its information into a JSON file:

My long-term plan is to not have to touch JSON directly, and instead, do everything on the web page.

I’ve never worked much with Javascript (preferring to make web pages that work well without), and I was completely unfamiliar with methods to load a file from a local computer onto a web page.

The options look to be:

  1. prompt for a file to open
  2. drag and drop

I prefer the latter, as it’s faster for me.

And I was able to find a helpful post online showing how to put a red square on a web page, and allow dragging a file onto it.

But that red square was a bit painful on the eyes, so I generated an AI artwork of Takagi to drag a file onto:

image

Dragging the JSON file over to Takagi loads data from it.

Series-level data:

Volume-level data:

(I only have volume three in the JSON file, but with more volumes, I can select the volume to show on the left.)

Templates:

From here, I can:

  1. make changes and then save them
  2. copy a volume thread’s markdown
  3. copy a week’s thread/comment’s markdown

image

There’s still a lot of work to be done, but I’m liking my progress so far.

I do want to get back to coding (after coding almost non-stop for the past five and a half hours), but I’ve got some Pokemon waiting for me.

8 Likes

Recent milestones for my latest project:

  1. Code is available on GitHub.

Once it reaches a point where others may benefit from this project, I plan to host a copy online as well. (This will remove the need to download a copy, as that’s a barrier to entry.)

  2. Basic code for generating a vocabulary spreadsheet is in place.

The weekly template will likely save me five minutes per week on book club duties.

The volume template will save a fair deal of time on top of that.

But the real time saved will be in generating a vocabulary spreadsheet within a few clicks. That’s including separate tabs for each chapter.
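The "separate tabs for each chapter" part boils down to grouping vocabulary rows by chapter. Here is a rough Python sketch of that grouping, with each tab rendered as CSV text. The entry fields, column list, and tab names are hypothetical, and this is not the actual spreadsheet macro.

```python
import csv
import io
from collections import defaultdict

# Hypothetical vocabulary entries; the fields are illustrative only.
entries = [
    {"chapter": 1, "page": 3, "word": "地下牢", "reading": "ちかろう", "meaning": "dungeon"},
    {"chapter": 1, "page": 5, "word": "小僧", "reading": "こぞう", "meaning": "boy; youngster"},
    {"chapter": 2, "page": 12, "word": "油を売る", "reading": "あぶらをうる", "meaning": "to loaf around"},
]

COLUMNS = ["page", "word", "reading", "meaning"]


def build_chapter_tabs(entries):
    """Group entries by chapter and render each group as CSV text, one per tab."""
    rows_by_chapter = defaultdict(list)
    for entry in entries:
        rows_by_chapter[entry["chapter"]].append(entry)

    tabs = {}
    for chapter in sorted(rows_by_chapter):
        buffer = io.StringIO()
        writer = csv.DictWriter(
            buffer,
            fieldnames=COLUMNS,
            extrasaction="ignore",  # the "chapter" key becomes the tab name, not a column
            lineterminator="\n",
        )
        writer.writeheader()
        writer.writerows(rows_by_chapter[chapter])
        tabs[f"Chapter {chapter}"] = buffer.getvalue()
    return tabs


tabs = build_chapter_tabs(entries)
for name, content in tabs.items():
    print(name)
    print(content)
```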

For now, I’m looking forward to seeing how tomorrow’s book club posts go.

From there, for the first time ever, I’m almost looking forward to doing new volume threads, with new vocabulary sheets.

It’s only “almost” because I haven’t yet implemented conditional formatting in the spreadsheet-generation macro.

Well, it’s not like I have a new volume thread coming up anytime soon. Just in eight days. I know what I’ll be focusing on next for coding.


Recent Japanese study progress: 0%

But I have been getting a little reading done. (I’m falling behind on Orange again.)

9 Likes

I feel like the more I implement (formatting and conditional formatting when auto-generating vocabulary sheets), the more I need to implement (settings and user interface to select which formattings to apply).

4 Likes

Ah waw, I had seen the sheet before, cool to see how it’s done!

4 Likes

I’ve been fairly lax on my reading lately. Between working on coding and playing Pokémon (in English), reading progress has been fairly bad. I’ll probably consider joining the next season’s daily reading challenge to really get back into it, beyond minimal daily reading.

On the plus side, slacking off a bit at the end of 2022 means it’ll be that much easier to beat my “most volumes read in one year” in 2023.

For 2022, I’m currently sitting at 76 volumes complete. Considering 2021 totaled 64 volumes read, it’s a good increase!

My goal for December is to finish up every volume I’m in the middle of, except for book club volumes (unless they’re due to complete in early January, in which case I may finish them off at the end of December).

This leaves me with the following to focus on right now:


On the coding side of things, I’ve made good progress at getting implemented what I need to get implemented, and I’ve continued to be as bad at user interface as I always am.

Screenshots

Main interface and Series values:

Volume values:

Chapter values:
image

Week values:


(Still need to fix whatever’s causing the widths to be crazy on this one.)

Template values:

One of the most important things for this, if others are to consider giving it a try, will be having good default templates. That’s also last on my to-do list to work out before reaching version 1.0.

Vocabulary values:
image

6 Likes