ルリドラゴン ・ Ruri Dragon 🐲 (Absolute Beginner Book Club)

There are two separate aspects to my script that are worth considering:

  1. Having words auto-populate reading and English when typing in the Japanese.

  2. Having entries pre-populated.

Item 2 I think can be useful for ABBC because it ensures that words are available on day one each week.

One counterargument would be about the value of looking up words, but I think that time is better spent looking up grammar at ABBC level, and one can still try looking up words on their own if they want to.

An issue that will likely come up is the occasional mis-OCR or mis-parsing that results in an incorrect word on the spreadsheet. This may cause confusion, but maybe that wouldn’t be an issue if readers understand going in that there may be some entries for words that aren’t actually on the page, and they are free to correct them. In this situation, correcting the Japanese word would auto-update the reading and English.
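
As a rough illustration of the auto-update behaviour described above, here is a minimal sketch. It is not the actual script: the `DICTIONARY` contents, the row layout, and the `autofill` name are all assumptions made up for this example.

```python
# Hypothetical mini-dictionary: word -> (reading, English).
# The real script's data source is not shown in this thread.
DICTIONARY = {
    "関係": ("かんけい", "relation; relationship; connection"),
    "行く": ("いく", "to go"),
}


def autofill(japanese):
    """Build a sheet row, filling reading and English from the dictionary.

    Unknown words (for example a mis-OCR'd entry) get blank cells,
    which a reader can fix by correcting the Japanese column.
    """
    reading, english = DICTIONARY.get(japanese, ("", ""))
    return {"japanese": japanese, "reading": reading, "english": english}


# A mis-OCR'd word finds no match, so its other cells stay empty...
bad_row = autofill("閑係")
# ...and correcting the Japanese cell re-runs the lookup.
fixed_row = autofill("関係")
print(bad_row)
print(fixed_row)
```

The point of the sketch is only that the reading and English cells are derived from the Japanese cell, so a single correction there refreshes the whole row.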

The spreadsheet wouldn’t be 100% complete from the start as English entries will include multiple meanings, so there will be opportunities for readers to update the entries to list the best meaning in context.

If you’re interested in giving it a try, @TobiasW, and if you have any issues with my scripts, we can maybe coordinate over Discord (if you have an account there or don’t mind signing up).

Otherwise, I can squeeze in time to generate a spreadsheet later today or tomorrow if you’d like.

5 Likes

I’d recommend making it on the night of the 17th. As far as I know, we’re in the same timezone, and a lot of people, especially in the beginning, want to start asking questions as early as possible. That is of course hard when it’s already the 18th for them and there’s no thread yet.

6 Likes

That would be lovely! To be honest, I’m super busy at the moment preparing for a trip to Copenhagen (friends are getting married) from Thursday to Monday, and I have a hard time even fitting in the preparations for the thread I’ll have to post while being there.

Afterwards I definitely want to give it a try though!

Fair point!

3 Likes

Have you at least tried asking them to delay it a few months? :wink:

I’ll be sure to get the vocabulary sheet together today and I’ll post a link to it here so you have it to include in the weekly threads.

8 Likes

I vote for the automatic sheet

2 Likes

I probably need to rewrite the guidelines a bit, but here we go:

This was put together with a bit of OCR, some automated parsing, some scripting, and a fair bit of manual clean-up as it’s a new process to build this.

That is all to say, there can be mistakes in the form of:

  1. This word is a compound word in the manga, but it’s split into two words in the vocabulary sheet. (Feel free to correct this to be one row with the compound word rather than two separate rows!)
  2. This word on the spreadsheet doesn’t show up on the page. (Likely an OCR error. This means pre-learning words based on the spreadsheet might give you a few extra words not in the manga. Hopefully this is rare!)

English translations appearing in gray text are auto-matched from Wiktionary, and these entries often list multiple meanings. Feel free to type the correct one, which will cause it to change to black text.
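
One way to picture the gray/black distinction is as a flag on each entry that is cleared once someone types a translation by hand. This is only a sketch of the idea, not how the spreadsheet is actually implemented; the `Entry` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One vocabulary row (hypothetical model of the sheet's behaviour)."""

    japanese: str
    english: str
    auto_matched: bool = True  # True while the English is an untouched auto-match

    def set_english(self, text: str) -> None:
        """Record a hand-typed translation and mark the entry as edited."""
        self.english = text
        self.auto_matched = False

    @property
    def color(self) -> str:
        """Gray for auto-matched text, black once a reader has edited it."""
        return "gray" if self.auto_matched else "black"


entry = Entry("関係", "relation; relationship; connection")
color_before = entry.color
entry.set_english("relationship")
color_after = entry.color
print(color_before, "->", color_after)
```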

13 Likes

I have been following and liking your discussions about this auto-generating vocab sheet from the beginning, and just want to say: You are a genius! This is so amazing! :sob:

4 Likes

I went over it a bit and I’m honestly amazed by how good it is.

The words appear with all possible meanings, so there is a lot of unnecessary information. But as was said earlier, I guess it will be easier to delete stuff in a pre-populated sheet than to add stuff from scratch.
I already found one parsing error (何→なんで), but it was easy to edit. The more people use the sheet, the more reliably mistakes like these will be caught, and new beginners will have a much easier time.

Btw, @ChristopherFritz do you also intend to update/upload the frequency spreadsheet of this manga to your website? I have been checking the entry for ルリドラゴン Spreadsheet daily, but it still says “pending” :slight_smile:

3 Likes

Yeah, this is the downside. It requires knowing the context to know which word, something code can’t do (until AI gets us there?)

The hope is that people fill in the proper word and it’ll be much quicker than typing all the values (kanji, kana, English). Plus, if you don’t know a word you can still see the English possibilities and put the right one in based on context.

I need to update my code for generating the ODS file to switch it from my old vocabulary parsing code to my newer one. It’s simple enough to do, but I haven’t gotten to it yet since I’ve been using the web pages for tracking my known words rather than spreadsheets now. I’ll put that on my very-near-term to-do list!

4 Likes

Thank you so much! :smiley:

I’ve added it to the first post, with notes on how to use/improve on it. Let me know if there’s more I should add, or if you want any of it changed!

3 Likes

I seem to be having a bit of difficulty navigating the way the pages are counted. It looks to me like the vocabulary listed isn’t even on the pages that are listed.

3 Likes

Are you reading the digital or physical version?

3 Likes

There is no difference.
But actually I was a bit stupid: it was the 関係 being on page 5 that I didn’t see. I only saw it on page 7, so it messed up my whole counting of the pages. Never mind lol

4 Likes

I confused myself in exactly the same way when I initially confirmed the page numbers were correct after the main content was auto-generated.

3 Likes

When I first saw the sheet I was like “What the heck, is that 関係 a parse error, I don’t remember that at all”. Turns out the OCR is better at this than me.

5 Likes

Actually that was my first thought. It took a few minutes till I realized that it was referring to the chapter name. But I was still wrong on where I found it haha

2 Likes

To be fair, even the spreadsheet expressed confusion following 関係:

image

:wink:

8 Likes

I just started filling some stuff in, just wanted to ask if I’m doing it right, as I’ve never done this before.

BTW, ない isn’t on page 9, is it? Looks like it’s on page 10 at the earliest.

2 Likes

The earliest ない is actually on page 5 :stuck_out_tongue:

(And I can’t find it on page 9 either.)

3 Likes

For verbs, I personally like to include “to” such as “to walk”, the same as WaniKani does. But I don’t know how often others do that.

This was an error of the text parser. Hopefully one of few!

I went ahead and removed the line as ない isn’t used here.

Edit: The text parser has to be able to recognize the base form of words (such as ()こう => ()く). It probably saw a な and mistook it for ない.
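
To show roughly how that kind of mix-up can happen, here is a toy sketch that assumes a simple suffix-rule approach. The actual parser isn't shown in this thread, so the rules, the word list, and the `base_forms` name are all made up for illustration, and the second rule is deliberately over-eager.

```python
# Words the toy parser knows in dictionary (base) form. Illustrative only.
KNOWN_WORDS = {"いく", "ない"}

# (inflected ending, base-form ending) rewrite rules. Illustrative only.
RULES = [
    ("こう", "く"),  # volitional -> dictionary form, e.g. いこう -> いく
    ("な", "ない"),  # deliberately over-eager: treats a bare な as a clipped ない
]


def base_forms(token):
    """Return every known base form that the token could be reduced to."""
    found = []
    if token in KNOWN_WORDS:
        found.append(token)
    for ending, replacement in RULES:
        if token.endswith(ending):
            candidate = token[: len(token) - len(ending)] + replacement
            if candidate in KNOWN_WORDS:
                found.append(candidate)
    return found


print(base_forms("いこう"))  # ['いく']  (correct base form)
print(base_forms("な"))      # ['ない']  (false positive from the over-eager rule)
```

A stray な (a sentence-ending particle, say) matches the second rule and comes out as ない, which is the sort of row that then has to be removed from the sheet by hand.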

4 Likes