There are two separate aspects to my script that are worth considering:
1. Having words auto-populate their reading and English when typing in the Japanese.
2. Having entries pre-populated.
I think item 2 can be useful for ABBC because it ensures that words are available on day one of each week.
One counterargument would be about the value of looking up words, but I think that time is better spent looking up grammar at the ABBC level, and people can still try looking up words on their own if they want to.
An issue that will likely come up is the occasional mis-OCR or mis-parsing that results in an incorrect word on the spreadsheet. This may cause confusion, but that may not be a problem if readers understand going in that some entries might be for words that aren't actually on the page, and that they are free to correct them. In this situation, correcting the Japanese word would auto-update the reading and English.
The spreadsheet wouldn't be 100% complete from the start, as English entries will include multiple meanings, so there will be opportunities for readers to update the entries to list the best meaning in context.
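For anyone curious how the auto-populate step could work, here is a minimal Python sketch. It is not the actual script: it assumes a small hypothetical in-memory dictionary (`DICTIONARY`, `Entry`, and `auto_populate` are all made-up names) standing in for the Wiktionary-derived data. Looking up the Japanese word fills in both the reading and the English, and all meanings are kept so readers can pick the one that fits the context.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    reading: str
    meanings: list[str]


# Hypothetical stand-in for the real dictionary data (illustrative entries only).
DICTIONARY: dict[str, Entry] = {
    "関係": Entry("かんけい", ["relation", "relationship", "connection"]),
    "学校": Entry("がっこう", ["school"]),
}


def auto_populate(word: str) -> Optional[tuple[str, str]]:
    """Return (reading, English) for a Japanese word, or None if it isn't found."""
    entry = DICTIONARY.get(word)
    if entry is None:
        # Unknown word (e.g. a mis-OCR): leave the reading and English blank.
        return None
    # Keep every meaning; readers trim it down to the one that fits the context.
    return entry.reading, "; ".join(entry.meanings)
```

For example, `auto_populate("関係")` returns `("かんけい", "relation; relationship; connection")`, so correcting a mis-parsed word just means re-running the lookup on the corrected Japanese.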
If you’re interested in giving it a try, @TobiasW, and if you have any issues with my scripts, we can maybe coordinate over Discord (if you have an account there or don’t mind signing up).
Otherwise, I can squeeze in time to generate a spreadsheet later today or tomorrow if you'd like.
I'd recommend making it on the night of the 17th. As far as I know, we're in the same timezone, and a lot of people, especially in the beginning, want to start asking questions as early as possible. That's hard when it's already the 18th for them and there's no thread yet.
That would be lovely! To be honest, I'm super busy at the moment preparing for a trip to Copenhagen (friends are getting married) from Thursday to Monday, and I'm having a hard time even fitting in the preparations for the thread I'll have to post while I'm there.
Afterwards I definitely want to give it a try though!
I probably need to rewrite the guidelines a bit, but here we go:
This was put together with a bit of OCR, some automated parsing, some scripting, and a fair bit of manual clean-up, as this is a new process.
All of which is to say, there may be mistakes such as:
This word is a compound word in the manga, but it’s split into two words in the vocabulary sheet. (Feel free to correct this to be one row with the compound word rather than two separate rows!)
This word on the spreadsheet doesn’t show up on the page. (Likely an OCR error. This means pre-learning words based on the spreadsheet might give you a few extra words not in the manga. Hopefully this is rare!)
English translations appearing in gray text are auto-matched from Wiktionary, but these often list multiple meanings. Feel free to type the correct one, which will change it to black text.
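The gray/black distinction comes down to whether a cell still holds the auto-matched text. A minimal sketch, assuming a hypothetical `VocabRow` record (not part of the actual script); the real spreadsheet presumably handles this with conditional formatting rather than Python:

```python
from dataclasses import dataclass


@dataclass
class VocabRow:
    japanese: str
    reading: str
    auto_english: str  # text filled in automatically from the dictionary lookup
    english: str       # what the cell currently holds (a reader may have edited it)

    def english_color(self) -> str:
        # Unchanged auto-matched text is shown in gray; anything a reader typed is black.
        return "gray" if self.english == self.auto_english else "black"
```

A row starts out gray because `english` equals `auto_english`; as soon as a reader overwrites it with the meaning that fits the page, it turns black.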
I have been following and liking your discussions about this auto-generating vocab sheet from the beginning, and just want to say: You are a genius! This is so amazing!
I went over it a bit and I’m honestly amazed by how good it is.
The words appear with all possible meanings, so there is a lot of unnecessary information. But as was said earlier, I guess it will be easier to delete stuff in a pre-populated sheet than to add stuff from scratch.
I've already found one parsing error (何→なんで), but it was easy to edit. The more people use the sheet, the more mistakes like these will be caught, and new beginners will have a much easier time.
Btw, @ChristopherFritz, do you also intend to update/upload the frequency spreadsheet for this manga to your website? I've been checking the ルリドラゴン spreadsheet entry daily, but it still says "pending".
Yeah, this is the downside. Picking the right one requires knowing the context, something code can't do (until AI gets us there?).
The hope is that people fill in the proper word and it’ll be much quicker than typing all the values (kanji, kana, English). Plus, if you don’t know a word you can still see the English possibilities and put the right one in based on context.
I need to update my code for generating the ODS file to switch it from my old vocabulary parsing code to my newer code. It's simple enough to do, but I haven't gotten to it yet, since I now use the web pages rather than spreadsheets to track my known words. I'll put that on my very-near-term to-do list!
I seem to be having a bit of difficulty with the way the pages are counted. It looks to me like the listed vocabulary isn't even on the pages listed.
There is no difference.
Actually, I was being a bit silly: I didn't see the 関係 on page 5, only the one on page 7, so it threw off my whole page count. Never mind lol
When I first saw the sheet I was like “What the heck, is that 関係 a parse error, I don’t remember that at all”. Turns out the OCR is better at this than me.
Actually, that was my first thought. It took a few minutes until I realized it was referring to the chapter name. But I was still wrong about where I found it haha