Deeproot: Sousho AI Reader

Have you ever tried to read super squiggly Japanese calligraphy and failed miserably, only to find out it's some wacky thing called 草書 that even Japanese people struggle to read? Not really? Well, let's just imagine that you have…

In the second iteration of Vanilla's adventures learning how to make websites and things, I present Deeproot.vision. Deeproot is a convolutional neural network that can classify 1826 different kanji written in full cursive (sousho). Just upload a picture and crop the kanji you want it to identify. It can do up to 4 at once in case you have some yojijukugo (four-character idiom) you need identified. Since most people probably wouldn't be able to tell whether the model's predictions are right, I have included 10 of the images the model was trained on for each predicted kanji.

Even if you don't have an image you need deciphered, you can go ahead and give the demo image (令和) a try. Like my last one, this project was developed for educational purposes so I can improve my skills and learn new things. I appreciate all feedback, whether from regular peeps with design and feature suggestions or from coding people with ideas on how to clean up or improve the source code.

Coding stuffs

So again, the website is made with Flask and vanilla JS (it's my name, after all). There's really not much to say from a coding standpoint because the site is very straightforward. I learned React and was initially planning on using it for this site, but decided it wouldn't be a good fit and plan on using it for my next one, which seemed more appropriate. Not sure if that ended up being an accurate assessment or not, but here we are. The source code for my project is up on my GitHub here: GitHub - Jacob-HS/SoushoReader
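For flavor, here's a minimal sketch of what a Flask predict endpoint for a site like this could look like. The route, field name, and model loading below are illustrative placeholders, not the actual Deeproot source (that's what the repo is for):

import io

import torch
from flask import Flask, jsonify, request
from PIL import Image
from torchvision import transforms

app = Flask(__name__)

# Placeholder: load the trained model once at startup.
model = torch.jit.load("model.pt")
model.eval()

# Match the model's expected input: 64x64 grayscale.
preprocess = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
])

@app.route("/predict", methods=["POST"])
def predict():
    # "kanji" is a placeholder field name for one cropped character.
    img = Image.open(io.BytesIO(request.files["kanji"].read()))
    x = preprocess(img).unsqueeze(0)  # shape: [1, 1, 64, 64]
    with torch.no_grad():
        probs = torch.softmax(model(x), dim=1)
    conf, idx = probs.max(dim=1)
    return jsonify({"class_index": idx.item(), "confidence": conf.item()})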

Machine learning stuffs

The model for Deeproot is still undergoing changes, so I just went ahead and made the site in a way that lets me swap the model out for improved versions as they come along. The model is a CNN made with PyTorch. The current architecture is as follows:

Architecture

Input: 64x64 GRAYSCALE

==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
KuzuConvoModelV5_DROPOUT                 [1, 1826]                 --
├─Sequential: 1-1                        [1, 32, 32, 32]           --
│    └─Conv2d: 2-1                       [1, 32, 64, 64]           320
│    └─ReLU: 2-2                         [1, 32, 64, 64]           --
│    └─Conv2d: 2-3                       [1, 32, 64, 64]           9,248
│    └─ReLU: 2-4                         [1, 32, 64, 64]           --
│    └─MaxPool2d: 2-5                    [1, 32, 32, 32]           --
├─Sequential: 1-2                        [1, 128, 8, 8]            --
│    └─Conv2d: 2-6                       [1, 64, 32, 32]           18,496
│    └─ReLU: 2-7                         [1, 64, 32, 32]           --
│    └─Conv2d: 2-8                       [1, 64, 32, 32]           36,928
│    └─ReLU: 2-9                         [1, 64, 32, 32]           --
│    └─MaxPool2d: 2-10                   [1, 64, 16, 16]           --
│    └─Conv2d: 2-11                      [1, 128, 16, 16]          73,856
│    └─ReLU: 2-12                        [1, 128, 16, 16]          --
│    └─Conv2d: 2-13                      [1, 128, 16, 16]          147,584
│    └─ReLU: 2-14                        [1, 128, 16, 16]          --
│    └─MaxPool2d: 2-15                   [1, 128, 8, 8]            --
├─Sequential: 1-3                        [1, 1826]                 --
│    └─Flatten: 2-16                     [1, 8192]                 --
│    └─Dropout: 2-17                     [1, 8192]                 --
│    └─Linear: 2-18                      [1, 1024]                 8,389,632
│    └─Dropout: 2-19                     [1, 1024]                 --
│    └─Linear: 2-20                      [1, 1024]                 1,049,600
│    └─Linear: 2-21                      [1, 1826]                 1,871,650
==========================================================================================
Total params: 11,597,314
Trainable params: 11,597,314
Non-trainable params: 0
Total mult-adds (M): 163.94
==========================================================================================
Input size (MB): 0.02
Forward/backward pass size (MB): 3.70
Params size (MB): 46.39
Estimated Total Size (MB): 50.11
==========================================================================================
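For anyone who would rather read the model as code, it looks roughly like this. The 3x3 kernels with padding 1 are implied by the parameter counts and the unchanged spatial dimensions in the summary; the dropout probabilities aren't shown there, so the value below is a placeholder:

import torch
from torch import nn

class KuzuConvoModelV5_DROPOUT(nn.Module):
    # Dropout probability p is a placeholder; it isn't in the summary above.
    def __init__(self, num_classes: int = 1826, p: float = 0.5):
        super().__init__()
        self.block1 = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 64x64 -> 32x32
        )
        self.block2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 32x32 -> 16x16
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 16x16 -> 8x8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),  # 128 * 8 * 8 = 8192
            nn.Dropout(p),
            nn.Linear(128 * 8 * 8, 1024),
            nn.Dropout(p),
            nn.Linear(1024, 1024),
            nn.Linear(1024, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.block2(self.block1(x)))

# Sanity check against the summary: [1, 1, 64, 64] -> [1, 1826]
print(KuzuConvoModelV5_DROPOUT()(torch.randn(1, 1, 64, 64)).shape)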

The model was very unresponsive to regularization techniques despite horrible overfitting at the start of experimentation, but as I increased the dataset size (to 300,000+ images) and the class count (from 600 → 1800), dropout in the classification layer began to help quite a bit. I still haven't completely given up on augmentation, but I've had zero success with it across many different runs. Dropout in the convolutional layers also didn't seem to work very well for this particular model.
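For anyone wondering what I mean by augmentation, think of a torchvision pipeline along these lines; the transforms and ranges shown are illustrative, not the settings from any particular run:

from torchvision import transforms

# Illustrative augmentation for 64x64 grayscale kanji; the exact transforms
# and ranges here are examples, not the settings from my actual runs.
train_transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.RandomAffine(degrees=10, translate=(0.05, 0.05), scale=(0.9, 1.1)),
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
])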

I used Adam at first, but these days I've been using SGD, typically starting with a learning rate of 0.01 and reducing it to 0.001 for finer tuning once the loss decrease begins to level out (see the sketch below). I haven't done any proper learning rate annealing with the schedulers available in PyTorch because of how often I was changing the architecture.

I also used to preprocess all images to white-on-black using thresholding techniques, but with the wide variety of images out there, some where the kanji is already nearly impossible to read, I felt a lot of the images were being poorly processed. I had mostly given up on thresholding, but then I gave Otsu global thresholding a try and it actually worked really well on the images I inspected (there's a sketch of that below too). It wasn't able to surpass plain unprocessed grayscale, but I still think there's some tinkering I can do with the numbers to maybe make it more viable.

Batch sizes of 8 yielded the best results in the beginning, but as the dataset grew, 32 became my go-to. That's the general outline of the ML side of this project, but there are a lot of finer details I didn't go into. Any curious people can ask questions directly and I'll answer to the best of my ability.
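The manual learning rate schedule mentioned above is just a couple of lines; there's no scheduler involved because the drop point was me eyeballing the loss curve (the stand-in model below is only there to make the snippet runnable):

import torch
from torch import nn

model = nn.Linear(64 * 64, 1826)  # stand-in; see the architecture sketch above

# Start SGD at 0.01, as described above.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# ... train until the loss decrease levels out, then drop to 0.001 by hand.
for group in optimizer.param_groups:
    group["lr"] = 0.001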
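And for the curious, Otsu global thresholding is basically a one-liner in OpenCV. A minimal sketch (the file paths are placeholders, and the inversion produces the white-on-black images mentioned above):

import cv2

# Load as grayscale and let Otsu pick the global threshold automatically.
img = cv2.imread("kanji.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder path

# THRESH_BINARY_INV makes the strokes white on black; with THRESH_OTSU set,
# the 0 passed as the threshold argument is ignored.
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

cv2.imwrite("kanji_otsu.png", binary)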

Does it work with normal kanji we want deciphered? I tried it with a picture I conveniently have, but it didn’t work.

No, the model was only trained on 草書, so anything you show it that isn't 草書 won't be accurately classified. There's enough data available for it to be possible, but I thought existing solutions already handled that well, and I didn't want to decrease 草書 classification accuracy, so I kept 行書, 楷書, and all other handwriting out of its training (to the best of my ability).
