Background of this post, from ABBC:
So, once upon a time, I wrote code to download images from BookWalker.
Consider this item from BookWalker:
Why did I write this?
I wanted to take part in all those fancy BookWalker sales, but I also didn’t want to leave Kobo, since I can remove the DRM from my Kobo purchases.
Removing DRM is essential for me as:
- It lets me view my purchase on any device of my choosing (Kobo doesn’t have a Linux application).
- It lets me use Mokuro for OCR.
- I can use Mokuro’s output for all kinds of things that have helped with my learning.
Why did I never use this?
Manga images are stored in JPG format, which is lossy: the file size is reduced by discarding some of the image data, and the decoder reconstructs an approximation of the original when the image is viewed.
However, BookWalker doesn’t show the JPG image to the user.
Instead, they load it into an HTML canvas, where the original JPG data is inaccessible.
This means that when you save the image (something BookWalker blocks), you have a choice:
- Save the image as JPG for a smaller file size, but lose even more of the original information.
- Save the image as PNG to retain the full image information, but at a larger file size.
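To make that trade-off concrete, here is a toy model rather than real JPG encoding. It assumes lossy compression can be approximated by snapping pixel values to a grid, and that re-saving uses a different grid than the original encoder did. The step sizes (10 and 16) and the function names are made up for illustration.

```python
def quantize(values, step):
    """Snap each value to the nearest multiple of `step` (the lossy part)."""
    return [(v + step // 2) // step * step for v in values]


def mean_abs_error(a, b):
    """Average absolute difference between two equally long sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


original = list(range(256))            # stand-in for the publisher's source pixels
served = quantize(original, 10)        # stand-in for the JPG the store holds
resaved_lossy = quantize(served, 16)   # stand-in for saving the canvas as JPG
resaved_lossless = list(served)        # stand-in for saving the canvas as PNG

err_served = mean_abs_error(original, served)
err_lossy = mean_abs_error(original, resaved_lossy)
err_lossless = mean_abs_error(original, resaved_lossless)

print(f"served JPG:     {err_served:.2f}")
print(f"re-saved as JPG: {err_lossy:.2f}")
print(f"re-saved as PNG: {err_lossless:.2f}")
```

In this model the lossless copy is exactly as far from the source as the served image, while the second lossy pass drifts further away. The size cost of the lossless option is not modelled here.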
At that point, I’d rather continue buying from Kobo and receive the “original” JPG images for my purchases.
There’s also the issue that images are saved at the canvas size, which means there is whitespace to be cropped (although this can be automated post-download):
(My download method requires viewing pages with vertical scrolling rather than horizontal, but the canvas is still oriented for a wider dimension.)
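The automated cropping mentioned above could look something like the following. This is a minimal sketch, not the download code from this post: it assumes a grayscale page held as a list of rows of 0-255 values with a near-white background, and the helper names `content_bbox` and `crop` and the `tolerance` default are hypothetical. With real files, the pixels would first need to be loaded through an imaging library.

```python
def content_bbox(pixels, background=255, tolerance=8):
    """Return (left, top, right, bottom) around non-background pixels.

    Returns None if the whole page is background. `right` and `bottom`
    are exclusive, so they can be used directly as slice ends.
    """
    def is_content(p):
        return abs(p - background) > tolerance

    rows = [y for y, row in enumerate(pixels) if any(is_content(p) for p in row)]
    if not rows:
        return None
    width = len(pixels[0])
    cols = [x for x in range(width) if any(is_content(row[x]) for row in pixels)]
    return cols[0], rows[0], cols[-1] + 1, rows[-1] + 1


def crop(pixels, bbox):
    """Cut the page down to the given bounding box."""
    left, top, right, bottom = bbox
    return [row[left:right] for row in pixels[top:bottom]]


# Toy 8x4 "canvas": white everywhere except a 4x2 block of artwork.
page = [[255] * 8 for _ in range(4)]
for y in range(1, 3):
    for x in range(2, 6):
        page[y][x] = 0

bbox = content_bbox(page)
cropped = crop(page, bbox)
print(bbox, len(cropped[0]), len(cropped))
```

The tolerance is there because whitespace that has been through lossy compression is rarely pure white. A page whose artwork is itself mostly white at the edges would be cropped too tightly, so a fixed margin might be safer in practice.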
Anonymous poll time!
Do I release my code for others to use for study purposes, such as if they want to create a PDF they can mark on?
- Do not release it. It’s too dangerous.
- Release it. Make it available for those who will use it for good. There are other, easier ways to remove DRM if someone wanted to do something bad with it.
- Doesn’t matter. The Crabigator probably won’t like it here.