Life is going to be extremely complicated through April, so I wouldn’t expect any more progress for a bit after this, but I have done some work this morning that I wanted to share with y’all.
I combined all the vocab data that I want to use (in what I guess will be my version of this extension) and put it on a permanent public host. In case anyone wants it, it’s here. I’ll keep updating it in that location as I move forward, and the bandwidth costs shouldn’t be a problem since it’s on S3, which is very cheap!
Also, I’ve been playing with the data. I was curious what it would show if I took one of the corpora, built a graph of the distribution of ratings for all of the words in each WK level, and then strung the levels together. So here’s what it looks like:
It’s a bit hard to read, but you can generally see that the higher-rated words become fewer and fewer as you go up in levels. There seems to be a real turning point at around levels 20 to 25, where a level goes from being mostly words rated 3 stars and above to mostly words rated below 3 stars.
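For anyone who wants to try this kind of breakdown themselves, here is a minimal Python sketch of the idea: count how many words have each star rating within a level, then work out what share of each level sits at 3 stars or above. The row layout, the function names `rating_distribution` and `share_at_least`, and the sample rows are all assumptions made up for illustration; the real data file's format isn't described here, so the loading step is left out.

```python
from collections import Counter, defaultdict

# Hypothetical rows: (word, WK level, star rating in one corpus).
# Both the layout and the values are invented for illustration only.
SAMPLE = [
    ("word_a", 1, 5),
    ("word_b", 1, 5),
    ("word_c", 1, 4),
    ("word_d", 20, 3),
    ("word_e", 20, 2),
    ("word_f", 40, 1),
    ("word_g", 40, 1),
]

def rating_distribution(rows):
    """Count how many words carry each star rating, grouped by level."""
    dist = defaultdict(Counter)
    for _word, level, rating in rows:
        dist[level][rating] += 1
    return dist

def share_at_least(dist, threshold=3):
    """Fraction of each level's words rated at or above `threshold` stars."""
    return {
        level: sum(n for rating, n in counts.items() if rating >= threshold)
        / sum(counts.values())
        for level, counts in dist.items()
    }

dist = rating_distribution(SAMPLE)
shares = share_at_least(dist)
for level in sorted(shares):
    print(level, dict(dist[level]), f"{shares[level]:.0%} at 3+ stars")
```

Looking for the level where that share first drops below 50% would be one rough way to pin down the turning point, though the answer will depend on which corpus you feed in.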
Let me know if you find this stuff interesting and I can post other corpora’s data.