Common Word Combinations data

So I’m making a list of vocab that has common word combinations listed (other than levels 1-16, which all have them). I’d like to make a script to pull vocab with this data so that I don’t have to manually check each content update post… but I can’t find the property for word combinations in WaniKani’s documentation. Does anyone know what this property is called?

Also, to clarify: I’m looking to see whether a vocab item has word combinations yet, not the individual word combinations themselves. idk if the word combos are separate data points or just a block of HTML/CSS, so I’m not worrying about that.

So vocab from levels 1-16* would all be listed, as well as others such as 大幅, 残品, 係わる, etc.

*might make this a toggle since it’s given that those vocab would have collocations
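The selection logic with that toggle is simple enough to sketch. This is a minimal sketch, assuming each vocab item is available as a (characters, level) pair and that the later-level items are kept in a hand-maintained set, since the collocation data itself isn’t known to be exposed anywhere. `VocabItem`, `KNOWN_LATER_LEVEL`, and `with_collocations` are hypothetical names, and the levels in the sample are placeholders, not real WaniKani levels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VocabItem:
    characters: str
    level: int


# Hypothetical hand-maintained set of level 17-60 vocab found to have collocations.
KNOWN_LATER_LEVEL = {"大幅", "残品", "係わる"}


def with_collocations(items, include_levels_1_16=True):
    """Return the items that have common word combinations listed.

    Every level 1-16 vocab item is assumed to have them, so those are kept
    or dropped wholesale depending on the include_levels_1_16 toggle.
    """
    selected = []
    for item in items:
        if item.level <= 16:
            if include_levels_1_16:
                selected.append(item)
        elif item.characters in KNOWN_LATER_LEVEL:
            selected.append(item)
    return selected


# Placeholder levels for illustration only, not real WaniKani levels.
sample = [VocabItem("大人", 3), VocabItem("大幅", 30), VocabItem("説明", 25)]
print([i.characters for i in with_collocations(sample)])  # ['大人', '大幅']
print([i.characters for i in with_collocations(sample, include_levels_1_16=False)])  # ['大幅']
```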

I don’t believe that’s part of the API (the WK devs seem to have put the API on the back burner unfortunately…)

I think your only option will be to scrape the web interface.
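If someone does go the scraping route, the per-item check could be as small as looking for the section heading in the page HTML. This is a minimal sketch, assuming the fetched HTML of a vocab page contains the heading text “Common Word Combinations” as plain text. That heading text and the names `SECTION_HEADING` and `has_collocations` are assumptions, not documented WaniKani markup. Fetching the page (including any login) is left out. If the section is rendered client-side by JavaScript, plain HTML parsing won’t see it.

```python
from html.parser import HTMLParser

# Assumption: the item page labels the section with exactly this heading text.
# WaniKani does not document its page markup, so this may need adjusting.
SECTION_HEADING = "Common Word Combinations"


class _HeadingFinder(HTMLParser):
    """Scans HTML text nodes for the collocation section heading."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_data(self, data):
        if data.strip() == SECTION_HEADING:
            self.found = True


def has_collocations(page_html: str) -> bool:
    """Return True if the page HTML contains the collocation section heading."""
    finder = _HeadingFinder()
    finder.feed(page_html)
    finder.close()
    return finder.found


print(has_collocations("<h3>Common Word Combinations</h3><ul><li>...</li></ul>"))  # True
print(has_collocations("<h3>Context Sentences</h3>"))  # False
```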

^^ me right now :slightly_frowning_face:

I hope they get to adding it soon (though collocations have been around for nearly 3 years now…)

Decided to just search the forums manually (both the content addition and content update threads), and it didn’t take as long as I expected! Not as efficient as scraping or using the API (and likely not as accurate), but it’ll do.

For others that find collocations helpful: here’s what I found!! :slight_smile: Many of these are either new additions or items that were previously in lower levels (and thus were included in the lower level collocation updates).

Updated 2024.06.07

Levels 17-20
Levels 21-30
Levels 31-40
Levels 41-50
Levels 51-60

Item Inspector Search Terms:

仮説,外来語,非常,繋がる,一昨年,世,埋める,得,求人,流石,通じる,重なる,繋ぐ,世の中,祈願,立ち去る,変更,対応,更新,脱衣,衣服,戸棚,更衣室,本棚,棚,脱税,脱線,脱走する,雨戸,更に,脱ぐ,今更,係わる,優れる,光景,問う,地,涼しい,現役,素直,絶対,絶望,肩こり,限る,刺す,担う,涼む,突然,絶景,過ごす,障る,客観的,憧れる,景観,残品,比較する,濃い,統一,費やす,全景,前景,憧れ,揃う,極める,沼,沼地,研修,勧める,勧誘,構える,申請,相次ぐ,主,揃える,更生,構え,鉄板,富む,人脈,汚れる,不振,園,天然,年次,沢,泥沼,真似,推定,汚す,物真似,紅葉,親父,予測,勇む,払う,推測,算定する,素人,雨天,大幅,整然,歯科,炒める,募る,払い,梅雨,炒る,還元,食う,惑う,活躍,祈念,万人,炒飯,衣装,珍〜,助力,為替,独創,選択肢,光年,嫁ぐ,猿真似,伊達,心願,踊り場,准教授,桑原,軌跡,蛇口,血脈,紫蘇,蘇生,凝る,唐揚げ,遍歴,鼓,湧く,湧水,申し申し

Spreadsheet
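Since the search-term list is one long comma-separated string, a small helper can keep it tidy when a content update adds items. This is a minimal sketch with hypothetical function names. It splits the string, appends new vocab (e.g. 惑う and 踊り場 from a later update), and drops duplicates while keeping the original order.

```python
def parse_terms(raw: str) -> list[str]:
    """Split a comma-separated search-term string, dropping blank entries."""
    return [t.strip() for t in raw.split(",") if t.strip()]


def add_terms(raw: str, new_items: list[str]) -> str:
    """Append new vocab to the list and rejoin it for Item Inspector.

    dict.fromkeys keeps the first occurrence of each term in order,
    so existing entries stay put and duplicates are dropped.
    """
    terms = parse_terms(raw) + list(new_items)
    return ",".join(dict.fromkeys(terms))


current = "仮説,外来語,非常"
print(add_terms(current, ["惑う", "踊り場", "非常"]))  # 仮説,外来語,非常,惑う,踊り場
```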


Some calculations (from the original count on 2024.01.31)
  • Total amount of vocab w/ collocations listed: 1872 items
    • Level 17-60 vocab: 51 items
    • Level 1-16 vocab: 1821 items
  • % of vocab with collocations: 1872/6602 = 28.36%
    • levels 17-60 only: 51/4781 = 1.07% :sweat:
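The arithmetic above can be re-checked in a few lines using only the counts quoted in this post. It also shows that 6602 - 4781 = 1821. In other words, the level 1-16 count equals all of the level 1-16 vocab, which is consistent with every item in those levels having collocations.

```python
def pct(part: int, whole: int) -> float:
    """Percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


later_with = 51     # level 17-60 vocab with collocations
early_with = 1821   # level 1-16 vocab with collocations
total_vocab = 6602  # all vocab
later_vocab = 4781  # level 17-60 vocab

total_with = later_with + early_with
print(total_with)                    # 1872
print(pct(total_with, total_vocab))  # 28.36
print(pct(later_with, later_vocab))  # 1.07
print(total_vocab - later_vocab)     # 1821, i.e. all level 1-16 vocab
```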

What I’d like to do is make a script to mark anything in this list when scrolling through the /vocabulary pages and possibly during reviews (maybe w/ a star or something). Kinda annoying to have a static list, but I’ll be on the lookout for any new additions.
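The core of such a marking script is just a set-membership check. This is a minimal sketch, assuming the vocab labels on a page have already been collected as plain strings. Gathering them from the /vocabulary pages or the review screen isn’t shown, and `mark_items` is a hypothetical name. An actual userscript would run the same check in JavaScript against the page’s elements.

```python
STAR = "★"


def mark_items(labels, collocation_terms):
    """Append a star to each label whose vocab is in the collocation list."""
    return [f"{label} {STAR}" if label in collocation_terms else label for label in labels]


terms = {"大幅", "残品", "係わる"}
print(mark_items(["大幅", "説明", "係わる"], terms))  # ['大幅 ★', '説明', '係わる ★']
```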

Looks like this thread has become me keeping track of WK updates lol. I have no idea if this kind of thing is helpful to anyone but me, but I’ll keep updating it anyway.

Anyway, I’ll update the list with today’s content update (惑う and 踊り場).

As for the script… I got sidetracked, so I haven’t actually worked on it since my last post >_<

Ok so instead of making a script, I found a slightly different use for this data using Item Inspector. Using my current leech table (WK Apprentice/Guru items, Leech Training: 2), I made a new temporary filter called Collocations 17-60. In Advanced Search, I changed these settings:

[Screenshot: Item Inspector Advanced Search settings]

And put all the vocab I’d found earlier in the Search Terms:
(edit: moved that list to my earlier post with the level breakdown, where it is continuously updated)

So this now allows me to see which of my leeches have common word combinations, which is basically what I wanted to use this for anyway! I use the common word combos for studying leeches in levels 1-16, so this will be super helpful for any later-level leeches :grin:

So… it turns out my initial forum search wasn’t complete. I’d only searched for “common word combinations” because that seemed to be the term used in the update threads, but it turns out some of the threads use the word “collocations” instead. On the one hand, this is great because it means more words have been updated! On the other hand, it means more work for me :pensive: I’ve just been updating the list as I find them, and I think I have the vast majority of the words now.
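The lesson here is to search for both wordings at once. This is a minimal sketch of that check with a case-insensitive regex, assuming each post’s text is available as a string. `mentions_collocations` is a hypothetical name and the sample posts are made-up text, not real update threads.

```python
import re

# Match either wording used in the update threads, case-insensitively.
# "collocations?" also catches the singular "collocation".
PATTERN = re.compile(r"common word combinations|collocations?", re.IGNORECASE)


def mentions_collocations(post_text: str) -> bool:
    """Return True if the post uses either term for word combinations."""
    return PATTERN.search(post_text) is not None


posts = [
    "Added Common Word Combinations to 大幅",
    "New collocations for 惑う and 踊り場",
    "Updated context sentences",
]
print([mentions_collocations(p) for p in posts])  # [True, True, False]
```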

2 things:

  1. Okay, now I’m sure that I’ve listed all the vocab with common word combinations (as of today’s content update). I went through the forums for one last sweep, and there are now twice as many as in the initial count last month. If any are still missing, though, lmk!
  2. I made a spreadsheet so it’s way easier to update the list above. I linked it below the Item Inspector Search Terms in case anyone wants to look at some more data!