Get all known vocab

autumnalbees · March 29, 2024, 12:49pm

Hi! I am trying to write a script to get all known vocab (whether that's kana, kanji + kana, or just the kanji itself).

I think the way you’re meant to do this is:

Get all subjects, which gives you subject_id (WaniKani API Reference),
then get assignments with those subject IDs, which returns characters, which is what I want (WaniKani API Reference).

But my queries are not returning vocab / words, just the kanji:

    url = "https://api.wanikani.com/v2/assignments"

    querystring = {
        "srs_stages": "[1, 2, 3,4,5,6,7,8,9]",
        "subject_types": ["vocabulary", "kana_vocabulary", "kanji"]
    }

    url = "https://api.wanikani.com/v2/subjects"
...
    characters_list = [item['data']['characters'] for item in data['data']]

Does anyone know what's wrong, or whether a script for this already exists?

northpilot · March 29, 2024, 2:06pm

# Written by northpilot, 2024-03-29
import requests


def fetch(request_url, headers, subject_id_pool=None):
    """Collect every item from a paginated WaniKani collection endpoint."""
    # A set makes the membership test below fast for large ID pools
    pool = set(subject_id_pool) if subject_id_pool else None
    results = []
    while request_url:  # Handle pagination of results
        response = requests.get(url=request_url, headers=headers)
        response.raise_for_status()  # Stop on HTTP errors such as a bad token
        payload = response.json()
        for item in payload["data"]:
            # Filter applied to subjects, not assignments:
            # skip subjects not present in assignments
            if pool is not None and item["id"] not in pool:
                continue
            results.append(item)
        request_url = payload["pages"]["next_url"]  # None on the last page
    return results


if __name__ == "__main__":
    headers = {
        "Wanikani-Revision": "20170710",
        "Authorization": "Bearer <your read-only API token>"
    }

    # Fetch all assignments based on user-specified types and SRS stages
    narrower = ["subject_types=kana_vocabulary,vocabulary,kanji",
                f"srs_stages={','.join(str(n) for n in range(1, 10))}"]
    narrower_string = f"?{'&'.join(narrower)}"
    assignment_url = f"https://api.wanikani.com/v2/assignments{narrower_string}"
    assignments = fetch(assignment_url, headers)

    # Fetch all subjects matching the above assignments
    subject_ids = [assignment["data"]["subject_id"]
                   for assignment in assignments]
    subject_url = f"https://api.wanikani.com/v2/subjects?{narrower[0]}"
    subjects = fetch(subject_url, headers, subject_id_pool=subject_ids)

    characters = {subject["data"]["characters"] for subject in subjects}
    for character in characters:
        print(character)

一
二
九
七
人
入
八
力
⋮
大人しい
出かける
⋮
さようなら
こんばんは
それ
コンビニ
デパート
⋮

This includes

Edit: I replaced range(10) with range(1, 10) to prevent items at SRS stage 0 from showing up; previously the script also listed items that were unlocked but never studied. I also made the structure a bit simpler. Finally, I now convert the final group of characters to a set before printing to prevent duplicates: previously it printed, for example, both 一 (kanji) and 一 (vocabulary), but now it prints 一 just once.
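
In case it helps, here is a small standard-library sketch of the two details from the edit note: building the comma-separated filter string, and removing duplicates. The helper names build_filter_query and unique_in_order are made up for illustration and are not part of the script or of the WaniKani API. It assumes, as the script does, that list filters are sent as comma-separated values.

```python
from urllib.parse import urlencode


def build_filter_query(filters):
    """Hypothetical helper: join list-like filter values with commas."""
    flat = {}
    for key, value in filters.items():
        if isinstance(value, (list, tuple, range)):
            flat[key] = ",".join(str(v) for v in value)
        else:
            flat[key] = str(value)
    # safe="," keeps commas literal instead of percent-encoding them as %2C
    return urlencode(flat, safe=",")


def unique_in_order(items):
    """Hypothetical helper: drop duplicates but keep first-seen order."""
    return list(dict.fromkeys(items))


query = build_filter_query({
    "subject_types": ["kana_vocabulary", "vocabulary", "kanji"],
    "srs_stages": range(1, 10),  # start at 1 to leave out SRS stage 0
})
print(f"https://api.wanikani.com/v2/assignments?{query}")
print(unique_in_order(["一", "二", "一", "大人しい"]))
```

Unlike a set, unique_in_order keeps the items in the order the API returned them, which may be handy if you want kanji and vocab to stay grouped in the printout.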

autumnalbees · March 29, 2024, 4:08pm

Thank you <3
