[Userscript] Forum: IME2Furigana

Nonetheless, I updated to a version with 4 modes and code-block detection, in case someone needs it. (Press the “Raw” button to add it to Tampermonkey.)

auto mode

blur mode

manual mode

Change Logs

  • 6 May 2022
    • Add a label for every option
    • Add Option CONVERT_ONLY_SPECIAL_MARKUP - only exactly <ruby lang = 'ja-JP'> will be converted back to markup. Everything else including <ruby> will be ignored.
  • 1 May 2022
    • Add Furigana detection for small Kana (ヵヶ).
  • 29 Apr 2022
  • 25 Apr 2022
    • Change @name, so that it can co-exist with the original script. Also, update @description.
  • 24 Apr 2022
    • Add 々 to Kanji list (so that 久々(ひさびさ) works)
    • Add zero-width space \u200b ,\u2060 support. Copy and paste the character just behind < (in <>[] or <>{}) to prevent conversion to <ruby> tags - <​振り仮名>[ふりがな]
      • I made it work to prevent <ruby> collapse as well, by putting the weird space just behind <ruby> - ​振り仮名ふりがな
    • Add a setting to turn off CODE_BLOCK_DETECTION, in case it fails.
  • 12 Apr 2022
<ruby><ruby>危<rt>き</rt></ruby><ruby>険<rt>けん</rt></ruby><rt>、、</rt></ruby><ruby>な<rt> 、</rt></ruby><ruby><ruby>秘<rt>ひ</rt></ruby><ruby>密<rt>みつ</rt></ruby><rt>、、</rt></ruby><ruby>を<rt> 、</rt></ruby>
  • 12 Apr 2022
    • Don’t collapse exotic <ruby> tags - 你好ní hǎo
      • Normal <ruby > tags can also be prevented from collapsing by adding a single space - 振り仮名ふりがな
    • Restyle “blur” mode.
  • 6 Apr 2022
    • Allow OFF_MODE to be disabled. Also, if OFF_MODE is disabled, the change-mode button won’t be unnecessarily dimmed.
  • 5 Apr 2022
    • Add code block detection, for both ` and ```.
    • Add “manual” mode - doesn’t auto-insert Furigana, but Furigana can still be rendered.
    • Add two buttons, for inserting Furigana templates (<>[] / <>{})
    • Restyle “auto” mode.
    • Default to <span class="spoiler">. The old [spoiler] can still be detected.
    • Add UserScript icon
IME(アイ・エム・イー)2()仮名(がな)

This works <おはよう>[Hello] - おはよう(Hello).

This also works - <ruby lang = 'ja-JP'>おはよう<rp>(</rp><rt>Hello</rt><rp>)</rp></ruby>

<ruby lang = 'ja-JP'>おはよう<rp>(</rp><rt>Hello</rt><rp>)</rp></ruby>

I have noticed that you already found a way to inject Markdown-it plugins into Discourse through a userscript! That’s great! I think I will update IME2Furigana to use this method for a better integration with the other markdown rules. At least for furigana markup => ruby, this should automatically prevent any conversion inside of code blocks. I haven’t decided yet how I want to handle ruby => furigana markup.

The only disadvantage of injecting it as a rule instead of a preprocessing step is that markdown does not work inside of HTML tags. But I think this should be okay.


Test if your markdown-it-ruby userscript also works when saving the post: {振|}り{仮名|がな}.

EDIT: :frowning: I guess hacking the saving function is still necessary after all.


It would be nice if window.markdownit could be used to preprocess Markdown without saving. Generally, it is easier to navigate HTML than Markdown, I think. Also, it would be nice if Markdown parsing could be ensured in some way other than putting two newlines after the last tag.

Some things about the auto-Furigana inserter (and yeah, I made some fixes above):

  • I am not sure about what you considered to be Furigana and Kanji, but (のま) and 久々(ひさびさ) don’t work.
Relevant code
// This includes 々〆〇 and several punctuations
const FURIGANA_REGEX = /^[\u3041-\u3096\u3000-\u303f\uff01-\uff5e¥]+$/;

// This does not include 々〆, as well as non-Kanji numbers, including full-width
const KANJI_REGEX = /([\uff66-\uff9d\u4e00-\u9faf\u3400-\u4dbf]+)/;

A quick fix would be

const FURIGANA_REGEX = /^[\p{sc=Katakana}\p{sc=Hiragana}]+$/u;
const KANJI_REGEX = /([\p{sc=Han}\p{N}々〆]+)/u;

But of course, I can’t verify that it covers every case that should work.
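To make the difference concrete, here is a small comparison of the two regex pairs quoted above. The `OLD_`/`NEW_` prefixes and the `firstKanjiRun` helper are only labels for this example, and it assumes an engine that supports Unicode property escapes. It is a sketch for checking a few inputs, not a full test suite.

```javascript
// Regexes copied from the post above; OLD_/NEW_ are labels for this comparison only.
const OLD_FURIGANA_REGEX = /^[\u3041-\u3096\u3000-\u303f\uff01-\uff5e¥]+$/;
const OLD_KANJI_REGEX = /([\uff66-\uff9d\u4e00-\u9faf\u3400-\u4dbf]+)/;
const NEW_FURIGANA_REGEX = /^[\p{sc=Katakana}\p{sc=Hiragana}]+$/u;
const NEW_KANJI_REGEX = /([\p{sc=Han}\p{N}々〆]+)/u;

// Hypothetical helper: returns the first kanji run found in text, or null
const firstKanjiRun = (regex, text) => (text.match(regex) || [null])[0];

console.log(firstKanjiRun(OLD_KANJI_REGEX, "久々")); // stops before 々
console.log(firstKanjiRun(NEW_KANJI_REGEX, "久々")); // includes 々
console.log(OLD_FURIGANA_REGEX.test("げんきです。"), NEW_FURIGANA_REGEX.test("げんきです。"));
console.log(OLD_FURIGANA_REGEX.test("カタカナ"), NEW_FURIGANA_REGEX.test("カタカナ"));
```

The quick fix picks up 久々 and katakana readings, but a reading that contains 。 no longer passes, because the punctuation range is gone from the furigana class.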

  • Unbalanced Furigana, 穴場(あなば). Of course, a quick fix is copy-pasta 穴場(あなば)穴場(あなば) and edit (あな)(), though in another markdown-it plugin, it is done by <穴場>[あな・ば] or <穴場>[あな/ば]. (Technically multiple <rt> in the same <ruby>.)

  • I don’t know if someone has mentioned it yet, but I don’t like lang = 'ja-JP' being injected. It’s an unnecessary monstrosity (i.e. raw width), and <ruby> isn’t necessarily Japanese (and even if it is, it should be fixed with another covering span, at least in my opinion). (In rare cases, you can see ruby text with Zhuyin in China.)

    • With spaces in front, and back, of = sign, nonetheless.
    • Thinking about Wiki editors who didn’t install IME2Furigana (and I didn’t install on my smartphone), it matters.

I did not know about Unicode categories/scripts in regular expressions. Do you know if it is supported in all major browsers?

My inclusion of punctuation in FURIGANA_REGEX was a quick workaround for when someone types a complete sentence with punctuation into their IME before converting it (for example, if they type げんきです。 and then convert it to 元気です。, it should result in <元気>[げんき]です。).
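One way to get that result is to strip whatever the typed reading and the converted text have in common at both ends and wrap only the part that differs. The sketch below illustrates the idea; `addFurigana` is a hypothetical name and this is not taken from IME2Furigana's source, which may handle the case differently.

```javascript
// Hypothetical helper: wrap only the part of `converted` that differs from `reading`.
function addFurigana(reading, converted) {
	const r = Array.from(reading);
	const c = Array.from(converted);

	// Length of the common prefix
	let prefix = 0;
	while (prefix < r.length && prefix < c.length && r[prefix] === c[prefix]) prefix++;

	// Length of the common suffix, never overlapping the prefix
	let suffix = 0;
	while (suffix < r.length - prefix && suffix < c.length - prefix &&
		r[r.length - 1 - suffix] === c[c.length - 1 - suffix]) suffix++;

	const base = c.slice(prefix, c.length - suffix).join("");
	const ruby = r.slice(prefix, r.length - suffix).join("");
	if (!base || !ruby) return converted; // nothing was converted to kanji

	return c.slice(0, prefix).join("") + "<" + base + ">[" + ruby + "]" + c.slice(c.length - suffix).join("");
}

console.log(addFurigana("げんきです。", "元気です。"));
console.log(addFurigana("たべる", "食べる"));
```

This keeps trailing kana and punctuation outside the markup, but it produces a single block for words with kana in the middle, so it is only a starting point.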

This was discussed before, but it seemed that many people prefer balanced furigana. If someone really wants to map the reading to the kanji, they can use <穴>[あな]<場>[ば].

Since IME2Furigana is not a general Discourse userscript but specifically targeted for the WK Community, I think it is reasonable to assume it will always be used for Japanese text. Ideally, everyone would always enclose their Japanese text with <span lang="ja-JP">...</span>, but that requires an unreasonable effort, so the next best solution was to let IME2Furigana automatically add it. The most correct way to do this would probably be to find all Japanese characters and surround these sections with spans with lang="ja-JP" where appropriate. But this is prone to bugs (and arguably even further out of scope for IME2Furigana), so I did not try to add such a feature and instead only add lang="ja-JP" to ruby tags.


I have spent more time thinking about this, but I could not come up with a way to use this in IME2Furigana. When saving a post, there still needs to be a way to convert only the furigana markup to ruby, and I don’t think it is possible to solve this with the Markdown-it plugin method. So what I said before about preventing IME2Furigana from affecting stuff in code blocks still holds true:

@polv, you already started replicating the code block rules of Markdown-it in your IME2Furigana fork by finding occurrences of ` and ```, but once this can of worms is opened, there is the question of when to stop. For example, should IME2Furigana also handle code blocks that are created by four leading spaces?[1] But just excluding lines starting with four spaces from conversion is also insufficient because in some cases, the four leading spaces have a different meaning and don’t create a code block.[2] This can get even more complicated.[3]


  1. 
        <漢字>[かんじ]
    

    Should be ignored by IME2Furigana? ↩︎

  2. * Bullet point
    
        <漢字>[かんじ]
    

    Should be converted by IME2Furigana? ↩︎

  3. Text followed by footnote[^footnoteName]
    
    [^footnoteName]:
        * Bullet Point
    
            Same indentation as Bullet Point, <漢字>[かんじ] should convert
    
              Code, <漢字>[かんじ] should not convert
    
    ↩︎

About that, see Can I use... (support tables for HTML5, CSS3, etc.); I only care about ease of use for now. To convert back to a plain RegExp, though, running XRegExp in a Node.js console would show its expanded form. (The syntax is a little different, though, like \\p{Katakana} rather than \\p{sc=Katakana}.)

I have tried that, actually. But in the end, Firefox on Arch Linux aside (which needs lang="ja-JP"), a simple script to change <html lang="en"> to <html lang="ja"> already works just fine.
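Such a script can be very short. A minimal sketch, assuming it runs as a userscript on the forum page (the function name `setDocumentLanguage` is made up for this example):

```javascript
// Hypothetical helper: set the lang attribute of the root <html> element.
// `doc` is a parameter so the function can also be tried outside a browser.
function setDocumentLanguage(doc, lang = "ja") {
	doc.documentElement.lang = lang;
}

// In a userscript, apply it to the real page
if (typeof document !== "undefined") setDocumentLanguage(document);
```

Note that this marks the whole page as Japanese, including the English text, which is the trade-off compared to tagging only the ruby elements.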

I think that, as long as uncommon markdown isn’t used, it doesn’t matter. However, taking it seriously would mean parsing the Markdown into an AST and converting it back (perhaps with remark rather than markdown-it, since you want to at least turn it back into the simple <>[] markup).

And yeah, since I already used such code block detection for a while, I can survive just fine. (Unless I edit some weird Wiki, which doesn’t happen yet.)

BTW, this is not my fault. Your saved post isn’t as expected. (And I turned off the extension to be sure. What am I supposed to see?)

I have just noticed that <>[] can be prevented from conversion as well, by adding a single space

  • < kanji>[kana]. (Actually, anything in \s, and must be in front.)

With zero-width spaces, it appears that preventing conversion in code blocks and alike is cleared. <⁠kanji>[kana]

  • \s doesn’t include zero-width spaces, though it works with everything else that has real space. I need to modify the regex a little.
  • Even with my mod to prevent <ruby> collapsing to <>[], putting ZWSP behind <ruby is somehow rendered wrongly. Must be a bug with another regex; so I enable putting ZWSP behind <ruby> (whole tag), for the time being - ​振り仮名ふりがな
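The regex change mentioned above could look roughly like this. `MARKUP_REGEX` and `markupToRuby` are hypothetical stand-ins, not the patterns actually used in IME2Furigana or in the fork; the point is only that the zero-width characters have to be listed next to \s.

```javascript
// Hypothetical markup regex: <base>[reading], skipped when whitespace,
// U+200B (zero-width space) or U+2060 (word joiner) directly follows "<".
const MARKUP_REGEX = /<(?![\s\u200b\u2060])([^<>\[\]]+)>\[([^\[\]]*)\]/g;

function markupToRuby(text) {
	return text.replace(MARKUP_REGEX, "<ruby>$1<rt>$2</rt></ruby>");
}

console.log(/\s/.test("\u200b"), /\s/.test("\u2060")); // \s alone matches neither
console.log(markupToRuby("<kanji>[kana]"));            // converted
console.log(markupToRuby("< kanji>[kana]"));           // left alone (real space)
console.log(markupToRuby("<\u200bkanji>[kana]"));      // left alone (ZWSP)
```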

ZWSP bad render

These are just examples of markdown code – you can copy them into the editor to see how Markdown-it parses them:

This example shows that context is needed to tell whether a line is a code block or not, and it can get really difficult to get it correct.
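For comparison, a context-free detection that only looks for ` and ``` might look like the following. This is a simplified sketch with made-up names (`convertOutsideCode`, `toRuby`), not the code from the fork, and it shows where such an approach stops working.

```javascript
// Hypothetical helper: apply `convert` only to text outside ``` fences and `inline` code.
function convertOutsideCode(text, convert) {
	// Fenced blocks first (non-greedy, may span lines), then single-backtick inline code
	const CODE_REGEX = /(```[\s\S]*?```|`[^`\n]*`)/;
	return text
		.split(CODE_REGEX) // the capturing group keeps code segments at odd indices
		.map((part, i) => (i % 2 === 1 ? part : convert(part)))
		.join("");
}

// Simplified furigana markup => ruby conversion, assumed for this example
const toRuby = s => s.replace(/<([^<>\[\]]+)>\[([^\[\]]*)\]/g, "<ruby>$1<rt>$2</rt></ruby>");

console.log(convertOutsideCode("a `<漢字>[かんじ]` b <漢字>[かんじ]", toRuby));
console.log(convertOutsideCode("    <漢字>[かんじ]", toRuby)); // indented code is not recognized
```

The second call converts the markup even though Markdown-it would treat a line indented by four spaces as a code block, and deciding that correctly needs the surrounding context.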

I noticed that copying text from a footnote popup in Discourse is not that easy – after selecting the text in the popup and releasing the mouse button, the popup closes. At least it is possible to use Ctrl+C while still holding down the mouse button.


I see, although both your code and my mod do the conversion, anyway.

As I said, disambiguation can be done another way by ZWSP, which can be added to your code with very little mod. (Of course, I don’t want to write a real Markdown text to AST parser, so that code blocks can be accurately detected, not to mention whether it will comply with markdown-it? While using markdown-it parser is possible, it won’t convert back to Markdown.)

Text followed by footnote[1]


Note: Ah, it’s in collapse-mode after editing.

My Editor screenshot


    • Bullet Point

      Same indentation as Bullet Point, 漢字(かんじ) should convert

        Code, <⁠漢字>[かんじ] should not convert
      
    ↩︎

There is an even easier way to work around both of these: convert back only <ruby lang = 'ja-JP'> (and don’t try to convert <ruby> to <ruby lang = 'ja-JP'>), so it is as simple as

  • りょく (previously suggested <努力>[ど/りょく])
    • <ruby>努<rt>ど</rt>力<rt>りょく</rt></ruby>
  • 各務かがみはら (previously suggested syntax won’t work)
    • <ruby>各務<rt>かがみ</rt>原<rt>はら</rt></ruby>

Of course, it even works for nested Furigana (because it is native HTML).

  • けん、、みつ、、

No need for my messy code of nested <ruby> tag detection.


Since <ruby lang = 'ja-JP'> itself is already unique as well, there might be no need for code block detection at all, except in the rare cases where I really want to show the raw <ruby lang = 'ja-JP'> (and when that happens, I can just turn off the UserScript).
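As a rough sketch of that idea, the conversion back to markup only has to match the exact form that IME2Furigana writes when saving (the one shown earlier in the thread). `SPECIAL_RUBY_REGEX` and `specialRubyToMarkup` are hypothetical names, and the real option may be implemented differently.

```javascript
// Matches only the exact saved form:
// <ruby lang = 'ja-JP'>base<rp>(</rp><rt>reading</rt><rp>)</rp></ruby>
const SPECIAL_RUBY_REGEX = /<ruby lang = 'ja-JP'>([^<]*)<rp>\(<\/rp><rt>([^<]*)<\/rt><rp>\)<\/rp><\/ruby>/g;

// Hypothetical helper: convert the special form back to <base>[reading] markup
function specialRubyToMarkup(html) {
	return html.replace(SPECIAL_RUBY_REGEX, "<$1>[$2]");
}

const saved = "<ruby lang = 'ja-JP'>おはよう<rp>(</rp><rt>Hello</rt><rp>)</rp></ruby>";
const handWritten = "<ruby>努<rt>ど</rt>力<rt>りょく</rt></ruby>";

console.log(specialRubyToMarkup(saved));       // converted back to markup
console.log(specialRubyToMarkup(handWritten)); // left as raw HTML
```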

Just installed this on Safari 15.5 using Tampermonkey, and the UI doesn’t show up at all. So unfortunately it doesn’t seem to work on Safari. :frowning:


Have you refreshed the page before trying it? Does the script show up as running in the Tampermonkey popup? Can you check in your browser console if there are any errors?


Definitely refreshed. I updated my computer last night after installing (to check whether the problem was that I didn’t have the latest Safari update) and also tried a refresh. It does show in Tampermonkey, yes.

And here are the errors:
Screen Shot 2022-06-02 at 11.06.42

Do you have any scripts that work (on the forums or on the main WaniKani site), or are none of them working at all?


I have golden burn for the forum working fine, and my long-ish list of scripts for main site all work fine.

Turned off the other forum scripts and left only the furigana one, and also opened a reply window and typed some Japanese.

This is what I got:

And that is when I noticed this:

I have a content blocker installed that is stopping it?

  • Which Golden Burn? Isn’t it for WaniKani app?
  • The common part of both scripts would be injecting into Discourse’s internal API - not sure if that helps Sinyaven. However, the Details script is much simpler.

It is just called Wanikani Golden Burn, and works on display badges here (making the 60 circles gold, and also makes the burned thing gold on the main site) from rfindley.

Edit: I’m also looking through my content blockers and seeing if I can do something from my side to let both your scripts work…

Do the errors remain after turning the broken scripts off? Also, I don’t think Golden Burn does anything to the forum anymore.


It’s hard to tell what went wrong without any related error messages. Can you try copying the following script into Tampermonkey and see if the F button appears with this version?

Code
// ==UserScript==
// @name         IME2FuriganaDebug
// @namespace    ime2furiganadebug
// @version      1.8
// @description  Adds furigana markup functionality to Discourse. When inputting kanji with an IME, furigana markup is automatically added.
// @author       Sinyaven
// @license      MIT-0
// @match        https://community.wanikani.com/*
// @grant        none
// ==/UserScript==

(async function() {
	"use strict";

	/* global require, exportFunction */
	/* eslint no-multi-spaces: "off" */

	//////////////
	// settings //
	//////////////

	const ASK_BEFORE_CONVERTING_RUBY_TO_FURIGANA_MARKUP = true;

	//////////////

	const DISCOURSE_REPLY_BOX_ID = "reply-control";
	const DISCOURSE_REPLY_AREA_CLASS = "reply-area";
	const DISCOURSE_BUTTON_BAR_CLASS = "d-editor-button-bar";

	let mode = 1;
	let furigana = "";
	let bMode = null;
	let tText = null;
	let dBanner = null;
	let alreadyInjected = false;

	// ---STORAGE--- //

	mode = parseInt(localStorage.getItem("furiganaMode") || mode);
	addEventListener("storage", e => e.key === "furiganaMode" ? modeValueChangeHandler(parseInt(e.newValue)) : undefined);

	function modeValueChangeHandler(newValue) {
		mode = newValue;
		if (!bMode) return;

		updateButton();
		// trigger _updatePreview() by appending a space, dispatching a change event, and then removing the space
		let textValue = tText.value;
		let selectionStart = tText.selectionStart;
		let selectionEnd = tText.selectionEnd;
		let selectionDirection = tText.selectionDirection;
		tText.value += " ";
		tText.dispatchEvent(new Event("change", {bubbles: true, cancelable: true}));
		tText.value = textValue;
		tText.setSelectionRange(selectionStart, selectionEnd, selectionDirection);
		tText.dispatchEvent(new Event("change", {bubbles: true, cancelable: true}));
	}

	function setModeValue(newValue) {
		modeValueChangeHandler(newValue);
		localStorage.setItem("furiganaMode", mode);
	}

	// ---REPLY BOX AND TEXT AREA DETECTION--- //

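	let dObserverTarget = await waitFor(DISCOURSE_REPLY_BOX_ID, 1000, 30); // Greasemonkey seems to inject script before reply box is available, so we might have to wait
	if (!dObserverTarget) {
		// waitFor() gave up; calling observe() on null would throw a TypeError
		console.warn("IME2FuriganaDebug: #" + DISCOURSE_REPLY_BOX_ID + " not found");
		return;
	}
	let observer = new MutationObserver(m => m.forEach(handleMutation));
	observer.observe(dObserverTarget, {childList: true, subtree: true});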

	addCss();

	// text area might already be open
	setupForTextArea(document.querySelector("textarea.d-editor-input"));
	addButton(document.getElementsByClassName(DISCOURSE_BUTTON_BAR_CLASS)[0]);

	function handleMutation(mutation) {
		let addedNodes = Array.from(mutation.addedNodes);
		let removedNodes = Array.from(mutation.removedNodes);
		// those forEach() are executed at most once
		addedNodes.filter(n => n.tagName === "TEXTAREA").forEach(setupForTextArea);
		addedNodes.filter(n => n.classList && n.classList.contains(DISCOURSE_BUTTON_BAR_CLASS)).forEach(addButton);
		removedNodes.filter(n => n.classList && n.classList.contains(DISCOURSE_REPLY_AREA_CLASS)).forEach(cleanup);
	}

	function setupForTextArea(textArea) {
		if (!textArea) return;
		tText = textArea;
	}

	async function waitFor(elementId, checkInterval = 1000, waitCutoff = Infinity) {
		let result = null;
		while (--waitCutoff > 0 && !(result = document.getElementById(elementId))) await sleep(checkInterval);
		return result;
	}

	function sleep(ms) {
		return new Promise(resolve => setTimeout(resolve, ms));
	}

	// ---MAIN LOGIC--- //

	function addButton(div) {
		if (!div || (bMode && bMode.parentElement === div)) return;
		bMode = document.createElement("button");
		bMode.id = "ime2furigana-button";
		bMode.className = "btn no-text btn-icon ember-view";
		bMode.textContent = "F";
		updateButton();
		bMode.addEventListener("click", cycleMode);
		div.appendChild(bMode);
	}

	function cycleMode() {
		setModeValue(mode > 1 ? 0 : mode + 1);
		if (tText) tText.focus();
	}

	function updateButton() {
		bMode.classList.toggle("active", mode);
		bMode.classList.toggle("blur", mode === 2);
		bMode.title = "IME2Furigana - " + (mode ? (mode === 1 ? "on" : "blur") : "off");
	}

	function cleanup() {
		furigana = "";
		bMode = null;
		tText = null;
		dBanner = null;
	}

	// ---ADD CSS--- //

	function addCss() {
		let style = document.createElement("style");
		style.textContent = `
			#ime2furigana-conversion-banner { transform: translateY(-0.25em); padding: 0.2em 0.6em; border-bottom: 1px solid gray; background-color: var(--tertiary-low, rgba(163, 225, 255, 0.5)); }
			#ime2furigana-conversion-banner > button { background-color: transparent; border: none; }
			#ime2furigana-button.active.markup-found { border-bottom: 4px solid var(--tertiary, blue); padding-bottom: calc(0.5em - 3px); }
			#ime2furigana-button.active { background-color: #00000042; }
			#ime2furigana-button.blur { filter: blur(2px); }`;
		document.head.appendChild(style);
	}
})();

I have removed the functionality and just left in the code that adds the F button to maybe narrow down where the problem occurs.
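If the button still does not appear, it may help to check which of the elements the debug script relies on actually exist on the page. The snippet below is a sketch that can be pasted into the browser console with the reply box open; `checkEditorElements` is a made-up name, and the IDs and class names are the ones used in the script above.

```javascript
// Reports which elements the debug script depends on are present.
// `doc` is a parameter so the function can also be tried outside a browser.
function checkEditorElements(doc) {
	return {
		replyBox: doc.getElementById("reply-control") !== null,
		buttonBar: doc.getElementsByClassName("d-editor-button-bar").length > 0,
		textArea: doc.querySelector("textarea.d-editor-input") !== null,
		button: doc.getElementById("ime2furigana-button") !== null,
	};
}

// In the browser console, print the result for the current page
if (typeof document !== "undefined") console.table(checkEditorElements(document));
```

If `replyBox` or `buttonBar` is false, the script never had an element to attach to; if both are true but `button` is false, the script itself probably did not run.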


Okay, so I turned off all three scripts I have for the forum. So now I only have the debug one turned on. The F still doesn’t show.

Also checked a couple of things. The errors that show up also seem to show up on the WK main site, despite all my scripts there working as intended.

(Still only see the same error messages as in my first screenshot)
