No problem. I had also forgotten that we already talked about basically the same thing about a year ago, and only noticed it now while looking through my older patch notes.
Omg, I have no memory of this. Nor of the problems with the Heatmap. lol
On the first introduction, I immediately found a bug.
Everything inside code blocks, like ` or ``` or `<pre>`, should not be collapsed to Furigana shorthand. Otherwise, literal `<元>[もと]` cannot be typed with the UserScript on.
The rendering still appears to be correct, though.
You call it a bug, I call it intended behavior
I’m aware of this behavior and I personally would also prefer code blocks to not be affected, but I did not manage to inject my code directly into the “cooking process” of Markdown-it as a rule alongside the existing rules, but only as a preprocessing step. So I would have to replicate the existing Markdown-it rules in my code to apply the ruby conversion only at the correct places. Since the “cooking” is performed after every character input, I did not want to add too many additional computations, so I decided that the use case “furigana markup inside code block” comes up too rarely to warrant the additional necessary computations. And since the markup => ruby conversion does not take code blocks into account, I decided that ruby => markup should also ignore code blocks to at least be consistent across these two directions.
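To illustrate why a pure preprocessing step cannot respect code blocks, here is a minimal sketch of a markup => ruby conversion. It is not the actual IME2Furigana code: `MARKUP_REGEX` and `markupToRuby` are made-up names, and the output format is simply the ruby HTML quoted elsewhere in this thread.

```javascript
// Hypothetical helper, NOT taken from the IME2Furigana source.
// Matches <base>[furigana] where the base contains no brackets or whitespace.
const MARKUP_REGEX = /<([^<>\[\]\s]+)>\[([^\[\]]*)\]/g;

function markupToRuby(raw) {
  // Replace every occurrence in the raw post text with ruby HTML.
  return raw.replace(
    MARKUP_REGEX,
    (_, base, furigana) =>
      `<ruby lang = 'ja-JP'>${base}<rp>(</rp><rt>${furigana}</rt><rp>)</rp></ruby>`
  );
}

// The replacement runs over the whole string, so it has no idea that the
// second occurrence sits inside a code span and converts it as well.
console.log(markupToRuby("<元>[もと] and `<元>[もと]`"));
```

Because the function only sees a flat string, skipping code spans would mean re-implementing part of the Markdown parser, which is the extra computation mentioned above.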
What do you think of `[spoiler]`'d Furigana in `[details]`?
[details=巧み]
Details is dead.
[/details]
It seems that Furigana in `[details]` itself doesn’t work either.
<ruby lang = 'ja-JP'>巧<rp>(</rp><rt>たく</rt><rp>)</rp></ruby>み
Ruby is dead.
Though, of course it can be done with vanilla HTML.
I still can’t find a way to shorthand render Furigana without auto-inserting Furigana, or blur mode.
That is a limitation of the native `[details]` markup rule, so I don’t think it has anything to do with IME2Furigana? The HTML workaround also works with IME2Furigana:
<details><summary><振>[ふ]り<仮名>[がな]</summary>Hidden text</details>
振り仮名
Hidden text
Native markup is also incompatible with `[details]`, so IME2Furigana is not even a special case:
[details="Some **bold** text"]
Hidden Text
[/details]
Some **bold** text
Hidden Text
The workaround you have linked for blurred furigana is not possible to achieve with furigana markup, but it does not work that well anyway since it places the furigana below the base text instead of above:
Like this?
That’s right.
Looks for me like
Originally, IME2Furigana only converted IME input to ruby – I added the furigana markup functionality half a year later, but did not want to increase the number of modes you have to go through when clicking the F button, so I kept it at the three existing modes. Maybe I should at some point add a proper settings menu for this script so that it can be more customized. For now, you would have to disable IME2Furigana, write your post, and then reenable it before sending your post.
Hmm? Why is it below? For me, it is above for both Firefox for Linux and Firefox for Android.
Seems like Firefox places it above and Chromium (tested with Google Chrome and Microsoft Edge on Windows 10 and Android) places it below and left-aligned. Using `<span class="spoiler">` instead of `[spoiler]` seems to work better:
<details><summary><ruby>漢字<rt><span class="spoiler">かんじ</span></rt></ruby></summary>Hidden text</details>
漢字
Hidden text
I have just figured it out. OFF mode should still parse the custom syntax, but shouldn’t auto-insert Furigana on entry.
Change Line 147-148 from
if (event.data.length === 0) return;
furigana = mode ? furigana.replace(/n/g, "ん") : '';
To,
if (event.data.length === 0) return;
furigana = mode ? furigana.replace(/n/g, "ん") : '';
Delete Line 255-258
if (!mode) {
removeBanner();
return raw;
}
Then, this works 世界.
I think there should still be a mode which entirely prevents IME2Furigana from tampering with the user input without disabling the script in Tampermonkey and reloading the page. If I decide to add the possibility to have markup => ruby conversion enabled and IME => markup conversion disabled, it will be in the form of an optional fourth mode and not a modification of one of the existing modes.
And I will probably change the script in the next version to use `<span class="spoiler">` instead of `[spoiler]`.
Nonetheless, I updated to a version with 4 modes and code-block detection, in case someone needs it. (Press the “Raw” button to add it to Tampermonkey.)
Change Logs
- 6 May 2022
  - Add a label for every option.
  - Add option `CONVERT_ONLY_SPECIAL_MARKUP`: only exactly `<ruby lang = 'ja-JP'>` will be converted back to markup. Everything else, including `<ruby>`, will be ignored.
- 1 May 2022
  - Add Furigana detection for small Kana (ヵヶ).
- 29 Apr 2022
  - Force update textarea for Kiwi Browser for Android (with Violentmonkey).
- 25 Apr 2022
  - Change `@name`, so that it can co-exist with the original script. Also, update `@description`.
- 24 Apr 2022
  - Add 々 to Kanji list (so that 久々 works).
  - Add zero-width space (`\u200b`, `\u2060`) support. Copy and paste the character just behind `<` (in `<>[]` or `<>{}`) to prevent conversion to `<ruby>` tags - <振り仮名>[ふりがな]
    - I made it work to prevent `<ruby>` collapse as well, by putting the weird space just behind `<ruby>` - 振り仮名
  - Add a setting to turn off `CODE_BLOCK_DETECTION`, in case it fails.
- 12 Apr 2022
  - Don’t autocollapse `<ruby>` tags (to `<>[]` / `<>{}`) on edit.
    - It will still collapse on Initialization, on Paste, on Lose Focus (onblur), or on Save.
  - Don’t collapse nested `<ruby>` tags (危険な秘密を) - JoJo's Bizarre Adventure Book Club (Volume 5) 「最後の波紋」 - #13 by polv (Admittedly, this is a temporary fix.)
    `<ruby><ruby>危<rt>き</rt></ruby><ruby>険<rt>けん</rt></ruby><rt>、、</rt></ruby><ruby>な<rt> 、</rt></ruby><ruby><ruby>秘<rt>ひ</rt></ruby><ruby>密<rt>みつ</rt></ruby><rt>、、</rt></ruby><ruby>を<rt> 、</rt></ruby>`
- 12 Apr 2022
  - Don’t collapse exotic `<ruby>` tags - 你好
    - Normal `<ruby >` tags can also be prevented from collapsing by adding a single space - 振り仮名
  - Restyle “blur” mode.
- 6 Apr 2022
  - Allow OFF_MODE to be disabled. Also, if OFF_MODE is disabled, the change-mode button won’t be unnecessarily dimmed.
- 5 Apr 2022
  - Add code block detection, for both ` and ```.
  - Add “manual” mode - doesn’t auto-insert Furigana, but Furigana can still be rendered.
  - Add two buttons, for inserting Furigana templates (`<>[]` / `<>{}`).
  - Restyle “auto” mode.
  - Default to `<span class="spoiler">`. The old `[spoiler]` can still be detected.
  - Add UserScript icon.
IME2振り仮名
This works `<おはよう>[Hello]` - おはよう.
This also works - <ruby lang = 'ja-JP'>おはよう<rp>(</rp><rt>Hello</rt><rp>)</rp></ruby>
<ruby lang = 'ja-JP'>おはよう<rp>(</rp><rt>Hello</rt><rp>)</rp></ruby>
I have noticed that you already found a way to inject Markdown-it plugins into Discourse through a userscript! That’s great! I think I will update IME2Furigana to use this method for a better integration with the other markdown rules. At least for furigana markup => ruby, this should automatically prevent any conversion inside of code blocks. I haven’t decided yet how I want to handle ruby => furigana markup.
The only disadvantage of injecting it as a rule instead of a preprocessing step is that markdown does not work inside of HTML tags. But I think this should be okay.
Test if your markdown-it-ruby userscript also works when saving the post: {振|ふ}り{仮名|がな}.
EDIT: I guess hacking the saving function is still necessary after all.
It would be nice if `window.markdownit` could be used to preprocess Markdown without saving. Generally, it is easier to navigate HTML than Markdown, I think. Also, it would be nice if Markdown parsing could be ensured by some means other than having two new-lines after the last tag.
Some things about the auto-Furigana inserter (and yeah, I made some fixes above):
- I am not sure about what you considered to be Furigana and Kanji, but 々 and 久々 don’t work.
Relevant code
// This includes 々〆〇 and several punctuations
const FURIGANA_REGEX = /^[\u3041-\u3096\u3000-\u303f\uff01-\uff5e¥]+$/;
// This does not include 々〆, as well as non-Kanji numbers, including full-width
const KANJI_REGEX = /([\uff66-\uff9d\u4e00-\u9faf\u3400-\u4dbf]+)/;
A quick fix would be
const FURIGANA_REGEX = /^[\p{sc=Katakana}\p{sc=Hiragana}]+$/u;
const KANJI_REGEX = /([\p{sc=Han}\p{N}々〆]+)/u;
But of course, I can’t verify it will cover all should-work cases.
- Unbalanced Furigana, 穴場. Of course, a quick fix is to copy-paste 穴場穴場 and edit 穴場, though in another markdown-it plugin, it is done by `<穴場>[あな・ば]` or `<穴場>[あな/ば]`. (Technically multiple `<rt>` in the same `<ruby>`.)
- I don’t know if someone has mentioned it yet, but I don’t like `lang = 'ja-JP'` being injected. It’s an unnecessary monstrosity (i.e. raw width), and `<ruby>` isn’t necessarily Japanese (and even if it is, it should be fixed with another covering span, at least in my opinion). (In rare cases, you can see ruby texts with Zhuyin in China.)
  - With spaces in front of and behind the `=` sign, nonetheless.
  - Thinking about Wiki editors who didn’t install IME2Furigana (and I didn’t install it on my smartphone), it matters.
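Regarding the regex quick fix in the first point above, the difference can be checked directly in a browser console or Node.js. This is only a small sketch: the old pattern is the one quoted under “Relevant code”, and `firstKanjiRun` is a hypothetical helper name used for the comparison.

```javascript
// Old pattern quoted in this thread: none of its ranges contains 々 (U+3005).
const OLD_KANJI_REGEX = /([\uff66-\uff9d\u4e00-\u9faf\u3400-\u4dbf]+)/;
// Suggested quick fix with Unicode property escapes (requires the u flag).
const NEW_KANJI_REGEX = /([\p{sc=Han}\p{N}々〆]+)/u;
const NEW_FURIGANA_REGEX = /^[\p{sc=Katakana}\p{sc=Hiragana}]+$/u;

// Hypothetical helper: return the first run matched by a kanji pattern.
function firstKanjiRun(regex, text) {
  const match = text.match(regex);
  return match ? match[1] : null;
}

console.log(firstKanjiRun(OLD_KANJI_REGEX, "久々")); // 久
console.log(firstKanjiRun(NEW_KANJI_REGEX, "久々")); // 久々
console.log(NEW_FURIGANA_REGEX.test("ひさびさ")); // true
// Punctuation such as 。 is no longer accepted by the new furigana pattern.
console.log(NEW_FURIGANA_REGEX.test("げんきです。")); // false
```

Note the last line: the old `FURIGANA_REGEX` range `\u3000-\u303f` covered punctuation like 。, so the quick fix changes that behavior as a side effect.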
I did not know about Unicode categories/scripts in regular expressions. Do you know if it is supported in all major browsers?
My inclusion of punctuations in `FURIGANA_REGEX` was a quick workaround in case someone types a complete sentence with punctuation into their IME before converting it (for example, if they type `げんきです。` and then convert to `元気です。`, it should result in `<元気>[げんき]です。`).
This was discussed before, but it seemed that many people prefer balanced furigana. If someone really wants to map the reading to the kanji, they can use `<穴>[あな]<場>[ば]`.
Since IME2Furigana is not a general Discourse userscript but specifically targeted for the WK Community, I think it is reasonable to assume it will always be used for Japanese text. Ideally, everyone would always enclose their Japanese text with `<span lang="ja-JP">...</span>`, but that requires an unreasonable effort, so the next best solution was to let IME2Furigana automatically add it. The most correct way to do this would probably be to find all Japanese characters and surround these sections with spans with `lang="ja-JP"` where appropriate. But this is prone to bugs (and arguably even further out of scope for IME2Furigana), so I did not try to add such a feature and instead only add `lang="ja-JP"` to ruby tags.
I have spent more time thinking about this, but I could not come up with a way to use this in IME2Furigana. When saving a post, there still needs to be a way to convert only the furigana markup to ruby, and I don’t think it is possible to solve this with the Markdown-it plugin method. So what I said before about preventing IME2Furigana from affecting stuff in code blocks still holds true:
@polv, you already started replicating the code block rules of Markdown-it in your IME2Furigana fork by finding occurrences of ` and ```, but once this can of worms is opened, there is the question of when to stop. For example, should IME2Furigana also handle code blocks that are created by four leading spaces?[1] But just excluding lines starting with four spaces from conversion is also insufficient because in some cases, the four leading spaces have a different meaning and don’t create a code block.[2] This can get even more complicated.[3]
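To make the “can of worms” concrete, here is a rough sketch of the simple approach: split the raw text on ``` fences and ` spans and convert only the parts in between. It is not the code from either script. `CODE_REGEX`, `MARKUP_REGEX` and `convertOutsideCode` are hypothetical names, and the ruby output is simplified.

```javascript
// Hypothetical sketch: convert furigana markup only outside ``` fences and ` spans.
// Code blocks created by four leading spaces are deliberately NOT handled.
const CODE_REGEX = /(```[\s\S]*?```|`[^`\n]*`)/;
const MARKUP_REGEX = /<([^<>\[\]\s]+)>\[([^\[\]]*)\]/g;

function convertOutsideCode(raw) {
  // split() with a capturing group keeps the matched code segments in the
  // result array, always at odd indices.
  return raw
    .split(CODE_REGEX)
    .map((part, i) =>
      i % 2 === 1
        ? part // code segment: leave untouched
        : part.replace(MARKUP_REGEX, "<ruby>$1<rt>$2</rt></ruby>")
    )
    .join("");
}

console.log(convertOutsideCode("a `<漢字>[かんじ]` <漢字>[かんじ]"));
// An indented code block is still converted, which is the gap described above.
console.log(convertOutsideCode("    <漢字>[かんじ]"));
```

Deciding whether a line with four leading spaces is code or a list continuation needs the surrounding context, which a regex split like this does not have.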
About that: JavaScript built-in: RegExp: Unicode property escapes (`\p{...}`) | Can I use... Support tables for HTML5, CSS3, etc. But I only care about ease of use for now. To convert back to a normal RegExp, running XRegExp in the Node.js console would show its true form. (The syntax is a little different, though, like `\\p{Katakana}` rather than `\\p{sc=Katakana}`.)
I have tried that, actually. But in the end, Firefox on Arch Linux aside (which needs `lang="ja-JP"`), a simple script to change `<html lang="en">` to `<html lang="ja">` already works just fine.
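A minimal sketch of what such a script could look like, assuming it runs as a userscript on the forum page. `setPageLanguage` is a hypothetical helper name; the only real API used is `document.documentElement.lang`.

```javascript
// Hypothetical helper: set the language of a Document-like object.
function setPageLanguage(doc, lang) {
  doc.documentElement.lang = lang;
  return doc.documentElement.lang;
}

// Only touch the real page when running in a browser (e.g. from a userscript).
if (typeof document !== "undefined") {
  setPageLanguage(document, "ja");
}
```

This changes the language for the whole page, including English text, so it is a trade-off rather than a replacement for per-element `lang` attributes.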
I think that, as long as uncommon markdown isn’t used, it doesn’t matter. However, taking it seriously would mean parsing the Markdown into an AST and turning it back into Markdown (perhaps with remark rather than markdown-it, as you want to at least turn it back into the simple markup `<>[]`).
And yeah, since I already used such code block detection for a while, I can survive just fine. (Unless I edit some weird Wiki, which doesn’t happen yet.)
BTW, this is not my fault. Your saved post isn’t as expected. (And I turned off the extension to be sure. What am I supposed to see?)
I have just noticed that `<>[]` can be prevented from conversion as well, by adding a single space
- < kanji>[kana]. (Actually, anything in `\s`, and it must be in front.)
With zero-width spaces, it appears that preventing conversion in code blocks and the like is solved. <kanji>[kana]
- `\s` doesn’t include zero-width spaces, though it works with everything else that has real space. I need to modify the regex a little.
- Even with my mod to prevent `<ruby>` collapsing to `<>[]`, putting ZWSP behind `<ruby` is somehow rendered wrongly. Must be a bug with another regex; so I enable putting ZWSP behind `<ruby>` (whole tag), for the time being - 振り仮名
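The `\s` point can be verified quickly, and the regex tweak might look roughly like the sketch below. `ESCAPABLE_MARKUP` and `convertUnlessEscaped` are hypothetical names, not the actual code of the fork, and the ruby output is simplified.

```javascript
// \s in JavaScript matches neither U+200B (zero-width space) nor U+2060 (word joiner).
console.log(/\s/.test("\u200b")); // false
console.log(/\s/.test("\u2060")); // false

// Hypothetical pattern: skip the conversion when "<" is directly followed by
// whitespace, U+200B or U+2060.
const ESCAPABLE_MARKUP = /<(?![\s\u200b\u2060])([^<>\[\]]+)>\[([^\[\]]*)\]/g;

function convertUnlessEscaped(raw) {
  return raw.replace(ESCAPABLE_MARKUP, "<ruby>$1<rt>$2</rt></ruby>");
}

console.log(convertUnlessEscaped("<kanji>[kana]"));       // converted
console.log(convertUnlessEscaped("< kanji>[kana]"));      // left as typed
console.log(convertUnlessEscaped("<\u200bkanji>[kana]")); // left as typed
```

Listing the two code points explicitly next to `\s` keeps the escape rule in one place instead of relying on what `\s` happens to cover.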
These are just examples of markdown code – you can copy them into the editor to see how Markdown-it parses them:
This example shows that context is needed to tell whether a line is a code block or not, and it can get really difficult to get it correct.
I noticed that copying text from a footnote popup in Discourse is not that easy – after selecting the text in the popup and releasing the mouse button, the popup closes. At least it is possible to use Ctrl+C while still holding down the mouse button.
I see, although both your code and my mod do the conversion, anyway.
As I said, disambiguation can be done another way, with ZWSP, which can be added to your code with very little modification. (Of course, I don’t want to write a real Markdown-to-AST parser just so that code blocks can be detected accurately, not to mention the question of whether it would comply with `markdown-it`. While using the `markdown-it` parser is possible, it won’t convert back to Markdown.)
Text followed by footnote[1]
Note: Ah, it’s in collapse-mode after editing.
-
-
Bullet Point
Same indentation as Bullet Point, 漢字 should convert
Code, <漢字>[かんじ] should not convert
- There is an even easier way to work around both of these: convert back only `<ruby lang = 'ja-JP'>` (and don’t try to convert `<ruby>` to `<ruby lang = 'ja-JP'>`), so it is as simple as
  - 努力 (previously suggested `<努力>[ど/りょく]`): `<ruby>努<rt>ど</rt>力<rt>りょく</rt></ruby>`
  - 各務原 (previously suggested syntax won’t work): `<ruby>各務<rt>かがみ</rt>原<rt>はら</rt></ruby>`
Of course, it even works for nested Furigana (because it is native HTML).
- 危険な秘密を
No need for my messy code of nested `<ruby>` tag detection.
Since `<ruby lang = 'ja-JP'>` itself is already unique as well, there might be no need for code block detection at all, except in the rare cases where I really want to show the raw `<ruby lang = 'ja-JP'>` (and when that happens, I can just turn off the UserScript).
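The “convert back only the exact tag” idea could be sketched like this. It assumes the generated HTML always has exactly the shape quoted earlier in the thread; `SPECIAL_RUBY` and `rubyToMarkup` are hypothetical names, not the actual implementation of `CONVERT_ONLY_SPECIAL_MARKUP`.

```javascript
// Hypothetical sketch: collapse only ruby tags that carry the exact marker
// <ruby lang = 'ja-JP'> and the <rp>(</rp> ... <rp>)</rp> fallback.
// Hand-written <ruby> tags do not match and are left untouched.
const SPECIAL_RUBY =
  /<ruby lang = 'ja-JP'>([^<]*)<rp>\(<\/rp><rt>([^<]*)<\/rt><rp>\)<\/rp><\/ruby>/g;

function rubyToMarkup(html) {
  return html.replace(SPECIAL_RUBY, "<$1>[$2]");
}

// Generated tag: collapsed back to shorthand.
console.log(
  rubyToMarkup("<ruby lang = 'ja-JP'>巧<rp>(</rp><rt>たく</rt><rp>)</rp></ruby>み")
);
// Hand-written tag with several <rt>: stays as it is.
console.log(rubyToMarkup("<ruby>努<rt>ど</rt>力<rt>りょく</rt></ruby>"));
```

Since the match requires the full generated shape, nested or multi-`<rt>` ruby written by hand never needs special detection.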