I am by no means an expert on this topic, but I have worked with photos quite a bit in the past and have some limited experience with photo implementation/layout in a technical context like this. It’s undeniably an interesting concept, and I think mnemonic options are a good thing, but I do see quite a few issues with this at large.
The two biggest flaws come from the under-the-hood work that most (quality) photo implementation requires. Even without drastic, visible changes, pictures usually need individual editing for proper pixel density, cropping, formatting, etc. If that isn’t done correctly, it can easily lead to problems with display, upscaling, and so on. Photos also take up a pretty considerable amount of both hosting space and processing power. Users could end up sitting through loading delays, and the hosts could have to adjust a lot behind the scenes to account for it.
If this were, for example, a single web article, sure, it’s not a difficult thing to work around. In the context of WK, though, that whole process is being multiplied by nearly 7000 to account for the vast vocabulary. I think those issues would compound pretty fast, and on a purely technical level it would be quite a bit more difficult than it seems, to the point of impracticality.
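To put rough numbers on that multiplication, here is a back-of-the-envelope sketch in Python. Only the item count of roughly 7000 comes from the point above. The function name, the number of size variants per image, the average file size, and the editing minutes per image are all my own hypothetical placeholders, not anything WK has published, so treat the output as an illustration of the scaling rather than an actual estimate.

```python
def estimate_image_cost(item_count, variants_per_item, avg_kib_per_variant, minutes_per_item):
    """Return (total storage in GiB, total editing time in hours).

    Every per-item figure passed in is an assumption supplied by the caller.
    """
    # Storage: each item is stored once per size variant (e.g. thumbnail, standard, retina).
    total_kib = item_count * variants_per_item * avg_kib_per_variant
    storage_gib = total_kib / (1024 * 1024)  # KiB -> GiB

    # Manual work: cropping, resizing, and format checks for each item.
    editing_hours = item_count * minutes_per_item / 60
    return storage_gib, editing_hours


# Hypothetical inputs: ~7000 items, 3 size variants each,
# 200 KiB per variant, 10 minutes of editing per item.
storage_gib, editing_hours = estimate_image_cost(7000, 3, 200, 10)
print(f"{storage_gib:.2f} GiB of storage, {editing_hours:.0f} hours of editing")
```

Swap in your own guesses for the per-item figures. The only thing the sketch is meant to show is that every per-image cost, however small, gets multiplied by the full item count.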
I also think that the vast majority of the WK userbase (myself included) would be pretty disappointed to see AI-generated images being used. Any chance to avoid the environmental toll, creative dilution, and job scarcity they bring is a plus, so generating thousands of them isn’t particularly appealing to most of us here. The alternative, high-quality illustration and/or photography at this scale, doesn’t seem practical either.
Technical complications aside, I also have to wonder what exactly would be gained from this.
I’m having a hard time seeing what value a picture of a bridge under the description of “橋” would add, at least beyond what reading the definition already expresses. How would abstract concepts like grammar elements (形容動詞, as an example) get a visual representation, let alone one that feels additive and descriptive enough to help reinforce it? (Not to mention I doubt WK would be thrilled to supply us with visual representations of its more raunchy, violent, or taboo vocabulary.) Ultimately, if the goal is comfortable reading, I feel like the brain power spent creating and memorizing associations is better spent reinforcing the kanji itself rather than building a secondary association to bounce between.
Again, I will always see the potential in, and advocate for, user options and learning variety so as to accommodate as many people as possible, but in this instance I do think it is a pretty huge undertaking for an ultimately small tangible reward. In my opinion (as someone who does work and learn fairly visually), I would prefer it if we could upload our own photos in the notes tab, for example, so we would have an easy way to refer to our own handcrafted mnemonics via illustration, calligraphy, or whatever form they take.
Of course my perspective and ideas aren’t one-size-fits-all, but all things considered I don’t see this idea really coming to fruition beyond userscripts like the ones you mentioned, especially since WK seems to be pretty conservative with its feature implementation at large.