Hello all devs. This is a request (hopefully to benefit all! :-) )

So, it sounds like you're working through a large body of text and cutting it down as you go by removing words you've already added.
I’ve got next to no script knowledge, but I think I can offer you something useful nonetheless.

I’m working under the assumption that you’ve got the extracted dialogue in some sort of text format (thus accessible to word processors, etc.) and that it’s not riddled with “impurities” from the export process.

0. I think you should abandon the idea of deleting text. It sounds like you're only doing it because the process you have in mind would require it. It's a risky step (see 1.), and for your actual aim (identifying words and building a deck from them) it's completely unnecessary.

1. Consider giving up on the idea of doing this with words. It is going to be tough as hell to do accurately. Duplicate deletion in particular could lead to awful errors: for example, you could delete a two-kanji compound that is identical to two of the kanji in a three-kanji compound, which won't go well for your aim. The only thing that could make it safe is running the text through a parser first. We have several (MeCab being one of the more common), but they aren't easy to use on their own, and each has its own faults in how it parses.*
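To make the compound problem concrete, here is a minimal Python sketch of naive duplicate deletion done with plain string replacement. This is an assumption about how such deletion might be implemented, not anyone's actual script; the function name `naive_delete` and the sample sentence are made up for illustration.

```python
# Naive "duplicate deletion": once a word has been added to the deck,
# remove every occurrence of it from the remaining text.
def naive_delete(text: str, known_words: list[str]) -> str:
    for word in known_words:
        text = text.replace(word, "")
    return text


dialogue = "日本に行きたい。日本語を勉強している。"

# After adding 日本 to the deck, deleting it also mangles the three-kanji
# compound 日本語, leaving the orphaned fragment 語 behind.
print(naive_delete(dialogue, ["日本"]))
```

The replacement has no idea where words begin and end, so 日本語 silently turns into 語. Only text that has already been split into words by a parser would avoid this.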

2. Going by kanji should be fine, and deleting shouldn't cause problems there. To help with that:
2A. Consider exporting the dialogue to HTML and using this WK userscript to highlight the kanji you've already learned or are scheduled to learn in WK.
2B. Additionally, use its add-kanji feature to add the new kanji you come across one by one, and it'll highlight them differently. (At least in its earlier versions, it's an easy script to modify, if that helps.) That'll let you “save” the words from wrongful deletion and see whether the kanji involved were used. Then use Rikaisama in Firefox to build an Anki deck with sentences and audio. Check out how to set it up for this here.
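If you'd rather not touch the text at all, going by kanji can also be done without any deletion: scan the dialogue, skip kanji you already know, and record each new kanji with the first sentence it appears in. The Python sketch below is hypothetical and is not how the WK userscript or Rikaisama work. It assumes kanji fall in the CJK Unified Ideographs block (U+4E00 to U+9FFF), that sentences end in 。, ！ or ？, and that `known_kanji` is a set you fill in yourself (for example, with your WK kanji).

```python
import re

# Assumption: kanji are in the CJK Unified Ideographs block (U+4E00-U+9FFF).
# This misses rarer extension blocks and the iteration mark 々.
KANJI_RE = re.compile(r"[\u4e00-\u9fff]")


def new_kanji_with_examples(text, known_kanji):
    """Return (kanji, first sentence containing it) for every kanji not in
    known_kanji, in order of first appearance. The text is never modified."""
    # Split after Japanese sentence-ending punctuation, keeping it attached.
    sentences = [s for s in re.split(r"(?<=[。！？])", text) if s.strip()]
    seen = set(known_kanji)
    rows = []
    for sentence in sentences:
        for ch in KANJI_RE.findall(sentence):
            if ch not in seen:
                seen.add(ch)
                rows.append((ch, sentence.strip()))
    return rows


def to_tsv(rows):
    # One card per line with tab-separated fields, a plain-text layout
    # that Anki's text import can read.
    return "\n".join(f"{kanji}\t{sentence}" for kanji, sentence in rows)


dialogue = "日本に行きたい。日本語を勉強している。"
rows = new_kanji_with_examples(dialogue, {"日", "本"})
print(to_tsv(rows))
```

With 日 and 本 marked as known, this yields 行, 語, 勉 and 強, each paired with the sentence where it first shows up. Because nothing is deleted, compounds like 日本語 stay intact in the example sentences.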

tl;dr-> Skip to here:
3. Or, if you can put up with the parser's errors (and why not? A little error is only a little) and want to automate the kanji side of things as much as possible, simply use cb's Analyzer, which uses MeCab for words but checks kanji independently. cb makes some great stuff, and this will do exactly what you want, as well as it can be done without designing a new parser. To conveniently make a sweet deck once you open the text in Firefox with Rikaisama (also cb's, I believe), see the link in 2B.
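The "words from a parser, kanji checked independently" idea can be sketched in a few lines of Python. This is not cb's Analyzer, only an illustration of the same split. The word list is assumed to come from a parser such as MeCab; the optional `mecab_words` helper assumes the mecab-python3 binding and a dictionary (e.g. unidic-lite) are installed, and the kanji range is the same U+4E00 to U+9FFF assumption as above.

```python
import re
from collections import Counter

# Assumption: kanji are in the CJK Unified Ideographs block (U+4E00-U+9FFF).
KANJI_RE = re.compile(r"[\u4e00-\u9fff]")


def analyze(text, words):
    """Count parser-segmented words and, separately, kanji read straight
    from the raw text, so parser mistakes never affect the kanji counts."""
    # Keep only words containing at least one kanji (a choice for this sketch).
    word_counts = Counter(w for w in words if KANJI_RE.search(w))
    kanji_counts = Counter(KANJI_RE.findall(text))
    return word_counts, kanji_counts


def mecab_words(text):
    # Optional: segment with MeCab via mecab-python3 (imported here so the
    # rest of the sketch runs without it). "-Owakati" asks MeCab for
    # space-separated words.
    import MeCab
    return MeCab.Tagger("-Owakati").parse(text).split()


text = "日本語を勉強している。日本語は難しい。"
# Hand-segmented stand-in for parser output; with MeCab installed you could
# use mecab_words(text) instead, and its segmentation may differ.
words = ["日本語", "を", "勉強", "し", "て", "いる", "。",
         "日本語", "は", "難しい", "。"]
word_counts, kanji_counts = analyze(text, words)
print(word_counts)
print(kanji_counts)
```

Keeping the two counts separate means a bad word split only affects the word list, while the kanji list stays exact.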

I could be wrong, but I think that’s exactly what you want.