How the character-analysis works
- Grab every chunk of text (text node)
- Remove all the characters we know to be okay
- Examine what's left:
  - For each such character, report it
  - Look it up in a Unicode table to get its name
  - Report an XPath to the text node while we're at it
- Stylesheet is XSLT 2.0 (for its Unicode functions)
- Extra input is a table of Unicode codepoints with their names (see examples/Unicode-codepoint-lookup.xml)
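
The steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the XSLT 2.0 stylesheet itself. Assumptions: the "okay" characters are taken to be tab, newline, carriage return and printable ASCII (the real allow-list may differ); Python's `unicodedata` module stands in for the codepoint lookup table; and `analyze`, `text_nodes` and `xpath_to` are hypothetical names chosen for this example.

```python
import unicodedata
from xml.dom import minidom

# Assumption: "known to be okay" = tab, newline, carriage return, printable ASCII.
OKAY = set("\t\n\r") | {chr(c) for c in range(0x20, 0x7F)}


def text_nodes(node):
    """Yield every text node below `node`, in document order.

    CDATA sections are not handled in this sketch.
    """
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            yield child
        else:
            yield from text_nodes(child)


def xpath_to(text_node):
    """Build a positional XPath such as /doc[1]/p[2]/text()[1]."""
    # Position of this text node among its preceding text-node siblings.
    index = 1
    sib = text_node.previousSibling
    while sib is not None:
        if sib.nodeType == sib.TEXT_NODE:
            index += 1
        sib = sib.previousSibling
    steps = [f"text()[{index}]"]

    # Walk up through the ancestor elements, numbering each one among
    # its preceding siblings that share the same element name.
    node = text_node.parentNode
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        pos = 1
        sib = node.previousSibling
        while sib is not None:
            if sib.nodeType == sib.ELEMENT_NODE and sib.tagName == node.tagName:
                pos += 1
            sib = sib.previousSibling
        steps.append(f"{node.tagName}[{pos}]")
        node = node.parentNode
    return "/" + "/".join(reversed(steps))


def analyze(xml_source):
    """Return (codepoint, name, xpath) for every character not in OKAY."""
    doc = minidom.parseString(xml_source)
    report = []
    for node in text_nodes(doc):
        # Remove the okay characters; whatever is left gets reported.
        leftover = [ch for ch in node.data if ch not in OKAY]
        for ch in leftover:
            # unicodedata stands in for the lookup table here.
            name = unicodedata.name(ch, "(no name found)")
            report.append((f"U+{ord(ch):04X}", name, xpath_to(node)))
    return report
```

For example, `analyze("<doc><p>plain</p><p>caf\u00e9\u00a0ok</p></doc>")` reports U+00E9 (LATIN SMALL LETTER E WITH ACUTE) and U+00A0 (NO-BREAK SPACE), both at `/doc[1]/p[2]/text()[1]`. In the stylesheet, the name would come from the lookup table rather than from a built-in module.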