Remember ASCII? One page and you know all about it! Unicode’s core document is 690 easy-reading pages. And then there are 13 annexes… Handling Unicode strings, matching them with regular expressions, and collating, searching and comparing them are by no means trivial. Unicode supports diacritics, all those pesky marks such as the German umlauts, not to mention our own “ניקוד” (niqqud, the Hebrew vowel points) and “טעמי המקרא” (the biblical cantillation marks). When you search for ‘garçon’, is ‘garcon’ a match? This presentation will introduce the issues involved and offer some guidance on how to deal with them in Perl.