Remember ASCII? One page and you know all about it! Unicode’s core document is 690 easy-reading pages. And then there are 13 annexes… Issues such as Unicode strings, regular expressions that operate on them, collation, and searching or comparing strings are by no means trivial. Unicode supports “diacritics”, all those pesky signs such as the German “umlaut” characters, not to mention our own “ניקוד” (Hebrew vowel points) and “טעמי המקרא” (biblical cantillation marks). When you search for ‘garçon’, is ‘garcon’ a match? This presentation will introduce the issues involved and offer some practical knowledge of how to deal with them in Perl.
A veteran EE, now retired, fulfilling a decision taken when I was nine years old. I used to develop hardware for radar and Electronic-Measures signal processing, as well as general-purpose digital circuitry. I now develop data-collection and trend-analysis algorithms for trading on the stock exchange. I use Perl and SQL, and sometimes HTML, CSS, JavaScript and the DOM.
Slides in PDF