Unicode's difficult task

The latest version of Unicode encodes some 113,000 distinct characters. It covers the characters of almost every writing system known to us, including Aramaic, the Phoenician alphabet, and Linear A (a script that remains undeciphered to this day). It also encodes emoji, with new ones added every year.

Ben Frederickson has written an interesting article on the complexities hidden behind the task Unicode has taken on:

Unicode is crazy complicated, but that is because of the crazy ambition it has in representing all of human language, not because of any deficiency in the standard itself. Human language is a complicated messy business, and Unicode has to be equally complicated to represent it. Thankfully we have people writing those long standards on how to display bidirectional strings appropriately, or sort strings, or the security implications of all this – so that the rest of us don’t have to think about it and just use standard library code to handle instead.
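
To give a feel for what "just use standard library code" can look like, here is a minimal sketch using Python's built-in `unicodedata` module. The helper names `same_text` and `bidi_classes` are made up for this illustration and do not come from Frederickson's article. It touches only two of the issues the quote alludes to: the same visible text can be encoded in more than one way, and every character carries a directionality class that the bidirectional algorithm relies on.

```python
import unicodedata

# Two encodings of the same visible text "é":
# a single precomposed code point, or "e" followed by a combining acute accent.
composed = "\u00e9"
decomposed = "e\u0301"


def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def bidi_classes(text: str) -> list[str]:
    """Hypothetical helper: return the bidirectional class of each character."""
    return [unicodedata.bidirectional(ch) for ch in text]


if __name__ == "__main__":
    # A naive comparison sees two different code point sequences.
    print(composed == decomposed)
    # After normalization the two strings compare equal.
    print(same_text(composed, decomposed))
    # Latin "a" is left-to-right, Hebrew alef (U+05D0) is right-to-left.
    print(bidi_classes("a\u05d0"))
```

None of this requires reading the relevant Unicode annexes: the normalization rules and the character property tables ship with the standard library, which is exactly the point the quote makes.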