This pattern for matching a single UTF-8-encoded character simplified my code a lot:
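# Matches one UTF-8-encoded character, including the obsolete five- and
# six-byte forms. Interpolate it into a pattern compiled with /x (for example
# /\A(?:$utf8)*\z/x) so the whitespace and # comments are ignored. The closing
# brace sits on its own line so the last comment ends in a newline and cannot
# swallow the rest of the enclosing pattern.
my $utf8 = q{
      [\x00-\x7F] # One-byte range
    | [\xC2-\xDF][\x80-\xBF] # Two-byte range
    | \xE0[\xA0-\xBF][\x80-\xBF] # Three-byte range
    | [\xE1-\xEF][\x80-\xBF][\x80-\xBF] # Three-byte range
    | \xF0[\x90-\xBF][\x80-\xBF][\x80-\xBF] # Four-byte range
    | [\xF1-\xF7][\x80-\xBF][\x80-\xBF][\x80-\xBF] # Four-byte range
    | \xF8[\x88-\xBF][\x80-\xBF][\x80-\xBF][\x80-\xBF] # Five-byte range
    | [\xF9-\xFB][\x80-\xBF][\x80-\xBF][\x80-\xBF][\x80-\xBF] # Five-byte range
    | \xFC[\x84-\xBF][\x80-\xBF][\x80-\xBF][\x80-\xBF][\x80-\xBF] # Six-byte range
    | \xFD[\x80-\xBF][\x80-\xBF][\x80-\xBF][\x80-\xBF][\x80-\xBF] # Six-byte range
};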
Thanks, O'Reilly (PDF).
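A minimal sketch of how the pattern might be put to work, assuming the input is a raw byte string (for example, data read through a :raw filehandle) rather than an already-decoded Perl string. The helper names is_utf8_bytes and utf8_sequences are hypothetical and not from the post. Note that the pattern follows the older RFC 2279 definition: it accepts five- and six-byte sequences and does not reject UTF-16 surrogates (\xED\xA0\x80 through \xED\xBF\xBF) or code points above U+10FFFF, so it is looser than current UTF-8 as defined in RFC 3629.

```perl
use strict;
use warnings;

# The pattern from the post: one UTF-8-encoded character, including the
# obsolete five- and six-byte forms. Whitespace and # comments are only
# ignored because every pattern below is compiled with /x.
my $utf8 = q{
      [\x00-\x7F] # One-byte range
    | [\xC2-\xDF][\x80-\xBF] # Two-byte range
    | \xE0[\xA0-\xBF][\x80-\xBF] # Three-byte range
    | [\xE1-\xEF][\x80-\xBF][\x80-\xBF] # Three-byte range
    | \xF0[\x90-\xBF][\x80-\xBF][\x80-\xBF] # Four-byte range
    | [\xF1-\xF7][\x80-\xBF][\x80-\xBF][\x80-\xBF] # Four-byte range
    | \xF8[\x88-\xBF][\x80-\xBF][\x80-\xBF][\x80-\xBF] # Five-byte range
    | [\xF9-\xFB][\x80-\xBF][\x80-\xBF][\x80-\xBF][\x80-\xBF] # Five-byte range
    | \xFC[\x84-\xBF][\x80-\xBF][\x80-\xBF][\x80-\xBF][\x80-\xBF] # Six-byte range
    | \xFD[\x80-\xBF][\x80-\xBF][\x80-\xBF][\x80-\xBF][\x80-\xBF] # Six-byte range
};

# Hypothetical helper: returns 1 if the byte string consists entirely of
# complete sequences matched by $utf8, otherwise 0.
sub is_utf8_bytes {
    my ($bytes) = @_;
    return $bytes =~ /\A(?:$utf8)*\z/x ? 1 : 0;
}

# Hypothetical helper: splits a well-formed byte string into one element per
# encoded character; returns an empty list for malformed input.
sub utf8_sequences {
    my ($bytes) = @_;
    return () unless is_utf8_bytes($bytes);
    return $bytes =~ /($utf8)/gx;
}

# "\xC0" can only start an overlong encoding, so the second sample fails.
for my $sample ("caf\xC3\xA9", "\xC0\xAF") {
    print(is_utf8_bytes($sample) ? "well-formed\n" : "malformed\n");
}
```

Because every alternative begins with a different lead-byte range, the engine never has to backtrack from one alternative into another, so a whole-string check like this stays linear in the input length. If strict RFC 3629 validation is what you need, decoding with the Encode module's decode('UTF-8', $bytes, Encode::FB_CROAK) is a stricter alternative.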