[Networking Series 3/3] Text Character Sets & Encoding

Post by Dennis Kuypers » Wed Jul 25, 2018 10:51 pm

The Idea
We know that computers only work with numbers - so how can we express letters or special characters? We could simply map letters to numbers and then use our bytes to store or send them. In theory we could all agree that a = 0, b = 1 and so on. Such a mapping of symbols/letters to numeric values is called a character set. If we think not only about letters but also about special characters, emoji and Asian scripts, we see that there are actually a lot of symbols that we need to map.

Back in the day one of these character sets gained a lot of popularity: ASCII. It is still widely used, but it is limited to a very small range of 128 characters. The good part is that each character fits into one byte. If we want to send ASCII we can use one byte for each symbol - easy!
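
As a minimal sketch of the "one byte per symbol" idea, the Python snippet below encodes a short string as ASCII and then checks what happens with characters outside the ASCII range. Python is used purely for illustration here; the helper name can_encode_ascii is made up for this example.

```python
# Encode a plain English string as ASCII: one byte per character.
text = "Hello"
ascii_bytes = text.encode("ascii")
print(list(ascii_bytes))            # the numeric value of each byte
print(len(text), len(ascii_bytes))  # same count: one byte per character


def can_encode_ascii(s: str) -> bool:
    """Return True if every character of s exists in ASCII (hypothetical helper)."""
    try:
        s.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(can_encode_ascii("Hello"))
# German "Gruesse" written with u-umlaut and sharp s, which ASCII does not contain.
print(can_encode_ascii("Gr\u00fc\u00dfe"))
```

The second check fails because ASCII simply has no number assigned to those characters.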

There is another character set called Unicode. Unicode defines which symbol is mapped to which numeric value - but, importantly, it does not define how to express those values in computer memory or how to send them in network messages. It is only about which symbol maps to which numeric value - the job of a character set.
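
To make the "symbol maps to numeric value" part concrete, here is a small Python sketch. Python strings are sequences of Unicode values, so the built-in ord() and chr() expose the mapping directly. The sample characters are arbitrary picks for illustration.

```python
# ord() returns the Unicode value of a character, chr() goes the other way.
# Escapes are used so the example does not depend on the file's own encoding.
samples = ["a", "\u00e9", "\u20ac", "\U0001F600"]  # a, e-acute, euro sign, an emoji

code_points = {ch: ord(ch) for ch in samples}
for ch, value in code_points.items():
    print(f"{ch!r} -> U+{value:04X} (decimal {value})")

# The mapping is reversible.
print(chr(0x20AC) == "\u20ac")
```

Note that nothing here says how many bytes each value takes. That is decided by the encoding, not by the character set.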

If we actually want to send Unicode, we have to agree on a common system for expressing the values (that are mapped to the symbols) as bytes. This ruleset is called a character encoding. There are many character encodings for Unicode, with names like UCS-2 and UCS-4, and probably the most famous one: UTF-8.
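
The following sketch shows the same two characters turned into bytes by three different encodings. One assumption to flag: Python has no codecs named UCS-2 or UCS-4, so the closely related UTF-16 and UTF-32 codecs (big-endian variants, to avoid a byte order mark) stand in for them here.

```python
# The same text, expressed as bytes under three different encodings.
text = "a\u20ac"  # the letter a followed by the euro sign

encoded = {name: text.encode(name) for name in ("utf-8", "utf-16-be", "utf-32-be")}

for name, data in encoded.items():
    # bytes.hex(" ") needs Python 3.8 or newer.
    print(f"{name:10} {len(data)} bytes: {data.hex(' ')}")
```

The Unicode values are identical in all three cases; only the byte layout differs, which is why sender and receiver have to agree on the encoding.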

All ASCII text is also valid UTF-8 because Unicode is identical to ASCII for the first 128 values (0 to 127), and UTF-8 encodes each of them as the same single byte that ASCII uses.
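
The overlap between ASCII and UTF-8 can be checked directly. This is just a quick sanity check in Python, not anything specific to a particular product:

```python
# Every byte value 0..127 decodes to the same character under ASCII and UTF-8.
all_ascii = bytes(range(128))
same_decoding = all_ascii.decode("ascii") == all_ascii.decode("utf-8")
print(same_decoding)

# ASCII-only text therefore produces identical bytes in both encodings.
msg = "Hello, world!"
same_bytes = msg.encode("ascii") == msg.encode("utf-8")
print(same_bytes)

# Characters beyond that range need more than one byte in UTF-8.
e_acute_len = len("\u00e9".encode("utf-8"))
print(e_acute_len)
```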

If you ever get weird characters, make sure that your applications use the same encoding. Widget Designer mostly uses UTF-8 for network-related features.
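
For illustration, this is roughly what a mismatch looks like. The sketch assumes a sender that uses UTF-8 and a receiver that wrongly decodes as Latin-1; Latin-1 is only picked as an example of a mismatched encoding and is not taken from any specific application.

```python
# Sender side: text is encoded as UTF-8 before it goes onto the network.
original = "Gr\u00fc\u00dfe"  # German "Gruesse" with u-umlaut and sharp s
sent = original.encode("utf-8")

# Receiver side, wrong assumption: bytes are interpreted as Latin-1.
wrong = sent.decode("latin-1")

# Receiver side, matching assumption: bytes are interpreted as UTF-8.
right = sent.decode("utf-8")

print(repr(wrong))  # garbled: every non-ASCII character became two characters
print(repr(right))  # identical to the original text
```

No error is raised in the wrong case, which is why this kind of problem usually only shows up as odd-looking text on the receiving end.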
Dennis Kuypers
(former Product Developer, Software)