Node string encoding is all over the place. Let's try to straighten out how it works.
First, some basics about string encoding. A string is a series of bytes. A byte is 8 bits, each of which can be 0 or 1, so a byte can have 2^8, or 256, different values. Encoding is the process of squashing the graphics you see on screen, say, 世, into actual bytes. Unicode has room for over a million characters (1,114,112 code points) - think about all of the different languages and emoji and symbols on the planet.
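Here's a small sketch of what "squashing 世 into bytes" looks like in Node. It assumes UTF-8 as the encoding, which this excerpt hasn't named yet; other encodings would produce different bytes. The variable names are just for illustration.

```javascript
// A byte has 8 bits, so it can take 2^8 distinct values.
const valuesPerByte = 2 ** 8;

// Encode the character into bytes. UTF-8 is an assumption here,
// picked because it is a common encoding that Node's Buffer supports.
const bytes = Buffer.from("世", "utf8");

console.log(valuesPerByte); // 256
console.log(bytes);         // <Buffer e4 b8 96>
console.log(bytes.length);  // 3
```

So the single character you see on screen becomes three bytes under this particular encoding.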
You could easily represent all of the characters in the Unicode set with an encoding that says simply "assign one number, 4 bytes (or 32 bits) long, for each character in the Unicode set." One 32-bit combo for each character gives you 2^32, or about 4.3 billion, distinct values, which is far more than Unicode needs. But this would be really inefficient for most documents. For example, the English Bible or a dictionary or people's email folders are mostly the characters a-z, A-Z, 0-9, and punctuation. It would be wasteful to spend 4 bytes on every "a" in the document - we want a way to represent it more efficiently.
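To put a rough number on that waste, here's a sketch that compares a fixed 4-bytes-per-character scheme against a variable-width encoding. UTF-8 stands in as the variable-width example (an assumption; the excerpt doesn't name one), and `fixedWidthBytes` is a hypothetical helper, not a Node API.

```javascript
// Hypothetical helper: the size of a string if every character
// (code point) were stored as a fixed 4-byte number.
function fixedWidthBytes(text) {
  // Spreading a string iterates by code point, not by UTF-16 code unit.
  return [...text].length * 4;
}

const sample = "In the beginning";

const fixed = fixedWidthBytes(sample);
// Assumption: UTF-8 as the variable-width encoding to compare against.
const utf8 = Buffer.byteLength(sample, "utf8");

console.log(fixed); // 64
console.log(utf8);  // 16
```

For plain English text like this, the fixed-width scheme takes four times the space.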
... read the whole story at kev.inburke.com.