Unicode format

Giganews Newsgroups
Subject: Unicode format
Posted by:  simon (zupan.n…@gmail.com)
Date: Wed, 26 Oct 2011

I have a question about the Unicode formats.

Today, UTF-8 is the preferred encoding rather than UTF-16.

UTF-8 usually takes less space: its length for one character ranges
from 1 byte to 4 bytes, while UTF-16 ranges from 2 bytes to 4 bytes.
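A quick Python sketch to illustrate the byte counts (the sample
characters are just my own picks):

    # Encoded size of a few representative characters.
    for ch in ("A", "é", "€", "你", "𝄞"):
        print(ch,
              len(ch.encode("utf-8")),      # 1 to 4 bytes
              len(ch.encode("utf-16-le")))  # 2 or 4 bytes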
Another advantage of UTF-8: an arbitrary text byte stream cannot be
losslessly converted to UTF-16, because the byte stream may contain
encoding errors. This causes unexpected and often severe problems when
trying to use existing data in a system that uses UTF-16 as its
internal encoding. The results are security bugs, denial of service if
bad encoding throws an exception, and data loss when different byte
streams convert to the same UTF-16.
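A small Python sketch of both failure modes (my own illustration):

    bad1 = b"abc\xff"   # invalid UTF-8 byte
    bad2 = b"abc\xfe"   # a different invalid byte

    # Strict decoding throws -- the denial-of-service case:
    try:
        bad1.decode("utf-8")
    except UnicodeDecodeError as e:
        print("decode error:", e)

    # Lenient decoding replaces each bad byte with U+FFFD, so two
    # different byte streams map to the same UTF-16 text -- data loss:
    print(bad1.decode("utf-8", errors="replace") ==
          bad2.decode("utf-8", errors="replace"))   # True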

So, for these and other reasons, UTF-8 has become the dominant
character encoding.

I would like to know why SQL Server uses UTF-16 LE for the nvarchar,
nchar, ... data types.
(I guess this is because of the fixed length (2 bytes), which makes
string manipulation easier.)

And why is it always 2 bytes per character when some characters in
UTF-16 are stored as 4 bytes?
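Here is a Python sketch of the 4-byte case I mean (my own example; if
I understand correctly, the "2 bytes" is per 16-bit code unit, not per
character, and characters outside the BMP take two code units):

    # U+1D11E (musical G clef) lies outside the BMP, so UTF-16
    # encodes it as a surrogate pair: 2 x 16-bit code units = 4 bytes.
    clef = "\U0001D11E"
    raw = clef.encode("utf-16-le")
    print(len(raw))    # 4
    print(raw.hex())   # 34d81edd (high surrogate D834, low DD1E, LE)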

So, if other programs hold my data as UTF-8, SQL Server must always
convert it to UTF-16 LE to store it or work with it. Isn't that an
unnecessary use of processor power?
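A toy measurement of the transcoding step I mean (pure Python on the
client side, not SQL Server itself; the sample text is arbitrary):

    import timeit

    utf8 = ("Šimon čšž " * 10_000).encode("utf-8")

    # The extra conversion every UTF-8 round trip has to pay for:
    secs = timeit.timeit(lambda: utf8.decode("utf-8").encode("utf-16-le"),
                         number=100)
    print(f"{secs:.3f} s for 100 conversions")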

br,
Simon
