UTF-8 can be bigger than UTF-32


Dear Sirs:

Regarding your MCTS (Exam 70-536) Book (ISBN: 9780735622777):

On pp. 172-173 (comparing UTF encodings, Chapter 3, Lesson 2) you state:

“Unicode UTF-8 uses 8-bit, 16-bit … and up to 48-bit encoding.”

and that:

“Unicode UTF-32 encoding represents … characters as … 32-bit integers.”

In the Lesson 2 Review Test, question 1 asks:

“Which … encoding types would yield the largest file size?”

According to the Answers section (p. 954, Lesson 2, Question 1) it’s UTF-32.

If UTF-8 can go up to 48 bits, and UTF-32 is always 32 bits, how can the answer be a straightforward “UTF-32 would yield the largest file size”?

Answer after the jump…

Hi. You’re right, UTF-8 could theoretically be larger than UTF-32, and that makes it a weak question. When writing in English, though, UTF-8 files use 8 bits per character; other scripts need more (Greek and Cyrillic letters take 16 bits each, and most Chinese, Japanese, and Korean characters take 24).

Thanks for pointing that out!

Oh, and be sure to get the second edition of the 70-536 training kit. It’s MUCH better.

Tony
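
To make the size comparison concrete, here is a minimal sketch that measures the same strings in both encodings. It uses Python rather than the .NET code the book covers, and the sample strings and the helper name encoded_sizes are illustrative assumptions, not taken from the book. Note that Python's codec implements the current UTF-8 definition, which stops at four bytes (32 bits) per character, so it cannot reproduce the 48-bit case the book describes; that figure corresponds to the older six-byte form of UTF-8.

```python
# Compare how many bytes the same text occupies in UTF-8 and UTF-32.
# The sample strings below are illustrative, not from the book.
samples = {
    "English": "Hello",
    "Russian": "Привет",
    "Japanese": "こんにちは",
    "Emoji": "😀",
}


def encoded_sizes(text):
    """Return (utf8_bytes, utf32_bytes) for text, with no byte-order mark."""
    # "utf-32-le" is used instead of "utf-32" so Python does not prepend
    # a 4-byte byte-order mark, which would skew the comparison.
    return len(text.encode("utf-8")), len(text.encode("utf-32-le"))


for label, text in samples.items():
    utf8_len, utf32_len = encoded_sizes(text)
    print(f"{label}: {len(text)} code points, "
          f"UTF-8 {utf8_len} bytes, UTF-32 {utf32_len} bytes")
```

With these samples, UTF-8 is smaller for every string except the emoji, where the two encodings tie at four bytes. Under the current definition, UTF-8 can match UTF-32 but not exceed it.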
