Byte by Byte: UTF Encoding Visualizer

Compare UTF-8 and UTF-16 encoding side by side

Encoding Rules Reference

UTF-8 Bit Patterns

Code point range      Byte pattern                           Size
U+0000 - U+007F       0xxxxxxx                               1 byte (ASCII)
U+0080 - U+07FF       110xxxxx 10xxxxxx                      2 bytes
U+0800 - U+FFFF       1110xxxx 10xxxxxx 10xxxxxx             3 bytes
U+10000 - U+10FFFF    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx    4 bytes
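
The bit patterns above can be applied by hand with shifts and masks. The following is a minimal Python sketch that encodes one code point at a time; the function name `utf8_encode` is illustrative and is not taken from the visualizer itself.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode a single Unicode code point to UTF-8 using the bit patterns in the table."""
    # Surrogate code points (U+D800 - U+DFFF) are not valid scalar values
    # and must not be encoded in UTF-8.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")

    if cp <= 0x7F:
        # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```

The result should agree with Python's built-in encoder, so `utf8_encode(ord(ch)) == ch.encode("utf-8")` is a convenient cross-check for any single character `ch`.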

UTF-16 Encoding

U+0000 - U+FFFF       Direct code unit        2 bytes (BMP)
U+10000 - U+10FFFF    High + Low surrogate    4 bytes (Surrogate)
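
For code points above U+FFFF, the high and low surrogates are derived by subtracting 0x10000 and splitting the remaining 20 bits into two 10-bit halves. The sketch below shows that arithmetic in Python; the function name `utf16_code_units` is hypothetical and the code is meant as an illustration, not as the tool's own implementation.

```python
def utf16_code_units(cp: int) -> list:
    """Return the UTF-16 code units (as integers) for one Unicode code point."""
    # U+D800 - U+DFFF is reserved for surrogates and cannot be encoded directly.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")

    if cp <= 0xFFFF:
        # BMP: the code point is its own 16-bit code unit.
        return [cp]

    # Supplementary planes: 20 bits remain after removing the 0x10000 offset.
    offset = cp - 0x10000
    high = 0xD800 | (offset >> 10)    # top 10 bits    -> 0xD800..0xDBFF
    low = 0xDC00 | (offset & 0x3FF)   # bottom 10 bits -> 0xDC00..0xDFFF
    return [high, low]
```

For example, U+1F600 has offset 0xF600, which yields the surrogate pair 0xD83D 0xDE00.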

Size Comparison

UTF-8 Bytes: 0
UTF-16 Bytes: 0
UTF-16 / UTF-8: 0x
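
The three figures in this panel can be reproduced with Python's built-in codecs. This is a rough sketch under two assumptions that the page does not state: that the UTF-16 count excludes a byte order mark (hence `utf-16-le`), and that the ratio is reported as 0 for empty input. The name `size_comparison` is illustrative.

```python
def size_comparison(text: str):
    """Return (utf8_bytes, utf16_bytes, utf16_over_utf8) for a string."""
    utf8_bytes = len(text.encode("utf-8"))
    # "utf-16-le" does not prepend a BOM; plain "utf-16" would add 2 bytes.
    utf16_bytes = len(text.encode("utf-16-le"))
    # Assumption: show a ratio of 0 when there is nothing to compare.
    ratio = utf16_bytes / utf8_bytes if utf8_bytes else 0.0
    return utf8_bytes, utf16_bytes, ratio
```

ASCII-heavy text gives a ratio near 2, while text dominated by 3-byte UTF-8 characters (for example many CJK characters) gives a ratio below 1.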

Character Breakdown

UTF-8

UTF-16

UTF-8 0 bytes
UTF-16 0 bytes
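
A per-character breakdown like the one in this panel can be approximated in a few lines. The sketch below is an assumption about what such a breakdown contains (code point, UTF-8 bytes, UTF-16 bytes in big-endian order); the function name `breakdown` and the row layout are hypothetical.

```python
def breakdown(text: str):
    """Return one (code point, UTF-8 hex, UTF-16 hex) row per character."""
    rows = []
    for ch in text:  # iterating a Python str yields one code point at a time
        utf8_hex = ch.encode("utf-8").hex(" ").upper()
        # Big-endian so the hex reads in the same order as the code units.
        utf16_hex = ch.encode("utf-16-be").hex(" ").upper()
        rows.append((f"U+{ord(ch):04X}", utf8_hex, utf16_hex))
    return rows


for row in breakdown("A\u20ac\U0001F600"):
    print(*row, sep="  |  ")
```

Note that `bytes.hex` accepts a separator only in Python 3.8 and later, and that a string containing a lone surrogate would raise `UnicodeEncodeError` here.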

UTF-8 Encoding

1 byte 2 bytes 3 bytes 4 bytes

UTF-16 Encoding

2 bytes (BMP) 4 bytes (Surrogate)