Unicode Converter

Inspect Unicode code points, UTF-8 byte sequences, and character categories for any text.

About This Tool

The Unicode Converter inspects any text and returns the Unicode code point (U+XXXX), UTF-8 byte sequence, and character category for every character.
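The per-character report described above can be approximated with Python's standard library. This is a minimal sketch, not the tool's actual implementation: the function name `analyze` and the row layout are hypothetical, and it assumes the category is reported as the two-letter Unicode general category (for example `Lu` for an uppercase letter), which the tool may display differently.

```python
import unicodedata

def analyze(text):
    """Return one row per code point: character, U+XXXX, UTF-8 bytes, category."""
    rows = []
    for ch in text:
        rows.append({
            "char": ch,
            # Code points are conventionally written with at least 4 hex digits.
            "code_point": f"U+{ord(ch):04X}",
            # Space-separated hex bytes of the UTF-8 encoding.
            "utf8": " ".join(f"{b:02X}" for b in ch.encode("utf-8")),
            # Two-letter general category, e.g. Lu, Ll, Sc, So.
            "category": unicodedata.category(ch),
        })
    return rows

# Sample input: 'A', e with acute accent, euro sign, grinning face emoji.
for row in analyze("A\u00e9\u20ac\U0001F600"):
    print(row["code_point"], row["utf8"], row["category"])
```

Python strings iterate by code point, so a character built from several code points (a base letter plus a combining accent, for instance) produces several rows in this sketch.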

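UTF-8 is a variable-width encoding: ASCII (U+0000-U+007F) uses 1 byte; U+0080-U+07FF (Latin supplements, Greek, Cyrillic, Hebrew, and the basic Arabic block) uses 2 bytes; U+0800-U+FFFF (most CJK ideographs, Hangul, Devanagari, Thai) uses 3 bytes; and supplementary code points U+10000-U+10FFFF (most emoji) use 4 bytes.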
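To make the byte ranges concrete, here is a hand-written encoder for a single code point, checked against Python's built-in `str.encode`. It is an illustrative sketch: `utf8_encode` is a hypothetical helper name, and in real code the built-in encoder is the better choice.

```python
def utf8_encode(cp):
    """Encode one Unicode code point to UTF-8 bytes by hand."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        # Out-of-range values and surrogates have no UTF-8 form.
        raise ValueError(f"not an encodable code point: {cp:#x}")
    if cp <= 0x7F:        # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:      # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

# One sample from each length class, compared with the built-in encoder.
for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    encoded = utf8_encode(cp)
    assert encoded == chr(cp).encode("utf-8")
    print(f"U+{cp:04X}", len(encoded), " ".join(f"{b:02X}" for b in encoded))
```

The leading bits of the first byte tell a decoder how many bytes follow, and every continuation byte starts with the bits `10`.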

How to Use

Type or paste text into the text analysis field and click Analyze.
The table shows the character, code point, UTF-8 bytes, and category for each character.
To look up a specific character, enter its code point (e.g., U+1F600 or 1F600) in the Code Point Lookup field.
Click Look up to see the character and its encoding details.
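The lookup step can be sketched as follows. The function name `lookup` and the returned fields are assumptions for illustration; the only behavior taken from the steps above is that both `U+1F600` and `1F600` are accepted as input.

```python
import unicodedata

def lookup(text):
    """Parse 'U+1F600' or '1F600' and return the character with its details."""
    s = text.strip().upper()
    if s.startswith("U+"):
        s = s[2:]
    cp = int(s, 16)  # raises ValueError if the input is not hexadecimal
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("code points range from U+0000 to U+10FFFF")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points have no UTF-8 encoding")
    ch = chr(cp)
    return {
        "char": ch,
        "code_point": f"U+{cp:04X}",
        "utf8": " ".join(f"{b:02X}" for b in ch.encode("utf-8")),
        # Unassigned code points have no name, so a placeholder is supplied.
        "name": unicodedata.name(ch, "<unnamed>"),
    }

info = lookup("U+1F600")
print(info["code_point"], info["utf8"], info["name"])
```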

Use Cases

Engineers debug character encoding issues in strings with unexpected symbols. Web developers verify special characters are correctly encoded before storage. Security researchers analyze unusual Unicode characters used in homograph attacks.
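As an illustration of the homograph case, the sketch below flags a string whose letters come from more than one script. Python's standard library does not expose the Unicode Script property, so this uses the first word of each character's name (for example `LATIN` or `CYRILLIC`) as a rough stand-in. That heuristic and the helper name `letter_scripts` are assumptions; a production check would rely on real script data.

```python
import unicodedata

def letter_scripts(text):
    """Approximate the set of scripts used by the letters in text."""
    scripts = set()
    for ch in text:
        # Only letters are considered; digits and punctuation are shared.
        if unicodedata.category(ch).startswith("L"):
            # Heuristic: the first word of the character name, e.g. 'LATIN'.
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

genuine = "paypal"
spoofed = "p\u0430yp\u0430l"  # U+0430 is CYRILLIC SMALL LETTER A

print(sorted(letter_scripts(genuine)))
print(sorted(letter_scripts(spoofed)))
print(len(letter_scripts(spoofed)) > 1)  # mixed scripts are worth a closer look
```

The two strings usually render identically, but the second contains code points with different UTF-8 bytes (`D0 B0` instead of `61`), which is exactly what a per-character inspection reveals.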

FAQ

What is a Unicode code point? — A unique number assigned to every character in the Unicode standard, written as U+XXXX in hexadecimal (e.g., U+0041 = 'A').
What is UTF-8? — A variable-width character encoding that represents Unicode code points as 1-4 bytes. It is the dominant encoding on the web.
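Why do emoji take 4 bytes in UTF-8? — Most emoji are in the Supplementary Multilingual Plane (U+10000-U+1FFFF), and every code point above U+FFFF requires 4 bytes in UTF-8. Some older emoji-style symbols, such as U+2764 (heavy black heart), sit in the Basic Multilingual Plane and take 3 bytes.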