HTML Charset
In HTML, the charset attribute specifies the character encoding of the
document. Declaring it is essential for text to display
correctly, especially non-ASCII characters. The character encoding is
specified with a <meta> tag in the <head> section of the
HTML document. The following minimal document shows the declaration.
Example:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Document Title</title>
</head>
<body>
<!-- Content goes here -->
</body>
</html>
Explanation:
- <!DOCTYPE html>: This declaration defines the document to be HTML5.
- <html lang="en">: The lang attribute specifies the language of the document.
- <head>: Contains meta-information about the HTML document.
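- <meta charset="UTF-8">: The charset attribute inside the <meta> tag specifies the character encoding. UTF-8 is the most commonly used encoding because it can represent every character in Unicode, which covers almost all of the world's writing systems. Place this tag as early as possible in <head>, because browsers scan only the first 1024 bytes of the document for it.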
- <title>Document Title</title>: Sets the title of the document, which is shown in the browser's title bar or tab.
- <body>: Contains the content of the HTML document.
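As a rough illustration of how a tool might read this declaration, the sketch below uses Python's standard html.parser module to pull the charset value out of a page. The names CharsetFinder and declared_charset are hypothetical helpers invented for this example; real browsers follow a more involved detection algorithm that also considers HTTP headers and byte order marks.

```python
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Records the value of the first <meta charset="..."> tag seen."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the parser lowercases names.
        if tag == "meta" and self.charset is None:
            for name, value in attrs:
                if name == "charset" and value:
                    self.charset = value.strip().lower()


def declared_charset(html_text):
    """Return the declared charset in lowercase, or None if there is none."""
    finder = CharsetFinder()
    finder.feed(html_text)
    return finder.charset


page = """<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Document Title</title>
</head>
<body></body>
</html>"""

print(declared_charset(page))         # utf-8
print(declared_charset("<p>hi</p>"))  # None
```

Charset labels are case-insensitive, which is why the helper lowercases the value before returning it.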
Common Charset Encodings
- UTF-8: Variable-length Unicode encoding; supports every Unicode character.
- ISO-8859-1: Western European (Latin-1) character set.
- UTF-16: Unicode encoding built from 16-bit code units (2 or 4 bytes per character).
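To see why the declared charset has to match the bytes actually stored in the file, here is a small Python sketch. It encodes a word as UTF-8 and then decodes the same bytes twice, once correctly and once as ISO-8859-1. The sample word is chosen only for illustration.

```python
text = "café"
raw = text.encode("utf-8")        # the bytes as saved to disk or sent over the network

right = raw.decode("utf-8")       # reader uses the encoding the writer used
wrong = raw.decode("iso-8859-1")  # reader assumes Latin-1 instead

print(right)  # café
print(wrong)  # cafÃ©
```

The garbled result in the second case is what a visitor sees when a UTF-8 page is served with the wrong charset declaration.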
Differences Between Character Sets
Character sets, or character encodings, are methods for encoding a
repertoire of characters (letters, numbers, symbols, etc.) for use in
computer systems. Different character sets are used to represent text in
various languages and scripts. Here are the key differences between some
of the most common character sets:
1. UTF-8
Encoding: Variable-length (1 to 4 bytes per character).
Coverage: Can represent any character in the Unicode standard,
which includes characters from almost all writing systems.
Usage: The most widely used character set on the web; recommended
for maximum compatibility and support for internationalization.
Advantages: Efficient for ASCII characters (1 byte), backward
compatible with ASCII, and capable of representing all Unicode characters.
Example: <meta charset="UTF-8">
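The variable-length behavior described above can be checked directly. The following Python sketch measures how many bytes a few sample characters occupy in UTF-8; the characters are arbitrary picks, one for each length.

```python
samples = ["A", "é", "€", "😀"]

# Map each character to the number of bytes its UTF-8 encoding needs.
lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, n in lengths.items():
    print(f"U+{ord(ch):04X} takes {n} byte(s) in UTF-8")

# ASCII text produces identical bytes in both encodings.
print("Hello".encode("utf-8") == "Hello".encode("ascii"))  # True
```

The last line shows the backward compatibility with ASCII: a pure ASCII file is already a valid UTF-8 file.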
2. ISO-8859-1 (Latin-1)
Encoding: Single-byte (8 bits per character).
Coverage: Western European languages.
Usage: Commonly used in older systems and legacy content.
Advantages: Simple and efficient for Western European text.
Disadvantages: Limited character set; cannot represent characters
from many other languages and scripts.
Example: <meta charset="ISO-8859-1">
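The limited coverage of ISO-8859-1 is easy to demonstrate. In this Python sketch, fits_latin1 is a hypothetical helper written for this example; it reports whether a string can be stored in Latin-1 at all.

```python
def fits_latin1(text):
    """Return True if every character in text exists in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


print(fits_latin1("déjà vu"))  # True: accented Western European letters are covered
print(fits_latin1("€100"))     # False: the euro sign is not part of Latin-1
print(fits_latin1("日本語"))    # False: no Japanese characters
```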
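3. UTF-16
Encoding: Variable-length (2 or 4 bytes per character).
Coverage: Can represent all Unicode characters.
Usage: Used internally by some operating systems and applications
(e.g., Windows).
Advantages: Compact for text dominated by characters that need 3 bytes
in UTF-8, such as CJK text (2 bytes each in UTF-16).
Disadvantages: Not as space-efficient as UTF-8 for ASCII text; not
backward compatible with ASCII; byte order (endianness) must be
signaled, usually with a byte order mark (BOM).
Example: <meta charset="UTF-16">
Note: Browsers treat this declaration as UTF-8, because the <meta> tag
must be readable before the encoding is known. A UTF-16 page is
recognized by its BOM or its HTTP Content-Type header instead, and the
HTML standard requires UTF-8 for new documents.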
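The byte-order issue and the 2-or-4-byte lengths can be seen in a short Python sketch. It uses only the standard codecs module; the sample characters are illustrative.

```python
import codecs

# The same character has its two bytes swapped depending on byte order.
print("A".encode("utf-16-le"))  # b'A\x00'
print("A".encode("utf-16-be"))  # b'\x00A'

# The plain "utf-16" codec prepends a byte order mark (BOM) so that a
# reader can tell which order was used.
with_bom = "A".encode("utf-16")
print(with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))  # True

# Most characters take one 16-bit unit; others need a surrogate pair.
print(len("€".encode("utf-16-le")))   # 2
print(len("😀".encode("utf-16-le")))  # 4
```

Note that even the letter A takes two bytes here, which is why UTF-16 is less compact than UTF-8 for ASCII-heavy content such as HTML markup.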
4. US-ASCII
Encoding: Single-byte (7 bits per character).
Coverage: Basic English letters, digits, punctuation, and control characters.
Usage: Originally used for early computers and communication
systems.
Advantages: Very simple and efficient for basic English text.
Disadvantages: Extremely limited character set; cannot represent
characters from other languages.
Example: <meta charset="US-ASCII">
5. ISO-8859-2 (Latin-2)
Encoding: Single-byte (8 bits per character).
Coverage: Central European languages (e.g., Czech, Hungarian,
Polish).
Usage: Used for specific regional text representation.
Advantages: Efficient for Central European text.
Disadvantages: Limited to a specific set of languages; not suitable
for multilingual content.
Example: <meta charset="ISO-8859-2">
6. Windows-1252
Encoding: Single-byte (8 bits per character).
Coverage: Western European languages.
Usage: Commonly used in Microsoft Windows environments.
Advantages: Matches ISO-8859-1 except for bytes 0x80 to 0x9F, where it
adds printable characters such as the euro sign and curly quotes.
Disadvantages: Limited character set; not suitable for many
non-Western languages.
Example: <meta charset="Windows-1252">
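The relationship between Windows-1252 and ISO-8859-1 can be sketched in Python by decoding the same bytes with both codecs. Python names the Windows-1252 codec cp1252; the byte values below are chosen for illustration.

```python
# Byte 0x80 is a printable character in Windows-1252 but an invisible
# control code in ISO-8859-1.
euro_byte = b"\x80"
print(euro_byte.decode("cp1252"))            # €
print(repr(euro_byte.decode("iso-8859-1")))  # '\x80'

# Curly quotes also live in the 0x80 to 0x9F range.
print(b"\x93quoted\x94".decode("cp1252"))    # “quoted”

# From 0xA0 upward the two encodings agree byte for byte.
same_upper_range = all(
    bytes([b]).decode("cp1252") == bytes([b]).decode("iso-8859-1")
    for b in range(0xA0, 0x100)
)
print(same_upper_range)  # True
```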
7. Shift_JIS
Encoding: Variable-length (1 or 2 bytes per character).
Coverage: Japanese characters.
Usage: Commonly used in Japan for encoding Japanese text.
Advantages: Efficient for Japanese text.
Disadvantages: Not suitable for non-Japanese text.
Example: <meta charset="Shift_JIS">
8. EUC-JP
Encoding: Variable-length (1 to 3 bytes per character).
Coverage: Japanese characters.
Usage: Another encoding commonly used for Japanese text.
Advantages: Supports a wider range of Japanese characters than
Shift_JIS.
Disadvantages: Not as widely supported outside Japan.
Example: <meta charset="EUC-JP">
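Shift_JIS and EUC-JP cover much of the same Japanese text but produce different bytes for it, so the two declarations are not interchangeable. The Python sketch below encodes one sample string both ways; the sample text is an arbitrary choice.

```python
text = "日本語"

sjis = text.encode("shift_jis")
eucjp = text.encode("euc_jp")

# Each of these kanji takes two bytes in both encodings...
print(len(sjis), len(eucjp))  # 6 6
# ...but the byte values themselves differ.
print(sjis == eucjp)          # False

# ASCII characters stay at one byte in both.
print(len("A".encode("shift_jis")), len("A".encode("euc_jp")))  # 1 1

# Reading Shift_JIS bytes as EUC-JP does not recover the original text.
garbled = sjis.decode("euc_jp", errors="replace")
print(garbled == text)        # False
```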
9. GB2312
Encoding: Variable-length (1 or 2 bytes per character).
Coverage: Simplified Chinese characters.
Usage: Standard for simplified Chinese text in China.
Advantages: Efficient for simplified Chinese.
Disadvantages: Limited to simplified Chinese characters.
Example: <meta charset="GB2312">
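The restriction to simplified characters can be illustrated with a Python sketch. The helper encodable is hypothetical, written only for this example, and the test characters are the simplified and traditional forms of the word "dragon".

```python
def encodable(text, codec):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


print("中".encode("gb2312"))        # b'\xd6\xd0'
print(len("A中".encode("gb2312")))  # 3: one byte for ASCII, two for the hanzi

print(encodable("龙", "gb2312"))    # True: simplified form
print(encodable("龍", "gb2312"))    # False: traditional form is outside GB2312
print(encodable("龍", "utf-8"))     # True: UTF-8 covers both
```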