Euro character in HTML HOWTO

Here I attempt to describe how to get the euro character to work with the most common browsers out there, for web pages, forms, etc. I have performed a bunch of automated tests and another bunch of manual (interactive) tests.

Charsets

There is a bunch of character sets out there. The interesting ones are presented here:

ASCII
    Values 0-127 are the same as Unicode characters 0-127.
    Values 128-255 are undefined.

ISO-8859-1 (aka latin-1)
    Values 0-127 are the same as Unicode characters 0-127.
    Values 128-159 are undefined.
    Values 160-255 are the same as Unicode characters 160-255 (does not include the euro character).

ISO-8859-15 (aka latin-9)
    Values 0-127 are the same as Unicode characters 0-127.
    Values 128-159 are undefined.
    Values 160-255 are mostly the same as Unicode characters 160-255; notably, they include the euro character.

windows-1252 (aka cp1252)
    Values 0-127 are the same as Unicode characters 0-127.
    Values 128-159 are custom mappings to various Unicode characters, including the euro character.
    Values 160-255 are the same as Unicode characters 160-255.

Thus the character sets ISO-8859-15 and windows-1252 include the euro character. There are other charsets too, but for someone like me who has been using ISO-8859-1, these are the important ones. For information on other charsets that include the euro character, see the link below.
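
The summary above can be checked with a few lines of Java. This is only a minimal sketch: the class and method names are made up for illustration, and it assumes the Java runtime ships the ISO-8859-15 and windows-1252 charsets (common JDKs do, but only a handful of charsets are strictly guaranteed).

```java
import java.nio.charset.Charset;

class EuroCharsetCheck {
    static final char EURO = '\u20AC';

    /** Returns true if the named charset has a code for the euro character. */
    static boolean canEncodeEuro(String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(EURO);
    }

    /** Returns the byte value (0-255) used for the euro, or -1 if the charset has none. */
    static int euroCode(String charsetName) {
        if (!canEncodeEuro(charsetName)) {
            return -1;
        }
        byte[] bytes = String.valueOf(EURO).getBytes(Charset.forName(charsetName));
        return bytes[0] & 0xFF;
    }

    public static void main(String[] args) {
        String[] names = {"US-ASCII", "ISO-8859-1", "ISO-8859-15", "windows-1252"};
        for (String name : names) {
            System.out.println(name + ": " + euroCode(name));
        }
    }
}
```

On such a runtime this should report -1 for US-ASCII and ISO-8859-1, 164 for ISO-8859-15 and 128 for windows-1252.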

More information on character sets, conversions and the Euro character is available on the excellent site http://www.eki.ee/letter.

Test results

I tested the charsets above with the most common browsers, to see how the euro character works with web pages, forms and URLs. I found that almost all (possibly all) versions of Internet Explorer completely ignore charset settings. Other browsers support some or all of the charset settings, partially or completely. The good news is that all of the most common browsers, including Mozilla, Netscape, Opera (Windows and Linux versions) and IE (Windows version), support the windows-1252 charset when it is set in the Content-Type HTTP header.

Changes needed

As windows-1252 is a superset of the iso-8859-1 charset (meaning that windows-1252 contains all of the characters in iso-8859-1, and some more), it is convenient to use it for web pages on Linux too: you can keep using iso-8859-1 while editing the documents.
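
The superset claim can be verified mechanically. The sketch below (class and method names are illustrative only, and the runtime is assumed to provide windows-1252) decodes every byte value that ISO-8859-1 defines according to the charset list earlier in this document, and counts the values on which the two charsets disagree.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Latin1SubsetCheck {
    /** Counts byte values in 0-127 and 160-255 that decode differently in the two charsets. */
    static int countDifferences() {
        Charset cp1252 = Charset.forName("windows-1252");
        int differences = 0;
        for (int code = 0; code < 256; code++) {
            if (code >= 128 && code <= 159) {
                continue; // listed as undefined in ISO-8859-1, so nothing to compare
            }
            byte[] one = {(byte) code};
            String asLatin1 = new String(one, StandardCharsets.ISO_8859_1);
            String asCp1252 = new String(one, cp1252);
            if (!asLatin1.equals(asCp1252)) {
                differences++;
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        System.out.println("Differing codes: " + countDifferences());
    }
}
```

A result of 0 means a document written in iso-8859-1 reads the same when served as windows-1252.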

What needs to be done is to add this line at the beginning of your document, preferably in the <head> section:

<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">

(possibly replacing a previous content-type setting).

With this line in place, submitted forms will generate the same results with all browsers, with the exception of some older versions of Opera for Linux, which send the euro character as code 129 instead of 128, the correct code in the windows-1252 charset.
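
On the receiving side, a form value submitted from such a page can be decoded roughly as follows. This is a sketch under assumptions, not a definitive implementation: the class name is hypothetical, the input is assumed to be a single URL-encoded value from a windows-1252 page, and rewriting "%81" to "%80" is just one possible workaround for the Opera quirk described above (code 129 has no meaning of its own in windows-1252).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class FormValueDecoder {
    /** Decodes one URL-encoded form value that was submitted from a windows-1252 page. */
    static String decode(String rawValue) {
        // Assumed workaround: treat code 129 (%81) as the euro code 128 (%80).
        String normalized = rawValue.replace("%81", "%80");
        try {
            return URLDecoder.decode(normalized, "windows-1252");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("windows-1252 is not available on this runtime", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("%80+100")); // euro sign sent as code 128
        System.out.println(decode("%81+100")); // euro sign sent as code 129
    }
}
```

Both calls in main should yield the same string, a euro sign followed by " 100".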

To represent the euro character, it is recommended to use the numeric form &#8364; instead of &euro;. Even if using the character directly works with some browsers, support is best with the numeric form.

If you are designing web services, you need to ensure your HTML encoders (if you have any) produce the correct encoded form for the euro character. It is also advisable to encode the other characters special to windows-1252 (those not present in the iso-8859-1 character set), and some others as well. Here is the table I use for characters in the range 128-191:

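Code  HTML format  Displayed
128   &#8364;      €
129   &#8364;      €
130   &#8218;      ‚
131   &#402;       ƒ
132   &#8222;      „
133   &#8230;      …
134   &#8224;      †
135   &#8225;      ‡
136   &#710;       ˆ
137   &#8240;      ‰
138   &#352;       Š
139   &#8249;      ‹
140   &#338;       Œ
141   -
142   &#381;       Ž
143   -
144   -
145   &#8216;      ‘
146   &#8217;      ’
147   &#8220;      “
148   &#8221;      ”
149   &#8226;      •
150   &#8211;      –
151   &#8212;      —
152   &#732;       ˜
153   &#8482;      ™
154   &#353;       š
155   &#8250;      ›
156   &#339;       œ
157   -
158   &#382;       ž
159   &#376;       Ÿ
160   &nbsp;       (non-breaking space)
161   -
162   -
163   -
164   &#164;       ¤
165   -
166   &#166;       ¦
167   -
168   &#168;       ¨
169   -
170   -
171   -
172   -
173   -
174   -
175   -
176   -
177   -
178   -
179   -
180   &#180;       ´
181   -
182   -
183   -
184   &#184;       ¸
185   -
186   -
187   -
188   &#188;       ¼
189   &#189;       ½
190   &#190;       ¾
191   -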

In the table, "-" means no encoding is needed for that code.

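Or, as a Java string array:

// Index i holds the HTML format for character code 128 + i;
// null means no encoding is needed for that code.
String[] html_arr = {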
    "&#8364;", "&#8364;", "&#8218;", "&#402;",
    "&#8222;", "&#8230;", "&#8224;", "&#8225;",
    "&#710;", "&#8240;", "&#352;", "&#8249;",
    "&#338;", null, "&#381;", null,
    null, "&#8216;", "&#8217;", "&#8220;",
    "&#8221;", "&#8226;", "&#8211;", "&#8212;",
    "&#732;", "&#8482;", "&#353;", "&#8250;",
    "&#339;", null, "&#382;", "&#376;",
    "&nbsp;", null, null, null,
    "&#164;", null, "&#166;", null,
    "&#168;", null, null, null,
    null, null, null, null,
    null, null, null, null,
    "&#180;", null, null, null,
    "&#184;", null, null, null,
    "&#188;", "&#189;", "&#190;", null,
};
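
As an illustration of how such a table might be used, here is a minimal encoder sketch. The class name, the method name and the choice to take windows-1252 bytes as input are assumptions made for this example. The table is repeated so the example is self-contained. The sketch does not escape markup characters such as < or &, which a real HTML encoder would also need to handle.

```java
import java.nio.charset.Charset;

class Cp1252HtmlEncoder {
    // Same table as above: index i holds the HTML format for code 128 + i.
    static final String[] HTML_ARR = {
        "&#8364;", "&#8364;", "&#8218;", "&#402;",
        "&#8222;", "&#8230;", "&#8224;", "&#8225;",
        "&#710;", "&#8240;", "&#352;", "&#8249;",
        "&#338;", null, "&#381;", null,
        null, "&#8216;", "&#8217;", "&#8220;",
        "&#8221;", "&#8226;", "&#8211;", "&#8212;",
        "&#732;", "&#8482;", "&#353;", "&#8250;",
        "&#339;", null, "&#382;", "&#376;",
        "&nbsp;", null, null, null,
        "&#164;", null, "&#166;", null,
        "&#168;", null, null, null,
        null, null, null, null,
        null, null, null, null,
        "&#180;", null, null, null,
        "&#184;", null, null, null,
        "&#188;", "&#189;", "&#190;", null,
    };

    /** Encodes windows-1252 bytes as HTML text, replacing codes 128-191 per the table. */
    static String encode(byte[] cp1252Bytes) {
        StringBuilder out = new StringBuilder();
        for (byte b : cp1252Bytes) {
            int code = b & 0xFF; // bytes are signed in Java; recover the 0-255 value
            String entity = (code >= 128 && code <= 191) ? HTML_ARR[code - 128] : null;
            if (entity != null) {
                out.append(entity);
            } else {
                // Codes 0-127 and 160-255 equal their Unicode code points.
                // The few unassigned codes in 128-159 are passed through unchanged.
                out.append((char) code);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] input = "Price: \u20AC 100".getBytes(Charset.forName("windows-1252"));
        System.out.println(encode(input)); // prints "Price: &#8364; 100"
    }
}
```

Because the table maps both 128 and 129 to &#8364;, input from the older Opera versions mentioned earlier ends up encoded the same way as input from other browsers.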

Notes

You don't need any other setup to fully support euro characters. In particular, setting the charset in the content-type affects the whole document, including forms, so there is no need to set the accept-charset attribute on the <form> elements. It will default to the windows-1252 charset once that is set at the beginning of the document as described before.

Another way of setting the charset is to have the server send it automatically, instead of setting it inside the HTML document. In this case, an HTTP header named "Content-Type" should be set to "text/html; charset=windows-1252". Note that if the charset is set both in the HTTP headers and in the HTML document, the two should agree: the HTML 4.01 specification gives the HTTP header precedence over the meta element, although browsers have not always followed this. Thus it is important that templated web pages and the server configuration do not specify any value other than windows-1252.
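
The server-side variant can be sketched with the small HTTP server bundled with the JDK (com.sun.net.httpserver). The class name, page content and port are hypothetical, chosen only for this example. A production setup would more likely configure the header in the web server or application framework.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.Charset;

class EuroPageServer {
    static final String CONTENT_TYPE = "text/html; charset=windows-1252";

    /** Builds the page bytes in the same charset that the header announces. */
    static byte[] pageBytes() {
        String html = "<html><body>Price: &#8364; 100</body></html>";
        return html.getBytes(Charset.forName("windows-1252"));
    }

    /** Starts a server on the loopback interface that sends the charset in the HTTP header. */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/", exchange -> {
            byte[] body = pageBytes();
            exchange.getResponseHeaders().set("Content-Type", CONTENT_TYPE);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

Calling EuroPageServer.start(8080) serves the page until stop(0) is called on the returned server. The page uses the numeric form &#8364;, so its bytes stay plain ASCII either way.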