Here I attempt to describe how to get the euro character to work with
the most common browsers out there, for web pages, forms, etc. I have
performed a bunch of automated tests and another bunch of manual
(interactive) tests.
Charsets
There are many character sets out there. The interesting ones are
presented here:
ASCII | Values 0-127 are the same as Unicode characters 0-127. Values 128-255 are undefined. |
ISO-8859-1 (aka latin-1) | Values 0-127 are the same as Unicode characters 0-127. Values 128-159 are undefined. Values 160-255 are the same as Unicode characters 160-255 (does not include the euro character). |
ISO-8859-15 (aka latin-9) | Values 0-127 are the same as Unicode characters 0-127. Values 128-159 are undefined. Values 160-255 are mostly the same as Unicode characters 160-255, but notably include the euro character. |
windows-1252 (aka cp1252) | Values 0-127 are the same as Unicode characters 0-127. Values 128-159 are custom mappings to various Unicode characters, including the euro character. Values 160-255 are the same as Unicode characters 160-255. |
Thus the character sets ISO-8859-15 and windows-1252 include the euro
character. There are other charsets too, but for someone like me who
has been using ISO-8859-1, these are the important ones. For
information on other charsets that include the euro character, see the
links below.
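The euro entries in the table above can be checked mechanically. The following is a minimal sketch, assuming a JVM that ships the windows-1252 and ISO-8859-15 charsets (common JDK builds do, but of these four only US-ASCII and ISO-8859-1 are guaranteed on every Java platform). The class and method names are illustrative only.

```java
import java.nio.charset.Charset;

class EuroCharsets {
    static final char EURO = '\u20AC'; // Unicode code point of the euro sign

    // Returns the single-byte code of the euro sign in the named charset,
    // or -1 if that charset cannot represent it at all.
    static int euroCode(String charsetName) {
        Charset cs = Charset.forName(charsetName);
        if (!cs.newEncoder().canEncode(EURO)) {
            return -1;
        }
        byte[] encoded = String.valueOf(EURO).getBytes(cs);
        return encoded[0] & 0xFF; // convert signed byte to 0-255
    }

    public static void main(String[] args) {
        String[] names = {"US-ASCII", "ISO-8859-1", "ISO-8859-15", "windows-1252"};
        for (String name : names) {
            System.out.println(name + ": " + euroCode(name));
        }
        // prints:
        // US-ASCII: -1
        // ISO-8859-1: -1
        // ISO-8859-15: 164
        // windows-1252: 128
    }
}
```

Note that ISO-8859-15 and windows-1252 place the euro at different codes (164 and 128), so a page must declare the charset it was actually written in.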
More information on character sets, conversions and the Euro
character is available on the excellent site http://www.eki.ee/letter.
Test results
I tested the charsets above with the most common browsers, to see how
the euro character works with web pages, forms and URLs. I found that
all or almost all versions of Internet Explorer completely ignore
charset settings. Other browsers support some or all of the charset
settings, partially or completely. The good news, though, is that all
of the most common browsers, including Mozilla, Netscape, Opera
(Windows and Linux versions) and IE (Windows version), support the
windows-1252 charset when it is set in the Content-Type HTTP
header.
Changes needed
As windows-1252 is a superset of the iso-8859-1 charset (meaning that
windows-1252 contains all of the characters in iso-8859-1, and some
more), it is quite comfortable to use it for web pages on Linux too:
you can keep using iso-8859-1 while editing the documents.
What needs to be done is to add this line at the beginning of your
document, preferably in the <head> section:
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
(possibly replacing a previous content-type setting).
With this line in place, submitted forms will generate the same
results in all browsers, with the exception of some older versions of
Opera for Linux, which send the euro character as code 129 instead of
128, the correct code in the windows-1252 charset.
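On the receiving side, a submitted form value can be decoded along these lines. This is only a sketch, assuming the value arrives percent-encoded (so the euro is sent as %80) and assuming you want to treat the stray code 129 from old Opera versions as a euro as well. The class name, the method name and the %81 to %80 rewrite are my own illustration, not part of any browser or server API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class FormValueDecoder {
    // Decodes one percent-encoded form value that was submitted from a
    // page declared as windows-1252.
    static String decode(String rawValue) {
        // Assumption: code 129 (%81) is unassigned in windows-1252, so it
        // is safe to rewrite it to 128 (%80), the real euro code.
        String patched = rawValue.replace("%81", "%80");
        try {
            return URLDecoder.decode(patched, "windows-1252");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("windows-1252 is not available in this JVM", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("%80100").equals("\u20AC100")); // prints true
        System.out.println(decode("%8110").equals("\u20AC10"));   // prints true
    }
}
```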
To represent the euro character, it is recommended to use the numeric
character reference &#8364; instead of the literal € character.
In particular, even though using the character directly works with
some browsers, support is best when using the numerical format.
For those designing web services: you need to ensure that your HTML
encoders (if you have any) produce the correct encoded form for the
euro character. It is also advisable to encode the other characters
special to windows-1252 (those not present in the iso-8859-1
character set), and some others as well. Here is the table I use for
codes 128-191:
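Code | HTML format | Displayed |
128 | &#8364; | € |
129 | &#8364; | € |
130 | &#8218; | ‚ |
131 | &#402; | ƒ |
132 | &#8222; | „ |
133 | &#8230; | … |
134 | &#8224; | † |
135 | &#8225; | ‡ |
136 | &#710; | ˆ |
137 | &#8240; | ‰ |
138 | &#352; | Š |
139 | &#8249; | ‹ |
140 | &#338; | Œ |
141 | - | |
142 | &#381; | Ž |
143 | - | |
144 | - | |
145 | &#8216; | ‘ |
146 | &#8217; | ’ |
147 | &#8220; | “ |
148 | &#8221; | ” |
149 | &#8226; | • |
150 | &#8211; | – |
151 | &#8212; | — |
152 | &#732; | ˜ |
153 | &#8482; | ™ |
154 | &#353; | š |
155 | &#8250; | › |
156 | &#339; | œ |
157 | - | |
158 | &#382; | ž |
159 | &#376; | Ÿ |
160 | &#160; | (non-breaking space) |
161 | - | |
162 | - | |
163 | - | |
164 | &#164; | ¤ |
165 | - | |
166 | &#166; | ¦ |
167 | - | |
168 | &#168; | ¨ |
169 | - | |
170 | - | |
171 | - | |
172 | - | |
173 | - | |
174 | - | |
175 | - | |
176 | - | |
177 | - | |
178 | - | |
179 | - | |
180 | &#180; | ´ |
181 | - | |
182 | - | |
183 | - | |
184 | &#184; | ¸ |
185 | - | |
186 | - | |
187 | - | |
188 | &#188; | ¼ |
189 | &#189; | ½ |
190 | &#190; | ¾ |
191 | - | |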
In the table, "-" means no encoding is needed for that code.
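Or, as a Java string array:
// Index i holds the HTML form for windows-1252 code 128 + i;
// null means no encoding is needed for that code.
String[] html_arr = {
    "&#8364;", "&#8364;", "&#8218;", "&#402;",   // 128-131
    "&#8222;", "&#8230;", "&#8224;", "&#8225;",  // 132-135
    "&#710;", "&#8240;", "&#352;", "&#8249;",    // 136-139
    "&#338;", null, "&#381;", null,              // 140-143
    null, "&#8216;", "&#8217;", "&#8220;",       // 144-147
    "&#8221;", "&#8226;", "&#8211;", "&#8212;",  // 148-151
    "&#732;", "&#8482;", "&#353;", "&#8250;",    // 152-155
    "&#339;", null, "&#382;", "&#376;",          // 156-159
    "&#160;", null, null, null,                  // 160-163
    "&#164;", null, "&#166;", null,              // 164-167
    "&#168;", null, null, null,                  // 168-171
    null, null, null, null,                      // 172-175
    null, null, null, null,                      // 176-179
    "&#180;", null, null, null,                  // 180-183
    "&#184;", null, null, null,                  // 184-187
    "&#188;", "&#189;", "&#190;", null,          // 188-191
};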
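If the text to encode is already a Java String (that is, already Unicode), the numeric forms can also be computed instead of looked up. The following is a rough sketch rather than the encoder I use: it is more aggressive than the table, because it turns every character above 127 into a numeric reference, and it also escapes the usual HTML metacharacters. The class and method names are hypothetical.

```java
class HtmlNumericEncoder {
    // Escapes HTML metacharacters and writes every non-ASCII character
    // as a decimal numeric character reference such as &#8364;.
    static String encode(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp); // step over surrogate pairs correctly
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:
                    if (cp < 128) {
                        out.append((char) cp); // plain ASCII passes through
                    } else {
                        out.append("&#").append(cp).append(';');
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("5 \u20AC < 6 \u20AC"));
        // prints: 5 &#8364; &lt; 6 &#8364;
    }
}
```

Because the output is pure ASCII, it displays the same regardless of which of the charsets above the browser ends up using.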
Notes
You don't need any other setup to fully support euro characters. In
particular, setting the charset in the content-type affects the whole
document, including forms. So there's no need to set the
accept-charset attribute on the <form> elements. It will default to
the windows-1252 charset once it is set at the beginning of the
document as described above.
Another way of setting the charset is to have the server set the
header automatically, instead of setting it inside the HTML
document. In this case, an HTTP header named "Content-type" should be
set to "text/html; charset=windows-1252". Note that if the charset is
set both in the HTTP headers AND in the HTML document, the two values
must agree. The HTML 4.01 specification gives the HTTP header
precedence over the <meta> element, but do not rely on every browser
following that order. Thus it is important that templated web pages
do not declare some other value than windows-1252.
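Setting the header on the server side could look roughly like this. It is a minimal sketch using the small HTTP server bundled with the JDK (com.sun.net.httpserver), not a recommendation for any particular server. The class name, the page content and port 8080 are assumptions made for the example.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.Charset;

class Cp1252Server {
    static final Charset CP1252 = Charset.forName("windows-1252");
    static final String CONTENT_TYPE = "text/html; charset=windows-1252";

    // Builds a tiny test page and encodes it in the declared charset.
    static byte[] page() {
        String html = "<html><head><title>Euro test</title></head>"
                + "<body><p>Price: 10 &#8364;</p></body></html>";
        return html.getBytes(CP1252);
    }

    // Starts a server that sends the charset in the Content-Type header.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/", exchange -> {
            byte[] body = page();
            exchange.getResponseHeaders().set("Content-Type", CONTENT_TYPE);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        start(8080); // port chosen arbitrarily for the example
    }
}
```

The body bytes and the declared charset come from the same constant pair, which makes it harder for the two to drift apart.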