ColdFusion MX: Searches for words with non-English characters in non-English Verity collections fail to return results
Issue
In Macromedia ColdFusion MX and higher, when searching UTF-8 encoded documents for words containing non-English characters, the words are not found in the document.
Verity VDK 2.6.1, which is included in ColdFusion MX, does not support searching UTF-8 encoded documents. Searches for words that contain only English characters work; however, searches for words that contain non-English characters return an empty result set.
For example, suppose a document is encoded in UTF-8 and contains the French word "sécurité". Using the cfsearch tag to search for "sécurité" in the French collection that contains the document returns 0 results, even though the document was indexed correctly. For the search to work as expected, the document must be encoded as either Western European or ISO-8859-1. This occurs with all languages that Verity supports in ColdFusion MX.
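One way to see why the encoding matters is to compare the bytes that the same word produces in each encoding. The following Java sketch prints the UTF-8 and ISO-8859-1 bytes of the French word "sécurité". The class and method names are hypothetical and chosen for illustration only, and the sketch does not reproduce Verity's internal behavior; it only shows that the accented characters are stored differently in the two encodings.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares the byte sequences of one word
// in UTF-8 and in ISO-8859-1.
final class EncodingMismatchDemo {

    // Formats a byte array as space-separated, two-digit uppercase hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "sécurité", written with Unicode escapes so the source file's own
        // encoding cannot affect the result.
        String word = "s\u00e9curit\u00e9";

        // In UTF-8 each "é" occupies two bytes (C3 A9).
        System.out.println("UTF-8:      " + hex(word.getBytes(StandardCharsets.UTF_8)));
        // prints: UTF-8:      73 C3 A9 63 75 72 69 74 C3 A9

        // In ISO-8859-1 each "é" occupies one byte (E9).
        System.out.println("ISO-8859-1: " + hex(word.getBytes(StandardCharsets.ISO_8859_1)));
        // prints: ISO-8859-1: 73 E9 63 75 72 69 74 E9
    }
}
```

Only the accented characters differ between the two outputs, which is consistent with searches on English-only words continuing to work.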
The following table lists the character encoding that documents must use for each supported Verity language in ColdFusion MX so that searches return results correctly.
| ColdFusion language name | Character encoding |
| BOKMAL | ISO8859_1 |
| DANISH | ISO8859_1 |
| DUTCH | ISO8859_1 |
| ENGLISH | Cp1252 |
| FINNISH | ISO8859_1 |
| FRENCH | ISO8859_1 |
| GERMAN | ISO8859_1 |
| ITALIAN | ISO8859_1 |
| NORWEG | ISO8859_1 |
| NORWEGIAN | ISO8859_1 |
| NYNORSK | ISO8859_1 |
| PORTUG | ISO8859_1 |
| PORTUGUESE | ISO8859_1 |
| SPANISH | ISO8859_1 |
| SWEDISH | ISO8859_1 |
The following languages require a ColdFusion hot fix, currently in development, to work on Windows platforms. This TechNote will be updated when the hot fix is available.
| ColdFusion language name | Character encoding |
| ARABIC | Cp1256 |
| CZECH | Cp1250 |
| GREEK | Cp1253 |
| HEBREW | Cp1255 |
| HUNGARIAN | Cp1250 |
| JAPANESE | SJIS |
| KOREAN | KSC5601 |
| POLISH | Cp1250 |
| RUSSIAN | Cp1251 |
| SIMPLIFIED_CHINESE | GB18030 |
| TRADITIONAL_CHINESE | Big5 |
| TURKISH | Cp1254 |
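The two tables above can be captured as a simple lookup, which may be convenient when many documents need to be checked or converted. The following Java sketch is only an illustration: the class name, the map, and the `encodingFor` method are hypothetical and are not part of ColdFusion or Verity. The `main` method also reports whether the running JVM recognizes each encoding name, because some of the extended encodings may not be installed in every Java runtime.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: maps the ColdFusion language names from the tables
// above to the character encoding names listed beside them.
final class VerityEncodings {

    static final Map<String, String> BY_LANGUAGE = new LinkedHashMap<>();

    static {
        // First table: Western European languages.
        String[] latin1 = {
            "BOKMAL", "DANISH", "DUTCH", "FINNISH", "FRENCH", "GERMAN", "ITALIAN",
            "NORWEG", "NORWEGIAN", "NYNORSK", "PORTUG", "PORTUGUESE", "SPANISH", "SWEDISH"
        };
        for (String language : latin1) {
            BY_LANGUAGE.put(language, "ISO8859_1");
        }
        BY_LANGUAGE.put("ENGLISH", "Cp1252");

        // Second table: languages that require the hot fix on Windows.
        BY_LANGUAGE.put("ARABIC", "Cp1256");
        BY_LANGUAGE.put("CZECH", "Cp1250");
        BY_LANGUAGE.put("GREEK", "Cp1253");
        BY_LANGUAGE.put("HEBREW", "Cp1255");
        BY_LANGUAGE.put("HUNGARIAN", "Cp1250");
        BY_LANGUAGE.put("JAPANESE", "SJIS");
        BY_LANGUAGE.put("KOREAN", "KSC5601");
        BY_LANGUAGE.put("POLISH", "Cp1250");
        BY_LANGUAGE.put("RUSSIAN", "Cp1251");
        BY_LANGUAGE.put("SIMPLIFIED_CHINESE", "GB18030");
        BY_LANGUAGE.put("TRADITIONAL_CHINESE", "Big5");
        BY_LANGUAGE.put("TURKISH", "Cp1254");
    }

    // Returns the encoding name for a language, ignoring case.
    static String encodingFor(String language) {
        String encoding = BY_LANGUAGE.get(language.toUpperCase(Locale.ROOT));
        if (encoding == null) {
            throw new IllegalArgumentException("Unknown Verity language: " + language);
        }
        return encoding;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> entry : BY_LANGUAGE.entrySet()) {
            boolean available = Charset.isSupported(entry.getValue());
            System.out.println(entry.getKey() + " -> " + entry.getValue()
                    + (available ? "" : " (not available in this JVM)"));
        }
    }
}
```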
Solution
Confirm that every document being searched uses the character encoding listed above for its language, not UTF-8. For example, in Internet Explorer, you can check the encoding by selecting File > Save As > Encoding. To change the character encoding of any UTF-8 encoded document, use an editor that lets you modify the character encoding.
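When many documents need converting, a small program may be more practical than an editor. The following Java sketch reads a UTF-8 file and rewrites it in a target encoding such as ISO8859_1. It is an illustration under stated assumptions, not a supported tool: the class name, the `reencode` method, and the command-line arguments are hypothetical. It stops with an error rather than silently substituting characters when the input is not valid UTF-8 or contains a character that the target encoding cannot represent.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical utility: converts a UTF-8 document to another encoding.
final class Utf8Reencoder {

    // Decodes the bytes as UTF-8 and encodes them again in the target charset.
    // Throws CharacterCodingException on invalid UTF-8 input or on characters
    // that the target charset cannot represent.
    static byte[] reencode(byte[] utf8Bytes, Charset target) throws CharacterCodingException {
        CharBuffer chars = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(utf8Bytes));

        // Skip a leading byte order mark, which single-byte encodings cannot store.
        if (chars.hasRemaining() && chars.get(chars.position()) == '\uFEFF') {
            chars.position(chars.position() + 1);
        }

        ByteBuffer encoded = target.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(chars);

        byte[] result = new byte[encoded.remaining()];
        encoded.get(result);
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("Usage: java Utf8Reencoder <input> <output> <encoding>");
            return;
        }
        Charset target = Charset.forName(args[2]);
        byte[] input = Files.readAllBytes(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), reencode(input, target));
    }
}
```

For example, `java Utf8Reencoder report-utf8.html report.html ISO8859_1` would write an ISO-8859-1 copy of a hypothetical file named `report-utf8.html`. If the document is an HTML page that declares its own charset, that declaration would also need to be updated to match; the sketch does not do this. After conversion, the collection would typically need to be reindexed so that it contains the converted document.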
Additional Information
- ColdFusion MX: No valid documents error returned when searching Verity collections (18302)
- ColdFusion (versions 4.5 and higher): Supported file types for Verity (18149)
- LiveDocs: Installing Verity Locales
