
TechNote (Archived)

ColdFusion MX: Searches for words with non-English characters in non-English Verity collections fail to return results

Issue


In Macromedia ColdFusion MX and higher, searches of UTF-8 encoded documents for words containing non-English characters do not find those words, even when they appear in the document.

Verity VDK 2.6.1, which is included in ColdFusion MX, does not support searching UTF-8 encoded documents. Searches for words containing only English characters work; however, searches for words containing non-English characters return no results.

For example, suppose a document encoded in UTF-8 contains the French word "sécurité". If you use the cfsearch tag to search for "sécurité" in the French collection that contains this document, the search returns 0 results even though the document was indexed correctly. To work as expected, the document must be encoded as either Western European or ISO-8859-1. This occurs with all languages that Verity supports in ColdFusion MX.
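To see why the encoding matters, compare the bytes that "sécurité" produces in each encoding. This short Python illustration shows only the byte-level difference; it is an assumption-free demonstration of the encodings themselves and makes no claim about how Verity stores or matches terms internally.

```python
# Illustration: the same French word yields different byte sequences
# depending on the character encoding used to save the document.
word = "sécurité"

utf8_bytes = word.encode("utf-8")        # each é becomes two bytes: 0xC3 0xA9
latin1_bytes = word.encode("iso8859_1")  # each é becomes one byte: 0xE9

print(utf8_bytes)    # b's\xc3\xa9curit\xc3\xa9'
print(latin1_bytes)  # b's\xe9curit\xe9'

# Interpreting the UTF-8 bytes as ISO-8859-1 produces a different string,
# which no longer matches the word being searched for.
misread = utf8_bytes.decode("iso8859_1")
print(misread)       # sÃ©curitÃ©
```

The UTF-8 form is 10 bytes long while the ISO-8859-1 form is 8, so a tool that assumes one encoding while reading the other sees a different word.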

The following chart lists the character encoding that each supported Verity language in ColdFusion MX requires for search results to be returned correctly.

ColdFusion language name    Character encoding
BOKMAL                      ISO8859_1
DANISH                      ISO8859_1
DUTCH                       ISO8859_1
ENGLISH                     Cp1252
FINNISH                     ISO8859_1
FRENCH                      ISO8859_1
GERMAN                      ISO8859_1
ITALIAN                     ISO8859_1
NORWEG                      ISO8859_1
NORWEGIAN                   ISO8859_1
NYNORSK                     ISO8859_1
PORTUG                      ISO8859_1
PORTUGUESE                  ISO8859_1
SPANISH                     ISO8859_1
SWEDISH                     ISO8859_1

The following languages require a ColdFusion hot fix, currently in development, to work on Windows platforms. This TechNote will be updated when the hot fix is available.

ColdFusion language name    Character encoding
ARABIC                      Cp1256
CZECH                       Cp1250
GREEK                       Cp1253
HEBREW                      Cp1255
HUNGARIAN                   Cp1250
JAPANESE                    SJIS
KOREAN                      KSC5601
POLISH                      Cp1250
RUSSIAN                     Cp1251
SIMPLIFIED_CHINESE          GB18030
TRADITIONAL_CHINESE         Big5
TURKISH                     Cp1254
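When preparing many documents, it can help to keep the charts above in machine-readable form. The following Python sketch is hypothetical and not part of ColdFusion: the dictionary name and the helper `encoding_for_language` are illustrative, and the Python codec names are our assumed equivalents of the Java-style names in the charts (for example ISO8859_1 as `iso8859_1`, SJIS as `shift_jis`, KSC5601 as `euc_kr`).

```python
import codecs

# Hypothetical mapping from ColdFusion/Verity language names to Python codec
# names. The codec names are assumed equivalents of the Java-style names in
# the charts (ISO8859_1, Cp1252, SJIS, KSC5601, ...), not ColdFusion values.
VERITY_LANGUAGE_ENCODINGS = {
    # Western European languages
    "BOKMAL": "iso8859_1",
    "DANISH": "iso8859_1",
    "DUTCH": "iso8859_1",
    "ENGLISH": "cp1252",
    "FINNISH": "iso8859_1",
    "FRENCH": "iso8859_1",
    "GERMAN": "iso8859_1",
    "ITALIAN": "iso8859_1",
    "NORWEG": "iso8859_1",
    "NORWEGIAN": "iso8859_1",
    "NYNORSK": "iso8859_1",
    "PORTUG": "iso8859_1",
    "PORTUGUESE": "iso8859_1",
    "SPANISH": "iso8859_1",
    "SWEDISH": "iso8859_1",
    # Languages that need the ColdFusion hot fix on Windows
    "ARABIC": "cp1256",
    "CZECH": "cp1250",
    "GREEK": "cp1253",
    "HEBREW": "cp1255",
    "HUNGARIAN": "cp1250",
    "JAPANESE": "shift_jis",
    "KOREAN": "euc_kr",
    "POLISH": "cp1250",
    "RUSSIAN": "cp1251",
    "SIMPLIFIED_CHINESE": "gb18030",
    "TRADITIONAL_CHINESE": "big5",
    "TURKISH": "cp1254",
}


def encoding_for_language(language: str) -> str:
    """Return the assumed Python codec name for a ColdFusion language name."""
    codec = VERITY_LANGUAGE_ENCODINGS.get(language.upper())
    if codec is None:
        raise ValueError(f"No character encoding listed for {language!r}")
    codecs.lookup(codec)  # raises LookupError if this Python lacks the codec
    return codec


print(encoding_for_language("french"))    # iso8859_1
print(encoding_for_language("JAPANESE"))  # shift_jis
```

Lookups are case-insensitive, and an unlisted language raises `ValueError` rather than silently falling back to a default encoding.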

Solution


Confirm that no document being searched is encoded as UTF-8; each document must use the character encoding listed for its language in the charts above. For example, in Internet Explorer, you can check the encoding by selecting File > Save As > Encoding. To change the character encoding of a UTF-8 encoded document, open it in an editor that lets you choose the character encoding and save it with the required encoding.
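The check and conversion described above can also be scripted. The following is a minimal Python sketch, not a ColdFusion or Verity feature, and both function names are hypothetical. `looks_like_utf8` relies on an assumed heuristic: a file whose bytes decode as UTF-8 and contain non-ASCII characters was most likely saved as UTF-8. `convert_utf8_file` rewrites a UTF-8 file in a target encoding such as `iso8859_1`.

```python
from pathlib import Path


def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: True if data decodes as UTF-8 and contains non-ASCII text.

    Pure ASCII returns False because it is identical in UTF-8 and in the
    single-byte encodings, so it needs no conversion.
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return any(ord(ch) > 127 for ch in text)


def convert_utf8_file(src: Path, dst: Path, target_encoding: str) -> None:
    """Re-encode a UTF-8 file as target_encoding (e.g. "iso8859_1")."""
    # "utf-8-sig" decodes UTF-8 and drops a leading byte order mark if present.
    text = src.read_bytes().decode("utf-8-sig")
    # errors="strict" raises UnicodeEncodeError for characters the target
    # encoding cannot represent, instead of silently replacing them.
    dst.write_bytes(text.encode(target_encoding, errors="strict"))


if __name__ == "__main__":
    print(looks_like_utf8("sécurité".encode("utf-8")))      # True
    print(looks_like_utf8("sécurité".encode("iso8859_1")))  # False
```

Using strict error handling means a document containing characters outside the target encoding fails loudly rather than being indexed with altered text. The sketch does not update any charset declaration inside the document itself, such as an HTML `meta` tag.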


Creative Commons License



Document Details

ID: tn_18973
Browser: Chrome, Internet Explorer, Netscape, Opera, Safari, Firefox
Database: DB2, Informix, MySQL, Oracle, SQL Server, Sybase, MS Access

Products Affected: ColdFusion