Does UTF-8 encoding support Thai language in Endeca Search?
Hi, we have a requirement to configure Endeca search for the Thai language. Does using UTF-8 encoding for all the configuration files and for incoming data (Record Adapter) cause issues with the Thai character set? If so, which encoding would fix the issue? I am using Endeca 6.4.0 for the search configuration.
Are you having issues? My understanding is that the only encoding you can use for Thai is UTF-8, so provided you are using UTF-8 and have specified the language identifier as Thai (--lang th), you should be fine. Note that the language identifier is needed to ensure 6.4.0 uses the new OLT analyzer, which supports the full character set, instead of the standard Endeca analyzer (which has issues with certain code points, namely the Thai tone/diacritic marks). Michael
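If you want to confirm that the incoming data really is UTF-8 before it reaches the Record Adapter, a rough check along these lines may help. This is only a sketch in plain Python, not part of Endeca: the helper names `decode_utf8` and `contains_thai` and the sample strings are assumptions made for illustration, and TIS-620 is used only as an example of a legacy Thai encoding that would fail the check.

```python
def decode_utf8(data: bytes):
    """Return the decoded text, or None if the bytes are not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def contains_thai(text: str) -> bool:
    """True if any character falls in the Unicode Thai block (U+0E00 to U+0E7F)."""
    return any("\u0e00" <= ch <= "\u0e7f" for ch in text)


# Illustrative sample: the same Thai text in UTF-8 and in legacy TIS-620.
utf8_bytes = "ภาษาไทย".encode("utf-8")
tis620_bytes = "ภาษาไทย".encode("tis-620")

for label, raw in (("utf-8", utf8_bytes), ("tis-620", tis620_bytes)):
    text = decode_utf8(raw)
    if text is None:
        print(label, "-> not valid UTF-8, convert before ingesting")
    else:
        print(label, "-> valid UTF-8, contains Thai:", contains_thai(text))
```

In practice you would read the raw bytes of each source file (opened in binary mode) and pass them to `decode_utf8` instead of the hard-coded samples.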
Hi Michael, thanks for providing this information. Considering the drawbacks of the OLT analyzer (no wildcard search, phrase search, search characters, or diacritic folding), I had switched to the Latin-1 (Endeca) language analyzer. I have not found any issues so far with UTF-8 encoding, but I wanted to know whether the Latin-1 (Endeca) language analyzer poses any constraints on advanced search features, and whether UTF-8 encoding can be used along with the Latin-1 (Endeca) language analyzer. Please advise which analyzer is best suited to the Thai language.
Hi, you should use the OLT analyzer, as the Latin-1 analyzer doesn't support all the character codes used by the Thai language. Some words do not use these unsupported character codes, but many do (any that contain tone marks), and searches for those words will not return the correct results. Also, Thai doesn't separate words with whitespace, so multi-term searches will not work correctly. Finally, for ongoing support your best bet is to use the officially recommended analyzer for this language, which is the OLT one. As for the unavailable features, is there any one of them that you require? Note that diacritic folding is (I believe) scheduled to be added in the next release; however, to the best of my knowledge this shouldn't be required for Thai, as the diacritics are integral to the language. If you need one or more of the other features, I'd recommend raising an SR: the more customers ask for a feature in the OLT analyzer languages, the more likely it is that support will be added. Thanks, Michael
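To illustrate the two text properties mentioned above (tone marks and the lack of whitespace between words), here is a small Python sketch. It does not reproduce either Endeca analyzer; it only inspects the text itself. The helper names and the sample word and phrase are assumptions chosen for the example.

```python
import unicodedata

# The four Thai tone marks: mai ek, mai tho, mai tri, mai chattawa.
TONE_MARKS = "\u0e48\u0e49\u0e4a\u0e4b"


def tone_marks_in(text: str) -> list:
    """Return the tone marks found in the text, in order of appearance."""
    return [ch for ch in text if ch in TONE_MARKS]


def whitespace_tokens(text: str) -> list:
    """Naive tokenization: split on whitespace only."""
    return text.split()


word = "\u0e19\u0e49\u0e33"   # "น้ำ" (water), contains the tone mark U+0E49
phrase = "ฉันรักภาษาไทย"        # "I love the Thai language", written with no spaces

for ch in tone_marks_in(word):
    # Tone marks are separate combining code points, not part of the base letter.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

# Splitting on whitespace yields a single token for the whole phrase,
# so the individual words cannot be matched as separate search terms.
print(len(whitespace_tokens(phrase)))
```

An analyzer that drops or mishandles those combining code points, or that relies on spaces to find word boundaries, would be expected to struggle with text like this, which is consistent with the recommendation to use the OLT analyzer.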
Thanks, Michael. I will use the OLT analyzer and raise an SR if necessary.