

Does UTF-8 encoding support Thai language in Endeca Search?


Hi, We have a requirement to configure Endeca search for the Thai language. Does using UTF-8 encoding for all the configuration files and for incoming data (Record Adapter) cause issues with the Thai character set? If so, please let us know which encoding would fix the issue. I am using Endeca 6.4.0 for the search configuration.

Are you having issues? My understanding is that the only encoding you can use for Thai is UTF-8, so provided you are using UTF-8 and have specified the language identifier as Thai (--lang th), you should be fine. Note that the language identifier is needed to ensure 6.4.0 uses the new OLT analyzer, which supports the full character set, instead of the standard Endeca analyzer (which has issues with certain code points, namely the Thai tone/diacritic marks). Michael
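To illustrate the encoding point, here is a minimal sketch in plain Python (not Endeca code; the helper names are made up for this example). It takes one Thai word containing a tone mark and checks that it encodes cleanly as UTF-8, that it cannot be represented in Latin-1 at all, and that the tone mark is a separate combining code point. It says nothing about how either analyzer works internally.

```python
import unicodedata

# Thai word for "water": NO NU (U+0E19) + MAI THO tone mark (U+0E49) + SARA AM (U+0E33)
WORD = "\u0e19\u0e49\u0e33"


def can_encode(text, encoding):
    """Return True if every character in text is representable in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def combining_marks(text):
    """Return the nonspacing combining marks (Unicode category Mn) in text."""
    return [ch for ch in text if unicodedata.category(ch) == "Mn"]


if __name__ == "__main__":
    # Each Thai code point takes three bytes in UTF-8.
    print(len(WORD.encode("utf-8")), "bytes in UTF-8")
    print("UTF-8 ok:", can_encode(WORD, "utf-8"))
    print("Latin-1 ok:", can_encode(WORD, "latin-1"))
    for mark in combining_marks(WORD):
        print("combining mark:", unicodedata.name(mark))
```

A similar check run over a sample of the incoming records is a cheap way to confirm the data really is UTF-8 before it reaches the Record Adapter.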

Hi Michael, Thanks for the information. Considering the drawbacks of the OLT analyzer (no wildcard search, phrase search, search characters, or diacritic folding), I switched to the Latin-1 (Endeca) language analyzer. I have not found any issues with UTF-8 encoding so far, but I wanted to know whether the Latin-1 (Endeca) language analyzer poses any constraints on advanced search features, and whether UTF-8 encoding can be used together with it. Please advise which analyzer is best suited to the Thai language.

Hi, You should use the OLT analyzer, as the Latin-1 analyzer doesn't support all the character codes used by the Thai language. Some words do not use these unsupported character codes, but many do (any that contain tone marks), and searches for these will not return the correct results. Also, Thai does not separate words with whitespace, so multi-term searches will not work correctly. Finally, for ongoing support your best bet is to use the officially recommended analyzer for this language, which is the OLT one. As for the unavailable features, do you require any of them? Note that diacritic folding is (I believe) scheduled to be added in the next release; however, to the best of my knowledge this shouldn't be required in Thai, as the diacritics are integral to the language. If you need one or more of the other features, I'd recommend raising an SR: the more customers request a feature for OLT analyzer languages, the greater the likelihood that support for it will be added. Thanks, Michael
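The whitespace point can be seen with a small plain-Python sketch (an illustration only, not how either Endeca analyzer is implemented; the function name is hypothetical). A four-word Thai sentence is written without spaces, so a naive whitespace split returns it as a single token, while the equivalent English sentence splits into separate terms.

```python
# Four Thai words: "I", "love", "language", "Thai"
WORDS = [
    "\u0e09\u0e31\u0e19",        # chan
    "\u0e23\u0e31\u0e01",        # rak
    "\u0e20\u0e32\u0e29\u0e32",  # phasa
    "\u0e44\u0e17\u0e22",        # thai
]

# Thai is normally written with no spaces between the words of a sentence.
SENTENCE = "".join(WORDS)


def whitespace_tokens(text):
    """Split text on runs of whitespace, as a naive tokenizer would."""
    return text.split()


if __name__ == "__main__":
    print(len(whitespace_tokens("I love Thai")), "tokens for the English sentence")
    print(len(whitespace_tokens(SENTENCE)), "token(s) for the Thai sentence")
```

Any tokenizer that relies on whitespace alone therefore sees the whole sentence as one term, which is why a multi-term query needs a segmenter that knows Thai word boundaries.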

Thanks, Michael. I will use OLT and raise an SR if necessary.

