Vietnamese-English Cross-Language Information Retrieval (CLIR) Using Bilingual Dictionary
Doan Nguyen
Search/Knowledge Retrieval Group
Abstract
Web content is growing explosively. A case study performed
in 2001 [9] suggested that 70 percent of Internet content is
in English, but only about 44 percent of Internet users are
native English speakers. These numbers are expected to
change, but English is still expected to play a dominant role.
To give users access to English digital documents from a
search query written in Vietnamese, we propose a Cross-
Language Information Retrieval (CLIR) technique that
translates the query into English phrases, which then serve as
the query text for retrieving relevant English search results.
The technique uses web query logs to gather statistics on
word usage patterns. These statistics help resolve translation
ambiguities by selecting the proper word sense (meaning) for
each translated term. The proposed work also addresses the
structure of the translated query, placing words in a
particular order to obtain relevant search results. The
approach uses a web-based bilingual dictionary whose lookup
terms are derived from common web search queries.
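The sense-selection and word-ordering ideas described above can be illustrated with a small sketch. It is a minimal example under stated assumptions, not the author's implementation. It assumes that each Vietnamese term maps to a list of candidate English senses in a bilingual dictionary, that a combination of senses is scored by how often the chosen English words co-occur within the same query in an English query log, and that the chosen words are ordered by the most frequent adjacent-word bigrams in that log. The dictionary entries, the query log, and the helper names (`build_stats`, `cohesion`, `translate`) are all hypothetical and only serve the illustration.

```python
from collections import Counter
from itertools import combinations, permutations, product

# Hypothetical bilingual dictionary: Vietnamese term -> candidate English senses.
BILINGUAL_DICT = {
    "giá": ["price", "shelf"],
    "vàng": ["gold", "yellow"],
}

# Hypothetical English web query log (illustrative data, not real statistics).
QUERY_LOG = [
    "gold price today",
    "gold price chart",
    "price of gold",
    "yellow shelf",
    "yellow dress",
]


def build_stats(log):
    """Count unordered word pairs per query and ordered adjacent bigrams."""
    pair_counts = Counter()
    bigram_counts = Counter()
    for query in log:
        words = query.lower().split()
        # Each unordered pair is counted at most once per query.
        for a, b in combinations(sorted(set(words)), 2):
            pair_counts[(a, b)] += 1
        # Adjacent ordered pairs capture preferred word order.
        for a, b in zip(words, words[1:]):
            bigram_counts[(a, b)] += 1
    return pair_counts, bigram_counts


def cohesion(words, pair_counts):
    """Sum of query-log co-occurrence counts over all pairs of chosen words."""
    return sum(pair_counts[tuple(sorted((a, b)))] for a, b in combinations(words, 2))


def translate(terms, dictionary, log):
    """Pick one sense per term, then order the senses as they usually appear."""
    pair_counts, bigram_counts = build_stats(log)
    candidates = [dictionary[t] for t in terms]
    # Sense selection: the combination whose words co-occur most in the log.
    best = max(product(*candidates), key=lambda ws: cohesion(ws, pair_counts))
    # Query structure: the permutation whose adjacent bigrams are most frequent.
    ordered = max(
        permutations(best),
        key=lambda ws: sum(bigram_counts[(a, b)] for a, b in zip(ws, ws[1:])),
    )
    return " ".join(ordered)


print(translate(["giá", "vàng"], BILINGUAL_DICT, QUERY_LOG))  # gold price
```

For the query "giá vàng", the pair ("gold", "price") co-occurs in three logged queries while ("shelf", "yellow") co-occurs in only one, so the sketch selects "price" and "gold", and the bigram counts place them in the order "gold price". Enumerating every sense combination and permutation grows exponentially with query length; that is acceptable for short web queries, but longer queries would need a greedy or dynamic-programming search instead.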