716 research outputs found
MIRACLE Retrieval Experiments with East Asian Languages
This paper describes the participation of MIRACLE in NTCIR 2005 CLIR task. Although our group has a strong background and long expertise in Computational Linguistics and Information Retrieval applied to European languages and using Latin and Cyrillic alphabets, this was our first attempt on East Asian languages. Our main goal was to study the particularities and distinctive characteristics of Japanese, Chinese and Korean, specially focusing on the similarities and differences with European languages, and carry out research on CLIR tasks which include those languages. The basic idea behind our participation in NTCIR is to test if the same familiar linguisticbased techniques may also applicable to East Asian languages, and study the necessary adaptations
LCC-DCU C-C question answering task at NTCIR-5
This paper describes the work for our participation in the NTCIR-5 Chinese to Chinese Question Answering task. Our strategy is based on the “Retrieval plus Extraction” approach. We first retrieve relevant documents, then retrieve short passages from the above documents, and finally extract named entity answers from the most relevant passages. For question type identification, we use simple heuristic rules which can cover most questions. The Lemur toolkit with the OKAPI model is used for document retrieval. Results of our task submission are given and some preliminary conclusions drawn
An analysis of question processing of English and Chinese for the NTCIR 5 cross-language question answering task
An important element in question answering systems is the analysis and interpretation of questions. Using the NTCIR 5 Cross-Language Question Answering (CLQA) question test set we demonstrate that the accuracy of deep question analysis is dependent on the quantity and suitability of the available linguistic resources.
We further demonstrate that applying question analysis tools developed on monolingual training materials to questions translated Chinese-English and English-Chinese using machine translation produces much reduced effectiveness in interpretation of the question. This latter result indicates that question analysis for CLQA should primarily be conducted in the question language prior to translation
CRL at Ntcir2
We have developed systems of two types for NTCIR2. One is an enhenced version
of the system we developed for NTCIR1 and IREX. It submitted retrieval results
for JJ and CC tasks. A variety of parameters were tried with the system. It
used such characteristics of newspapers as locational information in the CC
tasks. The system got good results for both of the tasks. The other system is a
portable system which avoids free parameters as much as possible. The system
submitted retrieval results for JJ, JE, EE, EJ, and CC tasks. The system
automatically determined the number of top documents and the weight of the
original query used in automatic-feedback retrieval. It also determined
relevant terms quite robustly. For EJ and JE tasks, it used document expansion
to augment the initial queries. It achieved good results, except on the CC
tasks.Comment: 11 pages. Computation and Language. This paper describes our results
of information retrieval in the NTCIR2 contes
Language Modeling for Multi-Domain Speech-Driven Text Retrieval
We report experimental results associated with speech-driven text retrieval,
which facilitates retrieving information in multiple domains with spoken
queries. Since users speak contents related to a target collection, we produce
language models used for speech recognition based on the target collection, so
as to improve both the recognition and retrieval accuracy. Experiments using
existing test collections combined with dictated queries showed the
effectiveness of our method
A Survey on Retrieval of Mathematical Knowledge
We present a short survey of the literature on indexing and retrieval of
mathematical knowledge, with pointers to 72 papers and tentative taxonomies of
both retrieval problems and recurring techniques.Comment: CICM 2015, 20 page
ICT-DCU question answering task at NTCIR-6
This paper describes details of our participation in the
NTCIR-6 Chinese-to-Chinese Question Answering task. We
use the “retrieval plus extraction approach” to get answers
for questions. We first split the documents into short passages, and then retrieve potentially relevant passages for a question, and finally extract named entity answers from the most relevant passages. For question type identification, we use simple heuristic rules which cover most questions. The Lemur toolkit was used with the okapi model for document retrieval. Results of our task submission are given and some preliminary conclusions drawn
Which one is better: presentation-based or content-based math search?
Mathematical content is a valuable information source and retrieving this
content has become an important issue. This paper compares two searching
strategies for math expressions: presentation-based and content-based
approaches. Presentation-based search uses state-of-the-art math search system
while content-based search uses semantic enrichment of math expressions to
convert math expressions into their content forms and searching is done using
these content-based expressions. By considering the meaning of math
expressions, the quality of search system is improved over presentation-based
systems
- …