716 research outputs found

    MIRACLE Retrieval Experiments with East Asian Languages

    Get PDF
    This paper describes the participation of MIRACLE in NTCIR 2005 CLIR task. Although our group has a strong background and long expertise in Computational Linguistics and Information Retrieval applied to European languages and using Latin and Cyrillic alphabets, this was our first attempt on East Asian languages. Our main goal was to study the particularities and distinctive characteristics of Japanese, Chinese and Korean, specially focusing on the similarities and differences with European languages, and carry out research on CLIR tasks which include those languages. The basic idea behind our participation in NTCIR is to test if the same familiar linguisticbased techniques may also applicable to East Asian languages, and study the necessary adaptations

    LCC-DCU C-C question answering task at NTCIR-5

    Get PDF
    This paper describes the work for our participation in the NTCIR-5 Chinese to Chinese Question Answering task. Our strategy is based on the “Retrieval plus Extraction” approach. We first retrieve relevant documents, then retrieve short passages from the above documents, and finally extract named entity answers from the most relevant passages. For question type identification, we use simple heuristic rules which can cover most questions. The Lemur toolkit with the OKAPI model is used for document retrieval. Results of our task submission are given and some preliminary conclusions drawn

    An analysis of question processing of English and Chinese for the NTCIR 5 cross-language question answering task

    Get PDF
    An important element in question answering systems is the analysis and interpretation of questions. Using the NTCIR 5 Cross-Language Question Answering (CLQA) question test set we demonstrate that the accuracy of deep question analysis is dependent on the quantity and suitability of the available linguistic resources. We further demonstrate that applying question analysis tools developed on monolingual training materials to questions translated Chinese-English and English-Chinese using machine translation produces much reduced effectiveness in interpretation of the question. This latter result indicates that question analysis for CLQA should primarily be conducted in the question language prior to translation

    CRL at Ntcir2

    Full text link
    We have developed systems of two types for NTCIR2. One is an enhenced version of the system we developed for NTCIR1 and IREX. It submitted retrieval results for JJ and CC tasks. A variety of parameters were tried with the system. It used such characteristics of newspapers as locational information in the CC tasks. The system got good results for both of the tasks. The other system is a portable system which avoids free parameters as much as possible. The system submitted retrieval results for JJ, JE, EE, EJ, and CC tasks. The system automatically determined the number of top documents and the weight of the original query used in automatic-feedback retrieval. It also determined relevant terms quite robustly. For EJ and JE tasks, it used document expansion to augment the initial queries. It achieved good results, except on the CC tasks.Comment: 11 pages. Computation and Language. This paper describes our results of information retrieval in the NTCIR2 contes

    Language Modeling for Multi-Domain Speech-Driven Text Retrieval

    Full text link
    We report experimental results associated with speech-driven text retrieval, which facilitates retrieving information in multiple domains with spoken queries. Since users speak contents related to a target collection, we produce language models used for speech recognition based on the target collection, so as to improve both the recognition and retrieval accuracy. Experiments using existing test collections combined with dictated queries showed the effectiveness of our method

    ICT-DCU question answering task at NTCIR-6

    Get PDF
    This paper describes details of our participation in the NTCIR-6 Chinese-to-Chinese Question Answering task. We use the “retrieval plus extraction approach” to get answers for questions. We first split the documents into short passages, and then retrieve potentially relevant passages for a question, and finally extract named entity answers from the most relevant passages. For question type identification, we use simple heuristic rules which cover most questions. The Lemur toolkit was used with the okapi model for document retrieval. Results of our task submission are given and some preliminary conclusions drawn

    Which one is better: presentation-based or content-based math search?

    Full text link
    Mathematical content is a valuable information source and retrieving this content has become an important issue. This paper compares two searching strategies for math expressions: presentation-based and content-based approaches. Presentation-based search uses state-of-the-art math search system while content-based search uses semantic enrichment of math expressions to convert math expressions into their content forms and searching is done using these content-based expressions. By considering the meaning of math expressions, the quality of search system is improved over presentation-based systems
    corecore