Bilingual formal concept analysis for cross-language information retrieval

Abstract

We propose and evaluate a Cross-Language Information Retrieval (CLIR) model based on the extraction and translation of Formal Concepts, which avoids translating queries and/or documents. The contribution of this work is a unified formal framework that integrates Formal Concept Analysis (FCA) and information retrieval for effective CLIR. The model indexes bilingual documents using bilingual Formal Concepts extracted by FCA. We also study the use of noun phrases, in addition to keywords, as index terms. We use two comparable collections: an Italian-French collection and an English-French collection. To evaluate our model, we use three information retrieval models: TF-IDF, BM25 and a Language Model. Finally, we study the effect of query expansion. Our main finding suggests that Formal Concept Analysis is effective for aligning Formal Concepts across languages. Results indicate that our model's performance is comparable to a word translation approach and better than a word embedding approach.
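To make the notion of a bilingual Formal Concept concrete, the sketch below enumerates the formal concepts of a tiny document-term context. It is a textbook brute-force FCA enumeration, not the authors' pipeline. The toy documents, the `en:`/`fr:` term prefixes, and the function names are assumptions made for illustration, as is the choice to represent each bilingual document as a single merged set of terms from both languages.

```python
from itertools import combinations

# Hypothetical toy context: each document maps to the set of index terms
# it contains, with English and French terms merged into one set.
CONTEXT = {
    "d1": {"en:cat", "fr:chat", "en:animal", "fr:animal"},
    "d2": {"en:dog", "fr:chien", "en:animal", "fr:animal"},
    "d3": {"en:cat", "fr:chat"},
}


def common_terms(docs, context):
    """Terms shared by every document in `docs` (all terms if `docs` is empty)."""
    shared = set().union(*context.values())
    for doc in docs:
        shared &= context[doc]
    return frozenset(shared)


def matching_docs(terms, context):
    """Documents that contain every term in `terms`."""
    return frozenset(doc for doc, doc_terms in context.items() if terms <= doc_terms)


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of documents.

    Exponential in the number of documents, so only suitable for toy inputs.
    """
    concepts = set()
    docs = sorted(context)
    for size in range(len(docs) + 1):
        for subset in combinations(docs, size):
            intent = common_terms(subset, context)
            extent = matching_docs(intent, context)
            concepts.add((extent, intent))
    return concepts


if __name__ == "__main__":
    for extent, intent in sorted(formal_concepts(CONTEXT), key=lambda c: len(c[0])):
        print(sorted(extent), sorted(intent))
```

In this toy context, one of the resulting concepts has extent {d1, d3} and intent {en:cat, fr:chat}: the English and French terms end up in the same concept because they occur in the same documents, which is the intuition behind aligning terms across languages without translating queries or documents.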
