459 research outputs found

    A Survey on Important Aspects of Information Retrieval

    Get PDF
    Information retrieval has become an important field of study and research under computer science due to the explosive growth of information available in the form of full text, hypertext, administrative text, directory, numeric or bibliographic text. The research work is going on various aspects of information retrieval systems so as to improve its efficiency and reliability. This paper presents a comprehensive survey discussing not only the emergence and evolution of information retrieval but also include different information retrieval models and some important aspects such as document representation, similarity measure and query expansion

    A Conceptual Representation of Documents and Queries for Information Retrieval Systems by Using Light Ontologies

    Get PDF
    International audienceThis article presents a vector space model approach to representing documents and queries, based on concepts instead of terms and using WordNet as a light ontology. Such representation reduces information overlap with respect to classic semantic expansion techniques. Experiments carried out on the MuchMore benchmark and on the TREC-7 and TREC-8 Ad-hoc collections demonstrate the effectiveness of the proposed approach

    Arabic Query Expansion Using WordNet and Association Rules

    Get PDF
    Query expansion is the process of adding additional relevant terms to the original queries to improve the performance of information retrieval systems. However, previous studies showed that automatic query expansion using WordNet do not lead to an improvement in the performance. One of the main challenges of query expansion is the selection of appropriate terms. In this paper, we review this problem using Arabic WordNet and Association Rules within the context of Arabic Language. The results obtained confirmed that with an appropriate selection method, we are able to exploit Arabic WordNet to improve the retrieval performance. Our empirical results on a sub-corpus from the Xinhua collection showed that our automatic selection method has achieved a significant performance improvement in terms of MAP and recall and a better precision with the first top retrieved documents

    Applying Wikipedia to Interactive Information Retrieval

    Get PDF
    There are many opportunities to improve the interactivity of information retrieval systems beyond the ubiquitous search box. One idea is to use knowledge bases—e.g. controlled vocabularies, classification schemes, thesauri and ontologies—to organize, describe and navigate the information space. These resources are popular in libraries and specialist collections, but have proven too expensive and narrow to be applied to everyday webscale search. Wikipedia has the potential to bring structured knowledge into more widespread use. This online, collaboratively generated encyclopaedia is one of the largest and most consulted reference works in existence. It is broader, deeper and more agile than the knowledge bases put forward to assist retrieval in the past. Rendering this resource machine-readable is a challenging task that has captured the interest of many researchers. Many see it as a key step required to break the knowledge acquisition bottleneck that crippled previous efforts. This thesis claims that the roadblock can be sidestepped: Wikipedia can be applied effectively to open-domain information retrieval with minimal natural language processing or information extraction. The key is to focus on gathering and applying human-readable rather than machine-readable knowledge. To demonstrate this claim, the thesis tackles three separate problems: extracting knowledge from Wikipedia; connecting it to textual documents; and applying it to the retrieval process. First, we demonstrate that a large thesaurus-like structure can be obtained directly from Wikipedia, and that accurate measures of semantic relatedness can be efficiently mined from it. Second, we show that Wikipedia provides the necessary features and training data for existing data mining techniques to accurately detect and disambiguate topics when they are mentioned in plain text. Third, we provide two systems and user studies that demonstrate the utility of the Wikipedia-derived knowledge base for interactive information retrieval

    Improving search engines with open Web-based SKOS vocabularies

    Get PDF
    Dissertação para obtenção do Grau de Mestre em Engenharia InformáticaThe volume of digital information is increasingly larger and even though organiza-tions are making more of this information available, without the proper tools users have great difficulties in retrieving documents about subjects of interest. Good infor-mation retrieval mechanisms are crucial for answering user information needs. Nowadays, search engines are unavoidable - they are an essential feature in docu-ment management systems. However, achieving good relevancy is a difficult problem particularly when dealing with specific technical domains where vocabulary mismatch problems can be prejudicial. Numerous research works found that exploiting the lexi-cal or semantic relations of terms in a collection attenuates this problem. In this dissertation, we aim to improve search results and user experience by inves-tigating the use of potentially connected Web vocabularies in information retrieval en-gines. In the context of open Web-based SKOS vocabularies we propose a query expan-sion framework implemented in a widely used IR system (Lucene/Solr), and evaluated using standard IR evaluation datasets. The components described in this thesis were applied in the development of a new search system that was integrated with a rapid applications development tool in the context of an internship at Quidgest S.A.Fundação para a Ciência e Tecnologia - ImTV research project, in the context of the UTAustin-Portugal collaboration (UTA-Est/MAI/0010/2009); QSearch project (FCT/Quidgest

    Formal Concept Analysis and Information Retrieval – A Survey

    Get PDF
    International audienceOne of the first models to be proposed as a document index for retrieval purposes was a lattice structure, decades before the introduction of Formal Concept Analysis. Nevertheless, the main notions that we consider so familiar within the community (" extension " , " intension " , " closure operators " , " order ") were already an important part of it. In the '90s, as FCA was starting to settle as an epistemic community, lattice-based Information Retrieval (IR) systems smoothly transitioned towards FCA-based IR systems. Currently, FCA theory supports dozens of different retrieval applications, ranging from traditional document indices to file systems, recommendation, multi-media and more recently, semantic linked data. In this paper we present a comprehensive study on how FCA has been used to support IR systems. We try to be as exhaustive as possible by reviewing the last 25 years of research as chronicles of the domain, yet we are also concise in relating works by its theoretical foundations. We think that this survey can help future endeavours of establishing FCA as a valuable alternative for modern IR systems
    • …
    corecore