916 research outputs found

    Query expansion using medical information extraction for improving information retrieval in French medical domain

    Many user queries contain references to named entities, and this is particularly true in the medical field. Doctors express their information needs using medical entities because these are information-rich elements that help to better target the relevant documents. At the same time, many resources, such as clinical reports (medical texts written by doctors), have been recognized as large repositories of medical entities and the relationships between them. In this paper, we present a query expansion method that uses medical entities and their semantic relations in the query context, based on an external resource in OWL. The goal of this method is to evaluate the effectiveness of an information retrieval system in helping doctors access relevant information easily. Experiments on a collection of real clinical reports show that our approach yields improvements in precision, recall and MAP in medical information retrieval.

    A Survey on Important Aspects of Information Retrieval

    Information retrieval has become an important field of study and research in computer science due to the explosive growth of information available in the form of full text, hypertext, administrative text, directory, numeric or bibliographic text. Research is ongoing on various aspects of information retrieval systems to improve their efficiency and reliability. This paper presents a comprehensive survey that discusses not only the emergence and evolution of information retrieval but also different information retrieval models and some important aspects such as document representation, similarity measures and query expansion.

    Arabic Query Expansion Using WordNet and Association Rules

    Query expansion is the process of adding relevant terms to the original queries to improve the performance of information retrieval systems. However, previous studies showed that automatic query expansion using WordNet does not lead to an improvement in performance. One of the main challenges of query expansion is the selection of appropriate terms. In this paper, we review this problem using Arabic WordNet and association rules within the context of the Arabic language. The results confirm that, with an appropriate selection method, Arabic WordNet can be exploited to improve retrieval performance. Our empirical results on a sub-corpus of the Xinhua collection show that our automatic selection method achieves a significant improvement in MAP and recall, and better precision for the top retrieved documents.

    Enhanced word embedding similarity measures using fuzzy rules for query expansion

    © 2017 IEEE. Query expansion has been widely used to select additional words that are related to the original query words in the field of information retrieval. In this paper, we present a novel query expansion method that jointly uses fuzzy rules and a word embedding similarity calculation. The expansion words are generated using a word embedding method and selected according to their semantic similarity to the original query. Fuzzy rules are used to enhance the word similarity calculations and reweight expansion words. When measuring and ranking the relevance of a retrieved document, the original query and the expansion words with their weights are considered. We conduct experiments on the query expansion in document ranking tasks. Experimental results from the document ranking task show that the proposed method is able to significantly outperform state-of-the-art baseline methods
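
The abstract above describes generating expansion words from word embeddings and reweighting them with fuzzy rules. The following is only a minimal sketch of that general idea, not the authors' method: the toy embedding vectors, the `ramp` membership function, its thresholds and the `expand_query` helper are all assumptions made for illustration.

```python
import math

# Toy 3-dimensional embeddings; the values are hypothetical.
EMBEDDINGS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "vehicle": [0.7, 0.3, 0.1],
    "banana": [0.0, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ramp(sim, low=0.5, high=0.9):
    """Hypothetical fuzzy membership: 0 below `low`, 1 above `high`, linear between."""
    if sim <= low:
        return 0.0
    if sim >= high:
        return 1.0
    return (sim - low) / (high - low)


def expand_query(query_terms, k=2):
    """Return {term: weight}: original terms at 1.0 plus up to k expansion words."""
    weights = {t: 1.0 for t in query_terms}
    candidates = {}
    for word, vec in EMBEDDINGS.items():
        if word in weights:
            continue
        # Similarity of the candidate to its closest original query term.
        best = max(
            (cosine(vec, EMBEDDINGS[t]) for t in query_terms if t in EMBEDDINGS),
            default=0.0,
        )
        weight = ramp(best)
        if weight > 0.0:
            candidates[word] = weight
    # Keep the k highest-weighted candidates.
    top = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)[:k]
    weights.update(top)
    return weights


expanded = expand_query(["car"])
```

In a ranking step, the weights returned here would scale each term's contribution to the document score, so that expansion words count for no more than the original query words.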

    Terms interrelationship query expansion to improve accuracy of Quran search

    Quran retrieval systems are becoming an instrument for users to search for needed information. The search engine is one of the most popular tools that has been successfully implemented for finding verses relevant to queries. However, a major challenge for Quran search engines is word ambiguity, specifically lexical ambiguity. Even with the advent of query expansion techniques for Quran retrieval systems, their performance remains problematic in terms of retrieving the information users need. Current semantic techniques still lack precision when they do not consider several semantic dictionaries. Therefore, this study proposes a stemmed terms interrelationship query expansion approach to improve Quran search results. More specifically, related terms were collected from different semantic dictionaries and then reduced to their roots using a stemming algorithm. To assess the performance of the stemmed terms interrelationship query expansion, experiments were conducted using eight Quran datasets from the Tanzil website. Overall, the results indicate that the stemmed terms interrelationship query expansion is superior to the unstemmed terms interrelationship query expansion in Mean Average Precision, with Yusuf Ali 68%, Sarawar 67%, Arberry 72%, Malay 65%, Hausa 62%, Urdu 62%, Modern Arabic 60% and Classical Arabic 59%.

    A Taxonomy of Information Retrieval Models and Tools

    Information retrieval is attracting significant attention due to the exponential growth of the amount of information available in digital format. The proliferation of information retrieval objects, including algorithms, methods, technologies, and tools, makes it difficult to assess their capabilities and features and to understand the relationships that exist among them. In addition, the terminology is often confusing and misleading, as different terms are used to denote the same, or similar, tasks. This paper proposes a taxonomy of information retrieval models and tools and provides precise definitions for the key terms. The taxonomy consists of superimposing two views: a vertical taxonomy, which classifies IR models with respect to a set of basic features, and a horizontal taxonomy, which classifies IR systems and services with respect to the tasks they support. The aim is to provide a framework for classifying existing information retrieval models and tools and a solid basis for assessing future developments in the field.

    An Entropy-Based Approach for Preserving Diversity in Evolutionary Topical Search

    Topic-based information retrieval is the process of matching a topic of interest against the resources that are indexed. One approach to retrieving topic-relevant resources is to generate queries that reflect the topic of interest. Multi-objective Evolutionary Algorithms have demonstrated great potential to deal with the problem of topical query generation. In an evolutionary approach to topic-based information retrieval, the topic of interest is used to generate an initial population of queries, which is evolved towards successively better candidate queries. A common problem with such an approach is poor recall due to loss of genetic diversity. This work proposes a novel strategy inspired by the information-theoretic notion of entropy to favor population diversity with the aim of attaining good global recall. Preliminary experiments conducted on a large dataset of labeled documents show the effectiveness of the proposed strategy. Sociedad Argentina de Informática e Investigación Operativa (SADIO)
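
The abstract above relies on entropy as a signal of diversity in a population of queries. As a rough illustration only, the sketch below computes the Shannon entropy of the term distribution across a population. Treating each query as a list of terms, and measuring diversity over pooled term frequencies, are assumptions made here; the paper's actual formulation may differ.

```python
import math
from collections import Counter


def population_entropy(queries):
    """Shannon entropy (in bits) of the term distribution pooled over all queries.

    `queries` is a list of queries, each a list of terms (an assumed representation).
    Higher values indicate a more diverse population.
    """
    counts = Counter(term for query in queries for term in query)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A population that has collapsed onto one term versus a more varied one.
collapsed = [["retrieval"], ["retrieval"]]
varied = [["retrieval"], ["entropy"]]
low = population_entropy(collapsed)
high = population_entropy(varied)
```

An evolutionary loop could, for example, prefer offspring that keep this value from dropping, which is one plausible way to counter the loss of genetic diversity the abstract mentions.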

    A Survey on Intent-based Diversification for Fuzzy Keyword Search

    Keyword search is an interesting phenomenon: it is the process of finding important and relevant information from various data repositories. Structured and semi-structured data can be stored precisely. Fully unstructured documents can be annotated and stored in the form of metadata. Half of all web searches are for information exploration. In this paper, earlier work on the semantic meaning of keywords based on their context in the specified documents is thoroughly analyzed. In a tree data representation, the nodes are objects and may hold some intention. These nodes act as anchors for a Smallest Lowest Common Ancestor (SLCA) based pruning process. Nodes are clustered based on their features. A feature is a distinctive attribute: a quality, property or trait of something. Automatic text classification algorithms are the modern way to extract features. Summarization and segmentation produce n-grams from various forms of documents. The set of items that describe and summarize one important aspect of a query is known as a facet. Instead of exact string matching, fuzzy mapping based on semantic correlation is the new trend, where the correlation is quantified by cosine similarity. Once an outlier is detected, the nearest neighbors of the selected points are mapped with high probability to the same hash code as the intended nodes. These methods collectively retrieve the relevant data and prune out the unnecessary data, and at the same time create a hash signature for the nearest neighbor search. This survey emphasizes the need for a framework for fuzzy keyword search.

    Improving document representation by accumulating relevance feedback : the relevance feedback accumulation (RFA) algorithm

    Document representation (indexing) techniques are dominated by variants of the term-frequency analysis approach, based on the assumption that the more occurrences a term has throughout a document, the more important the term is in that document. Inherent drawbacks associated with this approach include poor index quality, large document representation size and the word mismatch problem. To tackle these drawbacks, a document representation improvement method called the Relevance Feedback Accumulation (RFA) algorithm is presented. The algorithm provides a mechanism to continuously accumulate relevance assessments over time and across users. It also provides a document representation modification function, or document representation learning function, that gradually improves the quality of the document representations. To improve document representations, the learning function uses a data mining measure called support to analyze the accumulated relevance feedback. Evaluation is done by comparing the RFA algorithm to four other algorithms. The four measures used for evaluation are (a) average number of index terms per document; (b) the quality of the document representations assessed by human judges; (c) retrieval effectiveness; and (d) the quality of the document representation learning function. The evaluation results show that (1) the algorithm is able to substantially reduce the document representation size while maintaining retrieval effectiveness parameters; (2) the algorithm provides a smooth and steady document representation learning function; and (3) the algorithm improves the quality of the document representations. The RFA algorithm's approach is consistent with efficiency considerations that hold in real information retrieval systems. The major contribution made by this research is the design and implementation of a novel, simple, efficient, and scalable technique for document representation improvement.
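
The abstract above mentions using the data mining measure "support" to analyze accumulated relevance feedback. In data mining, the support of an item is the fraction of transactions that contain it. The sketch below applies that definition under loudly stated assumptions: each "transaction" is the term list of a query for which a user judged the document relevant, and the `min_support` threshold and function names are hypothetical. It is not the RFA algorithm itself.

```python
from collections import Counter


def term_support(feedback_queries):
    """Support of each term: fraction of feedback queries that contain it.

    `feedback_queries` holds the term lists of queries for which users judged
    one particular document relevant (an assumed representation).
    """
    n = len(feedback_queries)
    if n == 0:
        return {}
    # Count each term at most once per query, as support is per transaction.
    counts = Counter(term for query in feedback_queries for term in set(query))
    return {term: count / n for term, count in counts.items()}


def select_index_terms(feedback_queries, min_support=0.5):
    """Keep only terms whose support reaches the (hypothetical) threshold."""
    support = term_support(feedback_queries)
    return sorted(term for term, s in support.items() if s >= min_support)


accumulated = [
    ["heart", "attack"],
    ["heart", "disease"],
    ["cardiac", "arrest"],
    ["heart"],
]
index_terms = select_index_terms(accumulated)
```

Filtering by a support threshold in this way would tend to shrink the representation to the terms users repeatedly associate with the document, which matches the reduced index size the abstract reports as a goal.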