
    A survey on the use of relevance feedback for information access systems

    Users of online search engines often find it difficult to express their information need in the form of a query. However, if users can identify examples of the kind of documents they require, they can employ a technique known as relevance feedback. Relevance feedback covers a range of techniques intended to improve a user's query and facilitate retrieval of information relevant to the user's information need. In this paper we survey relevance feedback techniques. We study both automatic techniques, in which the system modifies the user's query, and interactive techniques, in which the user has control over query modification. We also consider specific interfaces to relevance feedback systems and characteristics of searchers that can affect the use and success of relevance feedback systems.
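    As a concrete example of the "automatic" family of techniques this survey covers, the classic Rocchio method moves the query vector toward relevant documents and away from non-relevant ones. The sketch below is a minimal illustration, assuming term-weight dicts as vectors and the customary alpha/beta/gamma parameters; it is not any specific surveyed system.

```python
# Minimal sketch of Rocchio relevance feedback, the classic automatic
# query-modification technique: the query vector is moved toward relevant
# documents and away from non-relevant ones. Vectors are plain dicts of
# term -> weight; alpha/beta/gamma are the usual tuning parameters.
from collections import defaultdict

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    new_query = defaultdict(float)
    for term, w in query.items():
        new_query[term] += alpha * w
    for doc in relevant:
        for term, w in doc.items():
            new_query[term] += beta * w / len(relevant)
    for doc in nonrelevant:
        for term, w in doc.items():
            new_query[term] -= gamma * w / len(nonrelevant)
    # Negative weights are conventionally clipped to zero.
    return {t: w for t, w in new_query.items() if w > 0}

# Example: feedback pulls "retrieval" into the query.
q = {"relevance": 1.0, "feedback": 1.0}
rel = [{"relevance": 0.5, "retrieval": 0.8}]
nonrel = [{"cooking": 0.9}]
print(rocchio(q, rel, nonrel))
```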

    The study of probability model for compound similarity searching

    The main task of an Information Retrieval (IR) system is to retrieve documents relevant to a user's query. One of the most popular IR retrieval models is the Vector Space Model. This model assumes relevance based on similarity, defined as the distance between query and document in the concept space. All currently existing chemical compound database systems have adopted the vector space model to calculate the similarity of a database entry to a query compound. However, it assumes that the fragments represented by the bits are independent of one another, which is not necessarily true. Hence, the possibility of applying another IR model, the Probabilistic Model (PM), to chemical compound searching is explored. This model estimates the probability that a chemical structure has the same bioactivity as a target compound. It is envisioned that by ranking chemical structures in decreasing order of their probability of relevance to the query structure, the effectiveness of a molecular similarity searching system can be increased. Both fragment dependence and independence assumptions are taken into consideration in improving the compound similarity searching system. After conducting a series of simulated similarity searches, it is concluded that the PM approaches performed better than the existing similarity searching, giving better results on all evaluation criteria. As to which probability model performs better, the BD model showed an improvement over the BIR model.
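    For intuition, the sketch below contrasts the two model families on binary fingerprints (sets of "on" bit positions): Tanimoto similarity is the standard vector-space measure in compound searching, while the BIR-style score is the textbook binary-independence weight with add-0.5 smoothing. Both are illustrative assumptions, not the paper's exact estimators.

```python
# Hedged sketch: vector-space (Tanimoto) vs. probabilistic (BIR-style)
# ranking over binary fingerprints represented as sets of bit positions.
import math

def tanimoto(query_bits, doc_bits):
    inter = len(query_bits & doc_bits)
    union = len(query_bits | doc_bits)
    return inter / union if union else 0.0

def bir_score(doc_bits, query_bits, n_docs, df, n_rel, rf, smooth=0.5):
    # df[b]: documents with bit b set; rf[b]: relevant (active) docs with b set.
    score = 0.0
    for b in query_bits & doc_bits:
        p = (rf.get(b, 0) + smooth) / (n_rel + 1)            # P(bit | relevant)
        q = (df.get(b, 0) - rf.get(b, 0) + smooth) / (n_docs - n_rel + 1)
        score += math.log(p * (1 - q) / (q * (1 - p)))
    return score

# Example: rank two fingerprints against a query by Tanimoto similarity.
query = {1, 2, 5}
docs = [{1, 2, 3}, {1, 2, 5, 7}]
print([round(tanimoto(query, d), 2) for d in docs])   # [0.5, 0.75]
```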

    A new weighting scheme and discriminative approach for information retrieval in static and dynamic document collections

    This paper introduces a new weighting scheme for information retrieval. It also proposes using the document centroid as a threshold for normalizing documents in a document collection. Document centroid normalization helps to achieve more effective information retrieval, as it enables good discrimination between documents. In the context of a machine learning application, namely unsupervised document indexing and retrieval, we compared the effectiveness of the proposed weighting scheme to 'Term Frequency - Inverse Document Frequency' (TF-IDF), which is commonly used and considered one of the best existing weighting schemes. The paper shows how the document centroid is used to remove less significant weights from documents and how this helps to achieve better retrieval effectiveness. Most existing weighting schemes in information retrieval research assume that the whole document collection is static. The results presented in this paper show that the proposed weighting scheme can produce higher retrieval effectiveness than the TF-IDF weighting scheme in both static and dynamic document collections. The results also show the variation in retrieval effectiveness achieved for static and dynamic document collections by a given weighting scheme. This type of comparison has not been presented in the literature before.
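    One plausible reading of centroid-based pruning is sketched below: a term's weight in a document is dropped when it falls below that term's mean weight across the collection. This interpretation, and the plain TF-IDF baseline it sits on, are assumptions for illustration, not the paper's exact scheme.

```python
# Sketch of centroid thresholding on top of TF-IDF, under the assumed
# reading that weights below the per-term collection mean are removed.
import math
from collections import Counter

def tfidf(docs):
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return [{t: tf * math.log(n / df[t]) for t, tf in Counter(d).items()}
            for d in docs]

def centroid_prune(weights):
    n = len(weights)
    centroid = Counter()
    for d in weights:
        centroid.update(d)                       # sums weights per term
    centroid = {t: w / n for t, w in centroid.items()}
    return [{t: w for t, w in d.items() if w >= centroid[t]} for d in weights]

docs = [["ir", "weighting", "scheme"], ["ir", "retrieval"], ["weighting", "ir", "ir"]]
print(centroid_prune(tfidf(docs)))
```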

    Evolving weighting schemes for the Bag of Visual Words

    The Bag of Visual Words (BoVW) is an established representation in computer vision. Taking inspiration from text mining, this representation has proved to be very effective in many domains. However, in most cases, standard term-weighting schemes are adopted (e.g., term frequency or TF-IDF). It remains an open question whether alternative weighting schemes could boost the performance of methods based on BoVW. More importantly, it is unknown whether effective weighting schemes can be learned automatically from scratch. This paper brings some light to both of these unknowns. On the one hand, we report an evaluation of the most common weighting schemes used in text mining, but rarely used in computer vision tasks. In addition, we propose an evolutionary algorithm capable of automatically learning weighting schemes for computer vision problems. We report empirical results of an extensive study on several computer vision problems. Results show the usefulness of the proposed method.
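    To make the pipeline concrete, the sketch below builds BoVW histograms and applies the standard TF-IDF baseline the paper compares against; the random codebook stands in for one learned by k-means, and none of this reflects the evolved schemes themselves.

```python
# Minimal BoVW sketch with a pluggable weighting scheme. The codebook
# would normally come from k-means over training descriptors; here it is
# random for illustration.
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 16))          # 8 visual words, 16-D descriptors

def bovw_histogram(descriptors, codebook):
    # Assign each local descriptor to its nearest visual word.
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    return np.bincount(words, minlength=len(codebook)).astype(float)

def tfidf_weight(histograms):
    tf = histograms / np.maximum(histograms.sum(axis=1, keepdims=True), 1)
    df = (histograms > 0).sum(axis=0)
    idf = np.log(len(histograms) / np.maximum(df, 1))
    return tf * idf

images = [rng.normal(size=(50, 16)) for _ in range(3)]   # fake local features
hists = np.stack([bovw_histogram(d, codebook) for d in images])
print(tfidf_weight(hists).shape)   # (3, 8): one weighted histogram per image
```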

    Evaluation Measures for Text Summarization

    We explain the ideas behind automatic text summarization approaches and the taxonomy of summary evaluation methods. Moreover, we propose a new evaluation measure for assessing the quality of a summary. The core of the measure is Latent Semantic Analysis (LSA), which can capture the main topics of a document. The summarization systems are ranked according to the similarity between the main topics of their summaries and those of the reference documents. Results show a high correlation between human rankings and the LSA-based evaluation measure. The measure is designed to compare a summary with its full text. It can compare a summary with a human-written abstract as well; however, in this case the standard ROUGE measure gives more precise results. Nevertheless, if abstracts are not available for a given corpus, the LSA-based measure is an appropriate choice.
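    A rough sketch of the idea: build term-by-sentence matrices for the full text and the summary over a shared vocabulary, take each matrix's first left singular vector as its "main topic", and score the summary by the cosine between the two topic vectors. This is a simplified reading for illustration, not the authors' exact measure.

```python
# Simplified LSA-style summary evaluation via SVD topic comparison.
import numpy as np

def term_sentence_matrix(sentences, vocab):
    A = np.zeros((len(vocab), len(sentences)))
    index = {t: i for i, t in enumerate(vocab)}
    for j, sent in enumerate(sentences):
        for tok in sent.lower().split():
            if tok in index:
                A[index[tok], j] += 1
    return A

def lsa_topic_similarity(full_sents, summary_sents):
    vocab = sorted({t for s in full_sents for t in s.lower().split()})
    u_full, _, _ = np.linalg.svd(term_sentence_matrix(full_sents, vocab))
    u_sum, _, _ = np.linalg.svd(term_sentence_matrix(summary_sents, vocab))
    a, b = u_full[:, 0], u_sum[:, 0]        # first left singular vectors
    # abs() handles the sign ambiguity of singular vectors.
    return abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

full = ["the cat sat on the mat", "dogs chase cats", "the mat was red"]
summ = ["the cat sat on the mat"]
print(round(lsa_topic_similarity(full, summ), 2))
```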

    Text Mining Infrastructure in R

    During the last decade text mining has become a widely used discipline utilizing statistical and machine learning methods. We present the tm package which provides a framework for text mining applications within R. We give a survey on text mining facilities in R and explain how typical application tasks can be carried out using our framework. We present techniques for count-based analysis methods, text clustering, text classification and string kernels.
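    The tm package itself is R; the Python sketch below only mirrors the typical count-based workflow it supports (corpus, then preprocessing, then a document-term matrix), to make the pipeline concrete without asserting anything about tm's own API.

```python
# Workflow mirror of a count-based text-mining pipeline: raw corpus ->
# tokenization/cleaning -> document-term matrix of counts.
from collections import Counter

corpus = ["Text mining with statistical methods.",
          "Machine learning methods for text."]

def preprocess(doc):
    return [w.strip(".,").lower() for w in doc.split()]

vocab = sorted({w for d in corpus for w in preprocess(d)})
dtm = [[Counter(preprocess(d))[t] for t in vocab] for d in corpus]
print(vocab)
print(dtm)   # rows: documents, columns: term counts
```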

    A review on the application of evolutionary computation to information retrieval

    In this contribution, different proposals found in the specialized literature for the application of evolutionary computation to the field of information retrieval will be reviewed. To do so, the different kinds of IR problems that have been solved by evolutionary algorithms are analyzed. Some of the existing approaches to these problems will be described in detail, and the results obtained will be critically evaluated in order to give the reader a clear view of the topic.
    Funding: CICYT under project TIC2002-03276; University of Granada under project ‘‘Mejora de Metaheurísticas mediante Hibridación y sus Aplicaciones’’.
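    One recurring approach in this literature is a genetic algorithm that evolves query term weights so that relevant documents rank above non-relevant ones. The sketch below is a toy instance: the fitness function, operators, and sizes are illustrative choices, not any specific surveyed system.

```python
# Toy GA evolving query term weights for retrieval. Each document is a
# binary term-presence vector plus a relevance label.
import random

random.seed(0)
DOCS = [([1, 0, 1, 1], True), ([0, 1, 0, 1], False),
        ([1, 1, 0, 0], True), ([0, 0, 1, 1], False)]

def fitness(weights):
    scored = sorted(DOCS, key=lambda d: -sum(w * f for w, f in zip(weights, d[0])))
    # Reward relevant documents appearing early in the ranking.
    return sum(1.0 / (rank + 1) for rank, (_, rel) in enumerate(scored) if rel)

def evolve(pop_size=20, gens=30, n_terms=4):
    pop = [[random.random() for _ in range(n_terms)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]                    # elitist selection
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_terms)            # one-point crossover
            child = a[:cut] + b[cut:]
            i = random.randrange(n_terms)                 # point mutation
            child[i] = min(1.0, max(0.0, child[i] + random.uniform(-0.1, 0.1)))
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

print(evolve())   # best query term weights found
```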

    A novel approach integrating ranking functions discovery, optimization and inference to improve retrieval performance

    The significant role played by the ranking function in the performance and success of Information Retrieval (IR) systems and search engines cannot be overstated. Diverse ranking functions are available in the IR literature. However, empirical studies show that ranking functions do not perform consistently well across different contexts (queries, collections, users). In this study, a novel three-stage integrated ranking framework is proposed, covering the discovery, optimization, and inference of rankings used in IR systems. The first phase, the discovery process, is based on a Genetic Programming (GP) approach that intelligently combines structural and content features of documents, while the second phase, the optimization process, is based on a Genetic Algorithm (GA) that combines the document retrieval scores of various well-known ranking functions. In the third phase, fuzzy inference provides soft search constraints to be applied to documents. We demonstrate how these stages are combined to bring new tasks and processes within the three conceptual stages of an integrated framework for effective IR.
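    The sketch below conveys the flavor of the second and third stages: scores from several ranking functions are fused with learned weights (fixed here for illustration; the framework evolves them with a GA), and a fuzzy membership acts as a soft constraint on a document feature such as recency. All numbers and functions are illustrative assumptions.

```python
# Illustrative score fusion plus a fuzzy soft constraint.
def fuse(score_vector, weights):
    # score_vector: one score per ranking function for a document.
    return sum(w * s for w, s in zip(weights, score_vector))

def recency_membership(age_days, fresh=30, stale=365):
    # Triangular-style fuzzy membership: 1 when fresh, 0 when stale.
    if age_days <= fresh:
        return 1.0
    if age_days >= stale:
        return 0.0
    return (stale - age_days) / (stale - fresh)

# Each document: (scores from three ranking functions, age in days).
docs = {"d1": ([0.9, 0.4, 0.7], 10), "d2": ([0.6, 0.8, 0.9], 400)}
weights = [0.5, 0.2, 0.3]
ranked = sorted(docs, key=lambda d: -fuse(docs[d][0], weights)
                * recency_membership(docs[d][1]))
print(ranked)   # d1 outranks d2: strong fused score and fresh
```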