23 research outputs found

    Applying Heuristics to Improve A Genetic Query Optimisation Process in Information Retrieval

    International audience. This work presents a genetic approach for query optimisation in information retrieval. The proposed GA is improved by heuristics in order to solve the relevance multimodality problem and adapt the genetic exploration process to the information retrieval task. Experiments with AP documents and queries issued from TREC show the effectiveness of our GA mode

    Robust Raw Waveform Speech Recognition Using Relevance Weighted Representations

    Speech recognition in noisy and channel-distorted scenarios is often challenging because current acoustic modeling schemes do not adapt to the changes in the signal distribution caused by noise. In this work, we develop a novel acoustic modeling framework for noise-robust speech recognition based on a relevance weighting mechanism. The relevance weighting is achieved using a sub-network approach that performs feature selection. A relevance sub-network is applied to the output of the first layer of a convolutional network model operating on raw speech signals, while a second relevance sub-network is applied to the output of the second convolutional layer. The relevance weights for the first layer correspond to an acoustic filterbank selection, while the relevance weights in the second layer perform modulation filter selection. The model is trained for a speech recognition task on noisy and reverberant speech. The speech recognition experiments on multiple datasets (Aurora-4, CHiME-3, VOiCES) reveal that incorporating relevance weighting in the neural network architecture improves the speech recognition word error rates significantly (average relative improvements of 10% over the baseline systems). Comment: arXiv admin note: text overlap with arXiv:2001.0706

    Phrase Pair Rescoring with Term Weighting for Statistical Machine Translation

    We propose to score phrase translation pairs for statistical machine translation using term-weight-based models. These models employ tf.idf to encode the weights of content and non-content words in phrase translation pairs. The translation probability is then modeled by similarity functions defined in a vector space. Two similarity functions are compared. Using these models in a statistical machine translation task shows significant improvements.
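
    As a rough illustration of the general idea of tf.idf weighting followed by a vector-space similarity, the sketch below computes tf.idf vectors for two token lists and compares them with cosine similarity. It is not the authors' model: the abstract does not name the two similarity functions it compares, so cosine is an assumption, and the toy corpus and the helper names `idf_table`, `tfidf_vector`, and `cosine` are hypothetical.

```python
import math
from collections import Counter


def idf_table(documents):
    """Inverse document frequency, log(N / df), for every term in the corpus."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Sparse tf.idf vector; terms unseen in the corpus get weight 0."""
    term_freq = Counter(tokens)
    return {term: term_freq[term] * idf.get(term, 0.0) for term in term_freq}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus (made up for illustration): "the" occurs everywhere, so idf = 0.
corpus = [
    ["the", "red", "house"],
    ["the", "blue", "house"],
    ["the", "green", "car"],
]
idf = idf_table(corpus)
vec_a = tfidf_vector(["the", "red", "house"], idf)
vec_b = tfidf_vector(["the", "blue", "house"], idf)
similarity = cosine(vec_a, vec_b)
```

    In this toy setting the non-content word "the" receives zero weight, so the similarity is driven only by the shared content word "house".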

    QUERY OPTIMISATION USING AN IMPROVED GENETIC ALGORITHM

    International audience. This paper presents an approach to intelligent information retrieval based on genetic heuristics. Recent research has shown that applying genetic models for query optimisation improves retrieval effectiveness. We investigate ways to improve this process by combining genetic heuristics and information retrieval techniques. More precisely, we propose to integrate relevance feedback techniques to perform the genetic operators and the speciation heuristic to solve the relevance multimodality problem. Experiments with AP documents and queries issued from TREC showed the effectiveness of our approach. Keywords: Informatio

    Graph-based methods for Significant Concept Selection

    It is well known in the information retrieval area that one important issue is the gap between the query and document vocabularies. Concept-based representation of both the document and the query is one of the most effective approaches: it lowers the effect of text mismatch and allows the selection of relevant documents that deal with the shared semantics hidden behind both. However, identifying the best representative concepts from texts is still challenging. In this paper, we propose a graph-based method to select the most significant concepts to be integrated into a conceptual indexing system. More specifically, we build a graph whose nodes represent concepts and whose weighted edges represent semantic distances. The importance of concepts is computed using centrality algorithms that leverage both structural and contextual importance. We experimentally evaluated our method of concept selection using the standard ImageClef2009 medical data set. Results showed that our approach significantly improves retrieval effectiveness in comparison to state-of-the-art retrieval models.
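
    To make the notion of ranking concepts by centrality on a distance-weighted graph concrete, here is a minimal sketch using closeness centrality computed with Dijkstra's algorithm. The abstract does not say which centrality algorithms the authors use, so closeness is an assumption; the concept names, distances, and the helpers `build_graph`, `closeness`, and `top_concepts` are all hypothetical.

```python
import heapq


def build_graph(edges):
    """Undirected graph as {node: {neighbour: distance}}."""
    graph = {}
    for a, b, distance in edges:
        graph.setdefault(a, {})[b] = distance
        graph.setdefault(b, {})[a] = distance
    return graph


def shortest_distances(graph, source):
    """Dijkstra: shortest semantic distance from source to each reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbour, weight in graph[node].items():
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return dist


def closeness(graph):
    """Closeness centrality: reachable nodes divided by total distance to them."""
    scores = {}
    for node in graph:
        dist = shortest_distances(graph, node)
        total = sum(d for other, d in dist.items() if other != node)
        scores[node] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores


def top_concepts(graph, k):
    """The k concepts with the highest closeness score."""
    scores = closeness(graph)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical concepts and semantic distances, for illustration only.
graph = build_graph([
    ("lung", "pneumonia", 1.0),
    ("lung", "x-ray", 2.0),
    ("pneumonia", "x-ray", 2.0),
    ("x-ray", "fracture", 3.0),
])
scores = closeness(graph)
selected = top_concepts(graph, 1)
```

    Because edges carry distances rather than similarities, a concept scores highly when it is semantically close to many others; a similarity-weighted graph would call for a different measure.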

    On using genetic algorithms for multimodal relevance optimisation in information retrieval

    International audience. This paper presents a genetic relevance optimisation process performed in an information retrieval system. The process uses genetic techniques for solving multimodal problems (niching) and query reformulation techniques commonly used in information retrieval. The niching technique allows the process to reach different relevance regions of the document space. Query reformulation techniques represent domain knowledge integrated into the structure of the genetic operators in order to improve the convergence conditions of the algorithm. Experimental analysis performed using a TREC sub-collection validates our approach.

    Mixed Graph of Terms: Beyond the bags of words representation of a text

    The main purpose of text mining techniques is to identify common patterns through the observation of vectors of features and then to use such patterns to make predictions. Vectors of features are usually made up of weighted words, like those used in the text retrieval field, which are obtained under the assumption that a document is a "bag of words". However, in this paper we demonstrate that, to obtain more accuracy in the analysis and revelation of common patterns, we could employ (observe) more complex features than simple weighted words. The proposed vector of features considers a hierarchical structure, named a mixed Graph of Terms, composed of a directed and an undirected sub-graph of words, that can be automatically constructed from a small set of documents through the probabilistic Topic Model. The graph has demonstrated its efficiency in a classic "ad-hoc" text retrieval problem. Here we consider expanding the initial query with this new structured vector of features.

    Un Algorithme génétique spécifique à une reformulation multi-requêtes dans un système de recherche d'information [A genetic algorithm specific to multi-query reformulation in an information retrieval system]

    National audience. This article presents a query reformulation approach based on the combined use of the relevance feedback strategy and advanced genetic algorithm techniques. We propose a genetic multi-query optimisation process improved by integrating niching heuristics and adapting the genetic operators. The niching heuristic ensures cooperative information retrieval in different directions of the document space. Integrating knowledge into the structure of the operators improves the convergence conditions of the algorithm. Using experiments carried out on a TREC collection, we show the value of our approach.

    The study of probability model for compound similarity searching

    The main task of an Information Retrieval (IR) system is to retrieve relevant documents according to the user's query. One of the most popular IR retrieval models is the Vector Space Model. This model assumes relevance based on similarity, which is defined as the distance between query and document in the concept space. All currently existing chemical compound database systems have adapted the vector space model to calculate the similarity of a database entry to a query compound. However, it assumes that fragments represented by the bits are independent of one another, which is not necessarily true. Hence, the possibility of applying another IR model, the Probabilistic Model, to chemical compound searching is explored. This model estimates the probability that a chemical structure has the same bioactivity as a target compound. It is envisioned that by ranking chemical structures in decreasing order of their probability of relevance to the query structure, the effectiveness of a molecular similarity searching system can be increased. Both fragment dependence and independence assumptions are taken into consideration in improving the compound similarity searching system. After conducting a series of simulated similarity searches, it is concluded that the PM approaches performed better than the existing similarity searching, giving better results in all evaluation criteria. In terms of which probability model performs better, the BD model showed improvement over the BIR model.