23 research outputs found
Applying Heuristics to Improve A Genetic Query Optimisation Process in Information Retrieval
This work presents a genetic approach for query optimisation in information retrieval. The proposed GA is improved by heuristics in order to solve the relevance multimodality problem and adapt the genetic exploration process to the information retrieval task. Experiments with AP documents and queries issued from TREC show the effectiveness of our GA mode
Robust Raw Waveform Speech Recognition Using Relevance Weighted Representations
Speech recognition in noisy and channel distorted scenarios is often
challenging as the current acoustic modeling schemes are not adaptive to the
changes in the signal distribution in the presence of noise. In this work, we
develop a novel acoustic modeling framework for noise robust speech recognition
based on a relevance weighting mechanism. The relevance weighting is achieved
using a sub-network approach that performs feature selection. A relevance
sub-network is applied on the output of the first layer of a convolutional network
model operating on raw speech signals while a second relevance sub-network is
applied on the second convolutional layer output. The relevance weights for the
first layer correspond to an acoustic filterbank selection while the relevance
weights in the second layer perform modulation filter selection. The model is
trained for a speech recognition task on noisy and reverberant speech. The
speech recognition experiments on multiple datasets (Aurora-4, CHiME-3, VOiCES)
reveal that the incorporation of relevance weighting in the neural network
architecture improves the speech recognition word error rates significantly
(average relative improvements of 10% over the baseline systems). Comment: arXiv admin note: text overlap with arXiv:2001.0706
Phrase Pair Rescoring with Term Weighting for Statistical Machine Translation
We propose to score phrase translation pairs for statistical machine translation using term-weight-based models. These models encode the weights of content and non-content words in phrase translation pairs. The translation probability is then modeled by similarity functions defined in a vector space. Two similarity functions are compared. Using these models in a statistical machine translation task shows significant improvements
QUERY OPTIMISATION USING AN IMPROVED GENETIC ALGORITHM
This paper presents an approach to intelligent information retrieval based on genetic heuristics. Recent research has shown that applying genetic models to query optimisation improves retrieval effectiveness. We investigate ways to improve this process by combining genetic heuristics and information retrieval techniques. More precisely, we propose to integrate relevance feedback techniques into the genetic operators and to apply the speciation heuristic to the relevance multimodality problem. Experiments with AP documents and queries issued from TREC showed the effectiveness of our approach. Keywords: Informatio
Graph-based methods for Significant Concept Selection
It is well known in the information retrieval area that one important issue is the gap between the query and document vocabularies. Concept-based representation of both the document and the query is one of the most effective approaches: it lowers the effect of text mismatch and allows the selection of relevant documents that deal with the shared semantics hidden behind both. However, identifying the best representative concepts from texts is still challenging. In this paper, we propose a graph-based method to select the most significant concepts to be integrated into a conceptual indexing system. More specifically, we build a graph whose nodes represent concepts and whose weighted edges represent semantic distances. The importance of concepts is computed using centrality algorithms that balance structural and contextual importance. We experimentally evaluated our method of concept selection using the standard ImageClef2009 medical data set. Results showed that our approach significantly improves the retrieval effectiveness in comparison to state-of-the-art retrieval models
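As an illustration of centrality-based concept selection, here is a minimal sketch of a PageRank-style score on a weighted, undirected concept graph. The abstract does not name the centrality algorithm, so the choice of PageRank is an assumption. The function name `weighted_pagerank`, the concept labels and the edge weights are hypothetical, and the weights are treated as similarities (for example, derived from semantic distances) rather than as distances.

```python
def weighted_pagerank(edges, damping=0.85, iterations=50):
    """Score nodes of an undirected weighted graph with power iteration.

    edges maps (node_a, node_b) to a non-negative similarity weight.
    This is an illustrative stand-in, not the method used in the paper.
    """
    # Build a symmetric adjacency map from the edge list.
    neighbours = {}
    for (a, b), w in edges.items():
        neighbours.setdefault(a, {})[b] = w
        neighbours.setdefault(b, {})[a] = w

    nodes = sorted(neighbours)
    n = len(nodes)
    # Total edge weight attached to each node, used to normalise outgoing mass.
    strength = {v: sum(neighbours[v].values()) for v in nodes}
    rank = {v: 1.0 / n for v in nodes}

    for _ in range(iterations):
        new_rank = {}
        for v in nodes:
            incoming = sum(
                rank[u] * w / strength[u]
                for u, w in neighbours[v].items()
                if strength[u] > 0
            )
            new_rank[v] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


# Hypothetical concept graph; weights are made-up similarities.
concept_edges = {
    ("bone", "fracture"): 0.9,
    ("bone", "femur"): 0.8,
    ("bone", "x-ray"): 0.4,
    ("fracture", "x-ray"): 0.3,
}
scores = weighted_pagerank(concept_edges)
top_concepts = sorted(scores, key=scores.get, reverse=True)[:2]
```

Keeping only the top-scoring concepts for indexing is one simple selection rule; a threshold on the score would work equally well in this sketch.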
On using genetic algorithms for multimodal relevance optimisation in information retrieval
This paper presents a genetic relevance optimisation process performed in an information retrieval system. The process uses genetic techniques for solving multimodal problems (niching) and query reformulation techniques commonly used in information retrieval. The niching technique allows the process to reach different relevance regions of the document space. Query reformulation techniques represent domain knowledge integrated into the structure of the genetic operators in order to improve the convergence conditions of the algorithm. Experimental analysis performed using a TREC sub-collection validates our approach
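To make the idea of niching concrete, the sketch below uses classic fitness sharing, one common niching technique. The abstract does not say which niching scheme the authors use, so this is an assumption. Queries are modelled as term-weight vectors, and the names `cosine_distance` and `shared_fitness`, the niche radius `sigma`, and the sample data are all hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity, clamped to be non-negative."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return max(0.0, 1.0 - dot / (norm_u * norm_v))


def shared_fitness(population, raw_fitness, sigma=0.5, alpha=1.0):
    """Divide each query's fitness by the crowding of its niche.

    Queries closer than sigma to each other share fitness, which pushes
    the population to cover several relevance regions instead of one.
    """
    shared = []
    for i, query_i in enumerate(population):
        niche_count = 0.0
        for query_j in population:
            d = cosine_distance(query_i, query_j)
            if d < sigma:
                niche_count += 1.0 - (d / sigma) ** alpha
        # niche_count >= 1 because every query is at distance 0 from itself.
        shared.append(raw_fitness[i] / niche_count)
    return shared


# Hypothetical population: two near-duplicate queries and one isolated query.
queries = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
raw = [0.8, 0.8, 0.6]
adjusted = shared_fitness(queries, raw)
```

In this example the two similar queries split their fitness, so the isolated query ends up with the highest shared fitness even though its raw fitness is lower.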
Mixed Graph of Terms: Beyond the bags of words representation of a text
The main purpose of text mining techniques is to
identify common patterns through the observation of
vectors of features and then to use such patterns to
make predictions. Vectors of features are usually
made up of weighted words, like those used in
the text retrieval field, which are obtained thanks to
the assumption that considers a document as a "bag
of words". However, in this paper we demonstrate
that, to obtain more accuracy in the analysis and
revelation of common patterns, we could employ
(observe) more complex features than simple
weighted words. The proposed vector of features
considers a hierarchical structure, named a mixed
Graph of Terms, composed of a directed and an
undirected sub-graph of words, that can be
automatically constructed from a small set of
documents through the probabilistic Topic Model.
The graph has demonstrated its efficiency in a classic
"ad-hoc" text retrieval problem. Here we consider
expanding the initial query with this new structured
vector of features
Un Algorithme génétique spécifique à une reformulation multi-requêtes dans un système de recherche d'information [A genetic algorithm specific to multi-query reformulation in an information retrieval system]
This article presents a query reformulation approach based on the combined use of the relevance feedback strategy and advanced genetic algorithm techniques. We propose a genetic multi-query optimisation process improved by integrating niching heuristics and adapting the genetic operators. The niching heuristic ensures cooperative information retrieval in different directions of the document space. Integrating knowledge into the structure of the operators improves the convergence conditions of the algorithm. Using experiments carried out on a TREC collection, we show the value of our approach
The study of probability model for compound similarity searching
The main task of an information retrieval (IR) system is to retrieve relevant documents according to the user's query. One of the most popular IR models is the Vector Space Model. This model assumes relevance based on similarity, which is defined as the distance between query and document in the concept space. All currently existing chemical compound database systems have adapted the vector space model to calculate the similarity of a database entry to a query compound. However, it assumes that fragments represented by the bits are independent of one another, which is not necessarily true. Hence, the possibility of applying another IR model, the Probabilistic Model (PM), to chemical compound searching is explored. This model estimates the probability that a chemical structure has the same bioactivity as a target compound. It is envisioned that by ranking chemical structures in decreasing order of their probability of relevance to the query structure, the effectiveness of a molecular similarity searching system can be increased. Both fragment dependence and independence assumptions are taken into consideration in improving compound similarity searching systems. After conducting a series of simulated similarity searches, it is concluded that the PM approaches performed better than the existing similarity searching, giving better results on all evaluation criteria. As for which probability model performs better, the BD model showed improvement over the BIR model
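The probabilistic ranking described above can be sketched with the textbook binary independence weighting, in which each fingerprint bit gets a log-odds weight estimated from known active and inactive compounds. The abstract does not give the exact formulation, so the 0.5 smoothing, the function names `bir_weights` and `bir_score`, and the toy fingerprints below are assumptions for illustration only.

```python
import math


def bir_weights(fingerprints, active, n_bits):
    """Estimate one log-odds weight per fingerprint bit.

    fingerprints: list of 0/1 lists for training compounds.
    active: parallel list of booleans (True = shares the target bioactivity).
    """
    total = len(fingerprints)
    n_active = sum(active)
    weights = []
    for i in range(n_bits):
        n_i = sum(fp[i] for fp in fingerprints)  # compounds with bit i set
        r_i = sum(fp[i] for fp, a in zip(fingerprints, active) if a)
        # Smoothed probabilities of bit i being set in active / inactive compounds.
        p = (r_i + 0.5) / (n_active + 1.0)
        q = (n_i - r_i + 0.5) / (total - n_active + 1.0)
        weights.append(math.log(p * (1 - q) / (q * (1 - p))))
    return weights


def bir_score(query_fp, candidate_fp, weights):
    """Sum the weights of bits set in both the query and the candidate."""
    return sum(
        w for qb, cb, w in zip(query_fp, candidate_fp, weights) if qb and cb
    )


# Hypothetical 3-bit fingerprints with known activity labels.
training = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]]
labels = [True, True, False, False]
weights = bir_weights(training, labels, n_bits=3)

query = [1, 1, 1]
candidates = {"cmpd_a": [1, 1, 0], "cmpd_b": [0, 1, 1]}
ranking = sorted(
    candidates,
    key=lambda name: bir_score(query, candidates[name], weights),
    reverse=True,
)
```

Candidates are ranked in decreasing order of score, which mirrors ranking by estimated probability of relevance. A dependence-aware model would need extra terms for co-occurring bits and is not covered by this sketch.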