20 research outputs found

    A Graph-based approach to derive the geodesic distance on Statistical manifolds: Application to Multimedia Information Retrieval

    In this paper, we leverage the properties of non-Euclidean geometry to define the geodesic distance (GD) on the space of statistical manifolds. The geodesic distance is a true and intuitive similarity measure, and a good alternative to the purely statistical and widely used Kullback-Leibler divergence (KLD). Despite the effectiveness of the GD, a closed form does not exist for many manifolds, since the geodesic equations are hard to solve. This explains why most studies have settled for numerical approximations. Nevertheless, most of these do not take the manifold properties into account, which leads to a loss of information and thus to poor performance. We propose an approximation of the geodesic distance through a graph-based method, which represents the structure of the statistical manifold well and respects its geometrical properties. Our main aim is to compare the graph-based approximation with state-of-the-art approximations. The proposed approach is therefore evaluated on two statistical manifolds, namely the Weibull manifold and the Gamma manifold, in a content-based texture retrieval application on different databases.
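
The graph-based idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the univariate Gaussian manifold (chosen here only because its Fisher-Rao distance has a known closed form to compare against, whereas the paper targets the Weibull and Gamma manifolds), a regular 8-connected grid over (mu, sigma), and edge lengths taken from the Fisher metric evaluated at each edge midpoint. The grid spacing, the sampled region, and all function names are illustrative choices.

```python
import heapq
import math

H = 0.05                     # grid spacing (illustrative choice)
MU0, SIGMA0 = -2.0, 0.5      # lower corner of the sampled parameter region
N_MU, N_SIGMA = 81, 51       # mu in [-2, 2], sigma in [0.5, 3]

def coords(node):
    """Map integer grid indices (i, j) to parameters (mu, sigma)."""
    i, j = node
    return MU0 + H * i, SIGMA0 + H * j

def edge_length(a, b):
    """Local length of edge a-b under the Gaussian Fisher metric
    ds^2 = (dmu^2 + 2 dsigma^2) / sigma^2, evaluated at the midpoint."""
    (mu_a, s_a), (mu_b, s_b) = coords(a), coords(b)
    s_mid = 0.5 * (s_a + s_b)
    return math.sqrt((mu_b - mu_a) ** 2 + 2.0 * (s_b - s_a) ** 2) / s_mid

def neighbours(node):
    """Yield the (up to 8) grid neighbours of a node."""
    i, j = node
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            ni, nj = i + di, j + dj
            if (di or dj) and 0 <= ni < N_MU and 0 <= nj < N_SIGMA:
                yield ni, nj

def graph_geodesic(src, dst):
    """Shortest-path length between two grid nodes (Dijkstra)."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            return d
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nxt in neighbours(node):
            nd = d + edge_length(node, nxt)
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return math.inf

def closed_form(p, q):
    """Closed-form Fisher-Rao distance between two univariate Gaussians."""
    (mu1, s1), (mu2, s2) = p, q
    arg = 1.0 + ((mu1 - mu2) ** 2 / 2.0 + (s1 - s2) ** 2) / (2.0 * s1 * s2)
    return math.sqrt(2.0) * math.acosh(arg)

# Node (40, 10) is (mu, sigma) = (0, 1); node (40, 30) is (0, 2).
approx = graph_geodesic((40, 10), (40, 30))
exact = closed_form((0.0, 1.0), (0.0, 2.0))
print(f"graph: {approx:.4f}  closed form: {exact:.4f}")
```

On a manifold without a closed form, the same shortest-path computation would be the only estimate available; the comparison here serves only as a sanity check of the sketch. A regular grid restricts path directions, so the graph distance tends to overestimate the true geodesic for pairs whose geodesic is not aligned with the grid.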

    No todas las preguntas son (igualmente) difíciles, una aproximación híbrida a la CQA en árabe

    In the past we addressed the problem of Community Question Answering (CQA) using a unified approach. Some questions, however, are easier to handle with a conventional rule-based system. In this paper we explore this direction. Dr. Rodríguez has been partially funded by the Spanish project "GraphMed" (TIN2016-77820-C3-3R).

    UPC-USMBA at SemEval-2017 Task 3: Combining multiple approaches for CQA for Arabic

    This paper describes the participation of the UPC-USMBA team in SemEval 2017 Task 3, subtask D (Arabic). Our approach to the task is based on running a set of atomic classifiers (lexical string-based, vectorial, and rule-based) whose results are later combined. Our primary submission obtained good results: 2nd (out of 3 participants) in MAP, and 1st in accuracy. Peer reviewed. Postprint (published version).
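
As a rough illustration of combining several classifiers and scoring the resulting ranking, the sketch below uses a weighted average of per-candidate scores and computes average precision, the per-query quantity that MAP averages. The abstract does not say how the team combined its classifiers, so the weighted average, the equal weights, the classifier names, and the toy scores are all assumptions made for this example.

```python
def combine(scores_by_classifier, weights):
    """Weighted average of per-candidate scores from several classifiers."""
    total = sum(weights.values())
    candidates = next(iter(scores_by_classifier.values())).keys()
    return {
        c: sum(weights[name] * scores[c]
               for name, scores in scores_by_classifier.items()) / total
        for c in candidates
    }

def average_precision(ranked, relevant):
    """Average precision of a ranked list against a set of relevant items."""
    hits, acc = 0, 0.0
    for rank, candidate in enumerate(ranked, start=1):
        if candidate in relevant:
            hits += 1
            acc += hits / rank
    return acc / len(relevant) if relevant else 0.0

# Toy scores for three candidate answers (hypothetical values).
scores = {
    "lexical":   {"a": 0.2, "b": 0.9, "c": 0.4},
    "vectorial": {"a": 0.8, "b": 0.3, "c": 0.5},
    "rules":     {"a": 1.0, "b": 0.0, "c": 0.0},
}
weights = {"lexical": 1.0, "vectorial": 1.0, "rules": 1.0}

combined = combine(scores, weights)
ranked = sorted(combined, key=combined.get, reverse=True)
ap = average_precision(ranked, relevant={"a", "c"})
print(ranked, round(ap, 4))
```

MAP is then the mean of this average precision over all queries in the test set.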

    Kernel Based Approach for High Dimensional Heterogeneous Image Features Management in CBIR Context

    In this paper we address the curse of dimensionality and the reduction of the semantic gap for content-based image retrieval in large, heterogeneous databases. The strength of our idea lies in building an effective multidimensional indexing method based on kernel principal component analysis (KPCA), which efficiently supports similarity search over heterogeneous vectors (color, texture, shape) and maps data vectors onto a low-dimensional feature space that is partitioned into regions. An efficient approach to approximating feature-space regions is proposed, with the corresponding upper and lower distance bounds. Finally, a relevance feedback mechanism is exploited to create a flexible retrieval metric in order to reduce the semantic gap between the user's need and the data representation. Experimental evaluations show that using the region approximation approach with relevance feedback can significantly improve both the quality of the results and the CPU time.

    Improving Arabic information retrieval using word embedding similarities

    Term mismatch is a common limitation of traditional information retrieval (IR) models, in which relevance scores are estimated from exact matching between documents and queries. Typically, a good IR model should consider distinct but semantically similar words in the matching process. In this paper, we propose a method to incorporate word embedding (WE) semantic similarities into existing probabilistic IR models for Arabic in order to deal with term mismatch. Experiments are performed on the standard Arabic TREC collection using three neural word embedding models. The results show that extending the existing IR models significantly improves on the baseline bag-of-words models. Although the proposed extensions significantly outperform their bag-of-words baselines, the difference between the evaluated neural word embedding models is not statistically significant. Moreover, the overall comparison shows that our extensions significantly outperform the Arabic WordNet-based semantic indexing approach and three recent WE-based IR language models.
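
The term-mismatch problem and one generic way of softening it with embedding similarities can be sketched as follows. This is not the paper's model: it replaces the probabilistic IR models with a plain term-frequency score, uses hand-made three-dimensional toy vectors instead of trained embeddings, and applies an arbitrary similarity threshold. All names and values are hypothetical.

```python
import math
from collections import Counter

# Toy embeddings (hypothetical values, standing in for a trained model).
EMB = {
    "car":        (0.90, 0.10, 0.00),
    "automobile": (0.85, 0.15, 0.05),
    "banana":     (0.00, 0.20, 0.95),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def exact_score(query, doc):
    """Bag-of-words baseline: counts only exact term matches."""
    tf = Counter(doc)
    return sum(tf[q] for q in query)

def expanded_score(query, doc, threshold=0.8):
    """Also credits document terms whose embedding is close to a query term."""
    tf = Counter(doc)
    score = 0.0
    for q in query:
        for term, freq in tf.items():
            if term == q:
                sim = 1.0
            elif q in EMB and term in EMB:
                sim = cosine(EMB[q], EMB[term])
            else:
                sim = 0.0
            if sim >= threshold:
                score += sim * freq
    return score

query = ["car"]
doc = ["automobile", "banana"]
print(exact_score(query, doc), expanded_score(query, doc))
```

The baseline gives this document a score of zero because "car" never appears in it, while the expanded score credits "automobile" in proportion to its similarity to the query term.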

    Kernel Region Approximation Blocks For Indexing Heterogonous Databases

    This paper presents a new indexing method for visual features in a high-dimensional vector space using a region approximation approach. The proposed method is designed to combine the values of heterogeneous features in the same index structure; it determines the nonlinear relationship between features so that more accurate similarity comparison between vectors can be supported. The basic idea is to map the data vectors into a feature space via a nonlinear kernel; the feature space is then partitioned into regions. An efficient approach to approximating regions is proposed, with the corresponding upper and lower distance bounds. To evaluate our technique, we conducted several experiments on K-nearest-neighbour search. The results obtained show the value of our method.
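
One simple way to obtain upper and lower distance bounds for a region in a kernel feature space is sketched below. It assumes an RBF kernel and ball-shaped regions described by a pivot point and a radius; the abstract does not specify the kernel or the shape of the regions, so both are assumptions of this example, as are all names and values. Distances in feature space are computed with the kernel trick, and the bounds follow from the triangle inequality.

```python
import math

GAMMA = 0.5  # RBF kernel width (illustrative value)

def rbf(x, y):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-GAMMA * sq)

def feature_distance(x, y):
    """Distance between the kernel images of x and y:
    d^2 = k(x, x) - 2 k(x, y) + k(y, y)."""
    d2 = rbf(x, x) - 2.0 * rbf(x, y) + rbf(y, y)
    return math.sqrt(max(0.0, d2))

def make_region(pivot, members):
    """Approximate a region by a ball around the pivot in feature space."""
    radius = max(feature_distance(pivot, m) for m in members)
    return {"pivot": pivot, "radius": radius, "members": members}

def distance_bounds(query, region):
    """Lower and upper bounds on the distance from the query to any member."""
    d = feature_distance(query, region["pivot"])
    return max(0.0, d - region["radius"]), d + region["radius"]

region = make_region((0.0, 0.0), [(0.1, 0.2), (-0.3, 0.1), (0.2, -0.2)])
query = (1.5, 1.0)
lower, upper = distance_bounds(query, region)
print(lower, upper)
```

In a K-nearest-neighbour search, a region whose lower bound already exceeds the distance to the current K-th best candidate can be skipped without examining its members.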

    A Supervised Method for Extractive Single Document Summarization based on Sentence Embeddings and Neural Networks

    Extractive summarization consists of generating a summary by ranking sentences from the original texts according to their importance and salience. Text representation is a fundamental process that affects the effectiveness of many text summarization methods. Distributed word vector representations have been shown to improve Natural Language Processing (NLP) tasks, especially Automatic Text Summarization (ATS). However, most of them do not consider the order and the context of the words in a sentence, which prevents them from fully capturing sentence semantics and the syntactic relationships between sentence constituents. In this paper, to overcome this problem, we propose a method based on a deep neural network model for extractive single-document summarization using state-of-the-art sentence embedding models. Experiments are performed on the standard DUC2002 dataset using three sentence embedding models. The results obtained show the effectiveness of the sentence embedding models used for ATS. The overall comparison shows that our method outperforms eight well-known ATS baselines and achieves results comparable to state-of-the-art deep-learning-based methods.