207 research outputs found

    Etude de la Maximisation de l'Influence dans les RĂ©seaux Sociaux

    No full text
    National audienceInfluence maximization is a NP-hard problem depending on the diffusion of information in social networks. The Greedy hill climbing algorithm have been proved a good approximation if the influence fonction we try to optimize is submodular, which is the case for standard diffusion models.We present a diffusion model not equivalent to standard models for which the influence function is not submodular. Then we propose, using toy graphs and a real social network, a study of different influence maximization algorithms on this model and on the standard model IC: some basic heuristics, the greedy hill climbing method, a generalization of the greedy method and an optimization method for submodular functions. We show that even if the influence function is not submodular, the greedy algorithm obtain good results while being able to scale efficiently

    Domain Adaptation for Dense Retrieval through Self-Supervision by Pseudo-Relevance Labeling

    Full text link
    Although neural information retrieval has witnessed great improvements, recent works showed that the generalization ability of dense retrieval models on target domains with different distributions is limited, which contrasts with the results obtained with interaction-based models. To address this issue, researchers have resorted to adversarial learning and query generation approaches; both approaches nevertheless resulted in limited improvements. In this paper, we propose to use a self-supervision approach in which pseudo-relevance labels are automatically generated on the target domain. To do so, we first use the standard BM25 model on the target domain to obtain a first ranking of documents, and then use the interaction-based model T53B to re-rank top documents. We further combine this approach with knowledge distillation relying on an interaction-based teacher model trained on the source domain. Our experiments reveal that pseudo-relevance labeling using T53B and the MiniLM teacher performs on average better than other approaches and helps improve the state-of-the-art query generation approach GPL when it is fine-tuned on the pseudo-relevance labeled data.Comment: 16 page

    Evaluation Measures for Hierarchical Classification: a unified view and novel approaches

    Full text link
    Hierarchical classification addresses the problem of classifying items into a hierarchy of classes. An important issue in hierarchical classification is the evaluation of different classification algorithms, which is complicated by the hierarchical relations among the classes. Several evaluation measures have been proposed for hierarchical classification using the hierarchy in different ways. This paper studies the problem of evaluation in hierarchical classification by analyzing and abstracting the key components of the existing performance measures. It also proposes two alternative generic views of hierarchical evaluation and introduces two corresponding novel measures. The proposed measures, along with the state-of-the art ones, are empirically tested on three large datasets from the domain of text classification. The empirical results illustrate the undesirable behavior of existing approaches and how the proposed methods overcome most of these methods across a range of cases.Comment: Submitted to journa

    Un modèle de RI basé sur des critères d'obligation et de certitude

    No full text
    International audienceIl existe un grand nombre de modèles de recherche d'information chacun ayant pour but de répondre au mieux aux attentes des utilisateurs. Le modèle que nous proposons se base sur une formulation précise de la requête reflétant le besoin de l'utilisateur : Chaque terme de la requête est augmenté par deux critères, l'un exprimant l'obligation ou non de l'apparition du terme dans les documents et l'autre exprimant la certitude de l'utilisateur quand au terme utilisé. Des expérimentations nous ont permis de vérifier qu'une telle formulation permet de gagner en précision

    Modèle d'indexation de données peu symboliques dans des documents structurés : L'exemple du graphique dans un corpus de documents techniques

    No full text
    International audienceCet article s'intéresse à l'indexation des données ayant une sémantique pauvre dans des documents structurés. Le but est d'exploiter le contenu des données symboliques avoisinantes afin d'en extraire les fragments adéquats pour compléter l'indexation de la donnée non symbolique. Cette approche a été abordée dans le cadre concret d'une application dans un contexte professionnel : indexer les graphiques des documents techniques en exploitant le texte qui les accompagne. Cette indexation est articulée autour d'un modèle de représentation des graphiques tenant compte de la finalité de leur utilisation et du professionnalisme de leurs usagers, et d'un modèle d'extraction des termes d'indexation à partir du texte du document technique

    Learning Multiple Temporal Matching for Time Series Classification

    No full text
    12International audienceIn real applications, time series are generally of complex structure, exhibiting different global behaviors within classes. To discriminate such challenging time series, we propose a multiple temporal matching approach that reveals the commonly shared features within classes, and the most differential ones across classes. For this, we rely on a new framework based on the variance/covariance criterion to strengthen or weaken matched observations according to the induced variability within and between classes. The experiments performed on real and synthetic datasets demonstrate the ability of the multiple temporal matching approach to capture fine-grained distinctions between time series

    Uncertain Trees: Dealing with Uncertain Inputs in Regression Trees

    Full text link
    Tree-based ensemble methods, as Random Forests and Gradient Boosted Trees, have been successfully used for regression in many applications and research studies. Furthermore, these methods have been extended in order to deal with uncertainty in the output variable, using for example a quantile loss in Random Forests (Meinshausen, 2006). To the best of our knowledge, no extension has been provided yet for dealing with uncertainties in the input variables, even though such uncertainties are common in practical situations. We propose here such an extension by showing how standard regression trees optimizing a quadratic loss can be adapted and learned while taking into account the uncertainties in the inputs. By doing so, one no longer assumes that an observation lies into a single region of the regression tree, but rather that it belongs to each region with a certain probability. Experiments conducted on several data sets illustrate the good behavior of the proposed extension.Comment: 9 page
    • …
    corecore