96 research outputs found
Complex Embeddings for Simple Link Prediction
In statistical relational learning, the link prediction problem is key to
automatically understand the structure of large knowledge bases. As in previous
studies, we propose to solve this problem through latent factorization.
However, here we make use of complex valued embeddings. The composition of
complex embeddings can handle a large variety of binary relations, among them
symmetric and antisymmetric relations. Compared to state-of-the-art models such
as Neural Tensor Network and Holographic Embeddings, our approach based on
complex embeddings is arguably simpler, as it only uses the Hermitian dot
product, the complex counterpart of the standard dot product between real
vectors. Our approach is scalable to large datasets as it remains linear in
both space and time, while consistently outperforming alternative approaches on
standard link prediction benchmarks.Comment: 10+2 pages, accepted at ICML 201
Learning temporal matchings for time series discrimination
In real applications it is not rare for time series of the same class to exhibit dis- similarities in their overall behaviors, or that time series from different classes have slightly similar shapes. To discriminate between such challenging time se- ries, we present a new approach for training discriminative matching that con- nects time series with respect to the commonly shared features within classes, and the greatest differential across classes. For this, we rely on a variance/covariance criterion to strengthen or weaken matched observations according to the induced variability within and between classes. In this paper, learned discriminative matching is used to define a locally weighted time series metric, which restricts time series comparison to discriminative features. The relevance of the proposed approach is studied through a nearest neighbor time series classification on real datasets. The experiments performed demonstrate the ability of learned match- ing to capture fine-grained distinctions between time series, and outperform the standard approaches, all the more so that time series behaviors within the same class are complex
An Evaluation of Inter-Annotator Agreement in the Observation of Anaphoric and Referential Relations
International audienceWhen proposing a description of the data he observes, the linguist must make sure that his observations may be also regularly made by other persons. In this paper, we introduce a typology of anaphoric and referential relations and an experiment which aims at assessing that this typology is operational. Given three newspaper articles, five students were asked to identify anaphoric and/or referential relations between expressions and referents. This inter-subjectivity test confirms results already obtained: coreference is an operational notion, but the perspicuity of other relations is not obvious
Apprentissage de co-similarités pour la classification automatique de données monovues et multivues
L'apprentissage automatique consiste à concevoir des programmes informatiques capables d'apprendre à partir de leurs environnement, ou bien à partir de données. Il existe différents types d'apprentissage, selon que l'on cherche à faire apprendre au programme, et également selon le cadre dans lequel il doit apprendre, ce qui constitue différentes tâches. Les mesures de similarité jouent un rôle prépondérant dans la plupart de ces tâches, c'est pourquoi les travaux de cette thèse se concentrent sur leur étude. Plus particulièrement, nous nous intéressons à la classification de données, qui est une tâche d'apprentissage dit non supervisé, dans lequel le programme doit organiser un ensemble d'objets en plusieurs classes distinctes, de façon à regrouper les objets similaires ensemble. Dans de nombreuses applications, ces objets (des documents par exemple) sont décrits à l'aide de leurs liens à d'autres types d'objets (des mots par exemple), qui peuvent eux-même être classifiés. On parle alors de co-classification, et nous étudions et proposons dans cette thèse des améliorations de l'algorithme de calcul de co-similarités XSim. Nous montrons que ces améliorations permettent d'obtenir de meilleurs résultats que les méthodes de l'état de l'art. De plus, il est fréquent que ces objets soient liés à plus d'un autre type d'objets, les données qui décrivent ces multiples relations entre différents types d'objets sont dites multivues. Les méthodes classiques ne sont généralement pas capables de prendre en compte toutes les informations contenues dans ces données. C'est pourquoi nous présentons dans cette thèse l'algorithme de calcul multivue de similarités MVSim, qui peut être vu comme une extension aux données multivues de l'algorithme XSim. Nous montrons que cette méthode obtient de meilleures performances que les méthodes multivues de l'état de l'art, ainsi que les méthodes monovues, validant ainsi l'apport de l'aspect multivue. Finalement, nous proposons également d'utiliser l'algorithme MVSim pour classifier des données classiques monovues de grandes tailles, en les découpant en différents ensembles. Nous montrons que cette approche permet de gagner en temps de calcul ainsi qu'en taille mémoire nécessaire, tout en dégradant relativement peu la classification par rapport à une approche directe sans découpage.Machine learning consists in conceiving computer programs capable of learning from their environment, or from data. Different kind of learning exist, depending on what the program is learning, or in which context it learns, which naturally forms different tasks. Similarity measures play a predominant role in most of these tasks, which is the reason why this thesis focus on their study. More specifically, we are focusing on data clustering, a so called non supervised learning task, in which the goal of the program is to organize a set of objects into several clusters, in such a way that similar objects are grouped together. In many applications, these objects (documents for instance) are described by their links to other types of objects (words for instance), that can be clustered as well. This case is referred to as co-clustering, and in this thesis we study and improve the co-similarity algorithm XSim. We demonstrate that these improvements enable the algorithm to outperform the state of the art methods. Additionally, it is frequent that these objects are linked to more than one other type of objects, the data that describe these multiple relations between these various types of objects are called multiview. Classical methods are generally not able to consider and use all the information contained in these data. For this reason, we present in this thesis a new multiview similarity algorithm called MVSim, that can be considered as a multiview extension of the XSim algorithm. We demonstrate that this method outperforms state of the art multiview methods, as well as classical approaches, thus validating the interest of the multiview aspect. Finally, we also describe how to use the MVSim algorithm to cluster large-scale single-view data, by first splitting it in multiple subsets. We demonstrate that this approach allows to significantly reduce the running time and the memory footprint of the method, while slightly lowering the quality of the obtained clustering compared to a straightforward approach with no splitting.SAVOIE-SCD - Bib.électronique (730659901) / SudocGRENOBLE1/INP-Bib.électronique (384210012) / SudocGRENOBLE2/3-Bib.électronique (384219901) / SudocSudocFranceF
Coreference Resolution Evaluation Based on Descriptive Specificity
International audienceThis paper introduces a new evaluation method for the coreference resolution task. Considering that coreference resolution is a matter of linking expressions to discourse referents, we set our evaluation criteron in terms of an evaluation of the denotations assigned to the expressions. This criterion requires that the coreference chains identified in one annotation stand in a one-to-one correspondence with the coreference chains in the other. To determine this correspondence and with a view to keep closer to what human interpretation of the coreference chains would be, we take into account the fact that, in a coreference chain, some expressions are more specific to their referent than others. With this observation in mind, we measure the similarity between the chains in one annotation and the chains in the other, and then compute the optimal similarity between the two annotations. Evaluation then consists in checking whether the denotations assigned to the expressions are correct or not. New measures to analyse errors are also introduced. A comparison with other methods is given at the end of the paper
Towards a Framework for Semantic Exploration of Frequent Patterns
http://ceur-ws.org/Vol-1075/ - ISSN: 1613-0073International audienceMining frequent patterns is an essential task in discovering hidden correlations in datasets. Although frequent patterns unveil valuable information, there are some challenges which limits their usability. First, the number of possible patterns is often very large which hinders their eff ective exploration. Second, patterns with many items are hard to read and the analyst may be unable to understand their meaning. In addition, the only available information about patterns is their support, a very coarse piece of information. In this paper, we are particularly interested in mining datasets that reflect usage patterns of users moving in space and time and for whom demographics attributes are available (age, occupation, etc). Such characteristics are typical of data collected from smart phones, whose analysis has critical business applications nowadays. We propose pattern exploration primitives, abstraction and refinement, that use hand-crafted taxonomies on time, space and user demographics. We show on two real datasets, Nokia and MovieLens, how the use of such taxonomies reduces the size of the pattern space and how demographics enable their semantic exploration. This work opens new perspectives in the semantic exploration of frequent patterns that reflect the behavior of di fferent user communities
Data Science
International audienceLa data science, ou science des données, est la discipline qui traite de la collecte, de la préparation, de la gestion, de l'analyse, de l'interprétation et de la visualisation de grands ensembles de données complexes. Elle n'est pas seulement concernée par les outils et les méthodes pour obtenir, gérer et analyser les données ; elle consiste aussi à en extraire de la valeur et de la connaissance. Cet ouvrage présente les fondements scientifiques et les composantes essentielles de la science des données, à un niveau accessible aux étudiants de master et aux élèves ingénieurs. Notre souci a été de proposer un exposé cohérent reliant la théorie aux algorithmes développés dans ces domaines. Il s'adresse aux chercheurs et ingénieurs qui abordent les problématiques liées à la science des données, aux data scientists de PME qui utilisent en profondeur les outils d'apprentissage, mais aussi aux étudiants de master, doctorants ou encore futurs ingénieurs qui souhaitent un ouvrage de référence en data science. À qui s'adresse ce livre ? • Aux développeurs, statisticiens, étudiants et chefs de projets ayant à résoudre des problèmes de data science. • Aux data scientists, mais aussi à toute personne curieuse d'avoir une vue d'ensemble de l'état de l'art du machine learning
- …