58 research outputs found

Spectral Estimation of Conditional Random Graph Models for Large-Scale Network Data

Author: Freno Antonino
Garriga Gemma C.
Keller Mikaela
Tommasi Marc
Publication date: 01/01/2012

Generative models for graphs have typically been committed to strong prior assumptions concerning the form of the modeled distributions. Moreover, the vast majority of currently available models are either only suitable for characterizing some particular network properties (such as degree distribution or clustering coefficient), or they are aimed at estimating joint probability distributions, which is often intractable in large-scale networks. In this paper, we first propose a novel network statistic, based on the Laplacian spectrum of graphs, which makes it possible to dispense with any parametric assumption concerning the modeled network properties. Second, we use the defined statistic to develop the Fiedler random graph model, switching the focus from the estimation of joint probability distributions to a more tractable conditional estimation setting. After analyzing the dependence structure characterizing Fiedler random graphs, we evaluate them experimentally in edge prediction over several real-world networks, showing that they reach a much higher prediction accuracy than various alternative statistical models.

Comment: Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI2012)

arXiv.org e-Print Archive

HAL - Lille 3

INRIA a CCSD electronic archive server

Learning to Recommend Links using Graph Structure and Node Content

Author: Freno Antonino
Garriga Gemma
Keller Mikaela
Publication venue: HAL CCSD
Publication date: 01/01/2010

The link prediction problem for graphs is a binary classification task that estimates the presence or absence of a link between two nodes in the graph. Links absent from the training set, however, cannot be directly treated as negative examples, since they might turn out to be present at test time. Finding a hard decision boundary for link prediction is thus unnatural. This paper formalizes the link prediction problem from the flexible perspective of preference learning: the goal is to learn a preference score between any two nodes, whether observed in the network at training time or appearing only later at test time, by using the feature vectors of the nodes and the structure of the graph as side information. Our assumption is that the observed edges, and in general, shortest paths between nodes in the graph, can reinforce an existing similarity between the nodes' feature vectors. We propose a model implemented by a simple neural network architecture and an objective function that can be optimized by stochastic gradient descent over appropriate triplets of nodes in the graph. Our first preliminary experiments in small undirected graphs show that our learning algorithm outperforms baselines in real networks and is able to learn the correct distance function in synthetic networks.
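The triplet-based objective mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the model or the loss, so everything here is an assumption made for illustration: a weighted dot product stands in for the neural network, a hinge (margin ranking) loss stands in for the objective, and the names `score`, `triplet_hinge_loss`, and `sgd_step` are hypothetical.

```python
def score(weights, x_u, x_v):
    """Hypothetical preference score: weighted dot product of two feature vectors."""
    return sum(w * a * b for w, a, b in zip(weights, x_u, x_v))


def triplet_hinge_loss(weights, anchor, preferred, other, margin=1.0):
    """Penalize triplets where `preferred` does not outscore `other` by `margin`."""
    gap = score(weights, anchor, preferred) - score(weights, anchor, other)
    return max(0.0, margin - gap)


def sgd_step(weights, anchor, preferred, other, lr=0.1, margin=1.0):
    """One stochastic (sub)gradient step on a single triplet."""
    if triplet_hinge_loss(weights, anchor, preferred, other, margin) == 0.0:
        return list(weights)  # margin already satisfied: zero subgradient
    # d(loss)/d(w_i) = anchor_i * (other_i - preferred_i) when the hinge is active
    return [w - lr * a * (o - p)
            for w, a, p, o in zip(weights, anchor, preferred, other)]


# Toy triplet (made-up features): the anchor should prefer `linked` over `unlinked`.
weights = [0.5, 0.5]
anchor, linked, unlinked = [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]
before = triplet_hinge_loss(weights, anchor, linked, unlinked)
weights = sgd_step(weights, anchor, linked, unlinked)
after = triplet_hinge_loss(weights, anchor, linked, unlinked)
```

A ranking loss of this kind never labels an unobserved pair as a hard negative; it only asks that an observed link score higher than an unobserved one, which matches the preference-learning framing of the abstract.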

HAL - Lille 3

CiteSeerX

INRIA a CCSD electronic archive server

Fiedler Random Fields: A Large-Scale Spectral Approach to Statistical Network Modeling

Author: Freno Antonino
Keller Mikaela
Tommasi Marc
Publication venue: MIT Press - Journals
Publication date: 01/12/2012

Statistical models for networks have typically been committed to strong prior assumptions concerning the form of the modeled distributions. Moreover, the vast majority of currently available models are explicitly designed for capturing some specific graph properties (such as power-law degree distributions), which makes them unsuitable for application to domains where the behavior of the target quantities is not known a priori. The key contribution of this paper is twofold. First, we introduce the Fiedler delta statistic, based on the Laplacian spectrum of graphs, which makes it possible to dispense with any parametric assumption concerning the modeled network properties. Second, we use the defined statistic to develop the Fiedler random field model, which allows for efficient estimation of edge distributions over large-scale random networks. After analyzing the dependence structure involved in Fiedler random fields, we estimate them over several real-world networks, showing that they achieve a much higher modeling accuracy than other well-known statistical approaches.
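The abstract names a statistic built on the Laplacian spectrum but does not define it. As a rough illustration only, the sketch below computes the Fiedler value (the second-smallest eigenvalue of the graph Laplacian L = D - A) and how it changes when a single edge is toggled. Treating that change as the statistic is an assumption, and `fiedler_value` and `fiedler_change` are hypothetical names. A shifted power iteration is used to keep the example dependency-free; a library eigensolver would be the usual choice in practice.

```python
import math


def laplacian(adj):
    """Combinatorial Laplacian L = D - A for a 0/1 adjacency matrix without self-loops."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def _unit_orthogonal_to_ones(x):
    """Remove the all-ones component (the trivial eigenvector) and normalize."""
    mean = sum(x) / len(x)
    centered = [xi - mean for xi in x]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]


def fiedler_value(adj, iterations=500):
    """Approximate the second-smallest Laplacian eigenvalue (needs >= 2 nodes).

    Power iteration on (shift*I - L), restricted to vectors orthogonal to the
    all-ones vector. Assumes the start vector is not orthogonal to the
    Fiedler vector.
    """
    n = len(adj)
    lap = laplacian(adj)
    shift = 2 * max(lap[i][i] for i in range(n)) + 1  # exceeds every eigenvalue of L
    x = _unit_orthogonal_to_ones([float(i + 1) for i in range(n)])
    for _ in range(iterations):
        x = [shift * x[i] - sum(lap[i][j] * x[j] for j in range(n))
             for i in range(n)]
        x = _unit_orthogonal_to_ones(x)
    lap_x = [sum(lap[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(x, lap_x))  # Rayleigh quotient of a unit vector


def fiedler_change(adj, u, v):
    """Hypothetical helper: change in Fiedler value when edge (u, v) is toggled."""
    toggled = [row[:] for row in adj]
    toggled[u][v] = toggled[v][u] = 1 - adj[u][v]
    return fiedler_value(toggled) - fiedler_value(adj)


# Path graph 0-1-2; adding edge (0, 2) turns it into a triangle.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
```

On this toy graph the Fiedler value is 1 for the path and 3 for the triangle, so closing the triangle changes it by 2. A larger value indicates a better-connected graph.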

HAL - Lille 3

INRIA a CCSD electronic archive server

A Tale of Two Laws of Semantic Change: Predicting Synonym Changes with Distributional Semantic Models

Author: Denis Pascal
Keller Mikaela
Liétard Bastien
Publication date: 30/05/2023

Lexical Semantic Change is the study of how the meaning of words evolves through time. Another related question is whether and how lexical relations over pairs of words, such as synonymy, change over time. There are currently two competing, apparently opposite hypotheses in the historical linguistic literature regarding how synonymous words evolve: the Law of Differentiation (LD) argues that synonyms tend to take on different meanings over time, whereas the Law of Parallel Change (LPC) claims that synonyms tend to undergo the same semantic change and therefore remain synonyms. So far, there has been little research using distributional models to assess to what extent these laws apply on historical corpora. In this work, we take a first step toward detecting whether LD or LPC operates for given word pairs. After recasting the problem into a more tractable task, we combine two linguistic resources to propose the first complete evaluation framework on this problem and provide empirical evidence in favor of a dominance of LD. We then propose various computational approaches to the problem using Distributional Semantic Models and grounded in recent literature on Lexical Semantic Change detection. Our best approaches achieve a balanced accuracy above 0.6 on our dataset. We discuss challenges still faced by these approaches, such as polysemy or the potential confusion between synonymy and hypernymy.

Comment: Accepted at The 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023)
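For readers unfamiliar with the metric reported above, balanced accuracy is the mean of per-class recall, so a classifier that always predicts the majority class scores 0.5 on a two-class problem regardless of class imbalance. The sketch below is a minimal illustration; the labels and predictions are made up and are not the paper's data.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for label in sorted(set(y_true)):
        indices = [i for i, t in enumerate(y_true) if t == label]
        correct = sum(1 for i in indices if y_pred[i] == label)
        recalls.append(correct / len(indices))
    return sum(recalls) / len(recalls)


# Toy, made-up labels: three LD pairs and one LPC pair.
y_true = ["LD", "LD", "LD", "LPC"]
always_ld = ["LD", "LD", "LD", "LD"]   # majority-class baseline
mixed = ["LD", "LD", "LPC", "LPC"]     # one LD pair misclassified

baseline_score = balanced_accuracy(y_true, always_ld)
mixed_score = balanced_accuracy(y_true, mixed)
```

Because the baseline gets plain accuracy 0.75 here but balanced accuracy 0.5, the metric makes a score above 0.6 easier to interpret when one law dominates the dataset.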

arXiv.org e-Print Archive

Automated Vocabulary Discovery for Geo-Parsing Online Epidemic Intelligence

Author: Brownstein John Samuel
Freifeld Clark C
Keller Mikaela
Publication venue: Springer Science and Business Media LLC
Publication date: 09/02/2012

Background: Automated surveillance of the Internet provides a timely and sensitive method for alerting on global emerging infectious disease threats. HealthMap is part of a new generation of online systems designed to monitor and visualize, on a real-time basis, disease outbreak alerts as reported by online news media and public health sources. HealthMap is of specific interest for national and international public health organizations and international travelers. A particular task that makes such surveillance useful is the automated discovery of the geographic references contained in the retrieved outbreak alerts. This task is sometimes referred to as "geo-parsing". A typical approach to geo-parsing would demand an expensive training corpus of alerts manually tagged by a human.

Results: Given that human readers perform this kind of task by using both their lexical and contextual knowledge, we developed an approach which relies on a relatively small expert-built gazetteer, thus limiting the need for human input, but focuses on learning the context in which geographic references appear. We show in a set of experiments that this approach exhibits a substantial capacity to discover geographic locations outside of its initial lexicon.

Conclusion: The results of this analysis provide a framework for future automated global surveillance efforts that reduce manual input and improve timeliness of reporting.

Harvard University - DASH

Textual Data Representation

Author: Bengio Samy
Keller Mikaela
Publication venue: IDIAP
Publication date: 10/03/2006

Infoscience - École polytechnique fédérale de Lausanne

A Multitask Learning Approach to Document Representation using Unlabeled Data

Author: Bengio Samy
Keller Mikaela
Publication venue: IDIAP
Publication date: 11/02/2010

Text categorization is intrinsically a supervised learning task, which aims at relating a given text document to one or more predefined categories. Unfortunately, labeling such databases of documents is a painful task. We present in this paper a method that takes advantage of the huge amounts of unlabeled text documents available in digital format, to counterbalance the relatively smaller amount of labeled text documents available. A Siamese MLP is trained in a multi-task framework in order to solve two concurrent tasks: using the unlabeled data, we search for a mapping from the documents' bag-of-words representation to a new feature space emphasizing similarities and dissimilarities among documents; simultaneously, this mapping is constrained to also give good text categorization performance over the labeled dataset. Experimental results on Reuters RCV1 suggest that, as expected, performance over the labeled task increases as the amount of unlabeled data increases.

Infoscience - École polytechnique fédérale de Lausanne

Theme Topic Mixture Model: A Graphical Model for Document Representation

Author: Bengio Samy
Keller Mikaela
Publication venue: IDIAP
Publication date: 11/02/2010

Documents are usually represented in the bag-of-words space. However, this representation does not take into account the possible relations between words. We propose here a graphical model for representing documents: the Theme Topic Mixture Model (TTMM). This model assumes two types of relations among textual data. Topics link words to each other, and Themes gather documents with a particular distribution over the topics. This paper defines the TTMM, compares it to the related Latent Dirichlet Allocation (LDA) model (Blei, 2003) and reports some interesting empirical results.

Infoscience - École polytechnique fédérale de Lausanne

Facing the facts of fake: a distributional semantics and corpus annotation approach

Author: Cappelle Bert
Denis Pascal
Keller Mikaela
Publication venue: Walter de Gruyter GmbH
Publication date: 28/11/2018

Fake is often considered the textbook example of a so-called 'privative' adjective, one which, in other words, allows the proposition that '(a) fake x is not (an) x'. This study tests the hypothesis that the contexts of an adjective-noun combination are more different from the contexts of the noun when the adjective is such a 'privative' one than when it is an ordinary (subsective) one. We here use 'embeddings', that is, dense vector representations based on word co-occurrences in a large corpus, which in our study is the entire English Wikipedia as it was in 2013. Comparing the cosine distance between the adjective-noun bigram and single noun embeddings across two sets of adjectives, privative and ordinary ones, we fail to find a noticeable difference. However, we contest that fake is an across-the-board privative adjective, since a fake article, for instance, is most definitely still an article. We extend a recent proposal involving the noun's qualia roles (how an entity is made, what it consists of, what it is used for, etc.) and propose several interpretational types of fake-noun combinations, some but not all of which are privative. These interpretations, which we assign manually to the 100 most frequent fake-noun combinations in the Wikipedia corpus, depend to a large extent on the meaning of the noun, as combinations with similar interpretations tend to involve nouns that are linked in a distributions-based network. When we restrict our focus to the privative uses of fake only, we do detect a slightly enlarged difference between fake + noun bigram and noun distributions compared to the previously obtained average difference between adjective + noun bigram and noun distributions. This result contrasts with negative or even opposite findings reported in the literature.
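The comparison described in the abstract rests on the cosine distance between a bigram embedding and a noun embedding. A minimal sketch of that measure follows. The vectors are toy values invented for illustration, not embeddings from the study, and `cosine_distance` is a hypothetical helper name.

```python
import math


def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Toy, made-up embeddings for a noun and two adjective-noun bigrams.
noun = [3.0, 4.0]
bigram_same_direction = [6.0, 8.0]   # same direction as the noun: distance 0
bigram_orthogonal = [-4.0, 3.0]      # unrelated contexts: distance 1

near = cosine_distance(bigram_same_direction, noun)
far = cosine_distance(bigram_orthogonal, noun)
```

Cosine distance ignores vector length, so it compares the distribution of contexts rather than raw frequency. Under the hypothesis tested in the abstract, privative bigrams would sit farther from their nouns than ordinary ones.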

INRIA a CCSD electronic archive server