Exploiting Transitivity in Probabilistic Models for Ontology Learning
Capturing word meaning is one of the challenges of natural language processing (NLP). Formal models of meaning such as ontologies are knowledge repositories used in a variety of applications. To be effectively used, these ontologies have to be large or, at least, adapted to specific domains. Our main goal is to contribute practically to the research on ontology learning models by covering different aspects of the task.
We propose probabilistic models for learning ontologies that expand existing ontologies, taking into account both corpus-extracted evidence and the structure of the generated ontologies. The models exploit structural properties of target relations, such as transitivity, during learning. We then propose two extensions of our probabilistic models: a model for learning from a generic domain that can be exploited to extract new information in a specific domain, and an incremental ontology learning system that puts human validation in the learning loop. The latter provides a graphical user interface and a human-computer interaction workflow supporting the incremental learning loop.
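To illustrate the general idea of exploiting transitivity during learning, the sketch below induces new ISA probabilities from directly estimated ones. The relation encoding, the product-based chaining, and the acceptance threshold are assumptions made for illustration; this is not the authors' actual model.

```python
# Illustrative sketch only: inducing ISA probabilities via transitivity.
# The chaining rule and threshold are assumptions, not the paper's model.

def induce_transitive(direct, threshold=0.5):
    """Given direct P(a ISA b) estimates, derive P(a ISA c) for chains a->b->c."""
    induced = dict(direct)
    for (a, b1), p_ab in direct.items():
        for (b2, c), p_bc in direct.items():
            if b1 == b2 and (a, c) not in induced:
                # Combine the two direct probabilities along the chain a -> b -> c.
                induced[(a, c)] = p_ab * p_bc
    # Keep only edges above the acceptance threshold.
    return {pair: p for pair, p in induced.items() if p >= threshold}

direct = {("dog", "mammal"): 0.9, ("mammal", "animal"): 0.95}
print(induce_transitive(direct))   # adds ("dog", "animal") with probability 0.855
```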
Automatic learning of textual entailments with cross-pair similarities
In this paper we define a novel similarity measure between examples of textual entailment and use it as a kernel function in Support Vector Machines (SVMs). This allows us to automatically learn the rewrite rules that describe a non-trivial set of entailment cases. Experiments with the data sets of the RTE 2005 challenge show an improvement of 4.4% over the state-of-the-art methods.
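The sketch below shows the general mechanics of plugging a pair similarity into an SVM via a precomputed kernel. The token-overlap similarity is only a stand-in for the paper's cross-pair similarity over syntactic structures, and the data are toy examples.

```python
# Minimal sketch: a custom pair-similarity kernel used with an SVM.
# pair_sim is a crude placeholder, not the paper's cross-pair similarity.
import numpy as np
from sklearn.svm import SVC

def pair_sim(p1, p2):
    """Similarity between two (text, hypothesis) pairs via token overlap."""
    (t1, h1), (t2, h2) = p1, p2
    overlap = lambda a, b: len(set(a.split()) & set(b.split())) + 1
    return overlap(t1, t2) * overlap(h1, h2)

pairs = [("a cat sat", "a cat exists"), ("dogs bark loudly", "dogs make noise")]
labels = [1, 0]                      # 1 = entailment holds, 0 = it does not
gram = np.array([[pair_sim(a, b) for b in pairs] for a in pairs], dtype=float)

clf = SVC(kernel="precomputed").fit(gram, labels)
test_gram = np.array([[pair_sim(("a cat sat", "a cat exists"), p) for p in pairs]])
print(clf.predict(test_gram))
```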
Fast and effective kernels for relational learning from texts
In this paper, we define a family of syntactic kernels for automatic relational learning from pairs of natural language sentences. We provide an efficient computation of such models by optimizing the dynamic programming algorithm of the kernel evaluation. Experiments with Support Vector Machines and the above kernels show the effectiveness and efficiency of our approach on two very important natural language tasks, Textual Entailment Recognition and Question Answering.
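As a rough illustration of dynamic-programming tree-kernel evaluation, the sketch below computes a Collins–Duffy-style subtree kernel with memoisation. The tree encoding, the decay factor, and the use of memoisation in place of the paper's optimised algorithm are all assumptions.

```python
# Illustrative subtree kernel with memoised dynamic programming.
# Not the paper's optimised algorithm; encoding and decay are assumptions.
from functools import lru_cache

# A tree is (label, (child, child, ...)); leaves have an empty child tuple.
T1 = ("S", (("NP", ()), ("VP", (("V", ()), ("NP", ())))))
T2 = ("S", (("NP", ()), ("VP", (("V", ()),))))

def _collect(t):
    """All nodes of a tree, root included."""
    return [t] + [n for c in t[1] for n in _collect(c)]

def tree_kernel(t1, t2, decay=0.4):
    @lru_cache(maxsize=None)
    def delta(n1, n2):
        """Weighted count of common subtrees rooted at n1 and n2."""
        if n1[0] != n2[0] or len(n1[1]) != len(n2[1]):
            return 0.0
        score = decay
        for c1, c2 in zip(n1[1], n2[1]):
            score *= 1.0 + delta(c1, c2)
        return score

    return sum(delta(a, b) for a in _collect(t1) for b in _collect(t2))

print(tree_kernel(T1, T2))
```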
Inductive probabilistic taxonomy learning using singular value decomposition
Capturing word meaning is one of the challenges of natural language processing (NLP). Formal models of meaning, such as networks of words or concepts, are knowledge repositories used in a variety of applications. To be effectively used, these networks have to be large or, at least, adapted to specific domains. Learning word meaning from texts is therefore an active area of research. Lexico-syntactic pattern methods are one of the possible solutions. Yet, these models do not use structural properties of target semantic relations, e.g. transitivity, during learning. In this paper, we propose a novel lexico-syntactic pattern probabilistic method for learning taxonomies that explicitly models transitivity and naturally exploits vector space model techniques for reducing space dimensions. We define two probabilistic models: the direct probabilistic model and the induced probabilistic model. The first is directly estimated on observations over text collections. The second uses transitivity on the direct probabilistic model to induce probabilities of derived events. Within our probabilistic model, we also propose a novel way of using singular value decomposition as an unsupervised method for feature selection in estimating direct probabilities. We empirically show that the induced probabilistic taxonomy learning model outperforms state-of-the-art probabilistic models and that our unsupervised feature selection method improves performance.
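The sketch below shows the kind of role SVD can play as unsupervised dimensionality reduction before estimating direct probabilities. The toy co-occurrence matrix, the chosen rank, and the cosine-based estimate are illustrative assumptions, not the paper's exact model.

```python
# Minimal sketch: truncated SVD as unsupervised feature reduction.
# Matrix, rank k, and cosine estimate are toy assumptions.
import numpy as np

# Rows: candidate word pairs; columns: lexico-syntactic pattern features (toy counts).
X = np.array([[3., 0., 1., 2.],
              [2., 1., 0., 3.],
              [0., 4., 2., 0.]])

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2                                  # keep the k strongest latent dimensions
X_reduced = U[:, :k] * s[:k]           # pair representations in the reduced space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rough stand-in for a direct probability estimate between two candidate pairs.
print(cosine(X_reduced[0], X_reduced[1]))
```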
Linguistic redundancy in Twitter
In the last few years, the interest of the research community in micro-blogs and social media services, such as Twitter, has been growing exponentially. Yet, so far not much attention has been paid to a key characteristic of micro-blogs: the high level of information redundancy. The aim of this paper is to systematically approach this problem by providing an operational definition of redundancy. We cast redundancy in the framework of Textual Entailment Recognition. We also provide quantitative evidence on the pervasiveness of redundancy in Twitter, and describe a dataset of redundancy-annotated tweets. Finally, we present a general purpose system for identifying redundant tweets. An extensive quantitative evaluation shows that our system successfully solves the redundancy detection task, improving over baseline systems with statistical significance.
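A toy sketch of the underlying idea of casting redundancy as mutual entailment follows. The lexical-overlap "entails" test is only a placeholder for a real RTE system, and the threshold is an assumption.

```python
# Toy sketch: redundancy as bidirectional entailment.
# The overlap-based entails() is a placeholder for a real RTE system.
def entails(text, hypothesis, threshold=0.8):
    t, h = set(text.lower().split()), set(hypothesis.lower().split())
    return len(t & h) / max(len(h), 1) >= threshold

def redundant(tweet_a, tweet_b):
    """Two tweets are redundant if each (loosely) entails the other."""
    return entails(tweet_a, tweet_b) and entails(tweet_b, tweet_a)

print(redundant("Quake hits city centre", "quake hits the city centre"))
```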
Terminology extraction: an analysis of linguistic and statistical approaches
Are linguistic properties and behaviors important for recognizing terms? Are statistical measures effective for extracting terms? Is it possible to capture a sort of termhood with computational linguistics techniques? Or are terms too sensitive to exogenous and pragmatic factors to be confined within computational linguistics? All these questions are still open. This study tries to contribute to the search for an answer, in the belief that it can be found only through a careful experimental analysis of real case studies and a study of their correlation with theoretical insights.
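As one concrete example of the statistical side of the comparison, the sketch below computes a simple relative-frequency termhood score between a domain corpus and a general corpus. It is only one of many possible measures, and the corpora and smoothing here are toy assumptions.

```python
# Illustrative sketch of a simple statistical termhood measure
# (domain vs. general relative frequency); corpora are toy data.
from collections import Counter

domain = "gene expression regulates gene transcription in the cell".split()
general = "the cat sat on the mat and the dog slept in the house".split()

d_counts, g_counts = Counter(domain), Counter(general)
d_total, g_total = sum(d_counts.values()), sum(g_counts.values())

def termhood(word):
    """Higher when a word is relatively more frequent in the domain corpus."""
    d_rel = d_counts[word] / d_total
    g_rel = (g_counts[word] + 1) / (g_total + 1)   # add-one smoothing
    return d_rel / g_rel

for w in ("gene", "the"):
    print(w, round(termhood(w), 2))
```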
A Machine learning approach to textual entailment recognition
Designing models for learning textual entailment recognizers from annotated examples is not an easy task, as it requires modeling the semantic relations and interactions between the two text fragments of each pair. In this paper, we approach the problem by first introducing the class of pair feature spaces, which allow supervised machine learning algorithms to derive first-order rewrite rules from annotated examples. In particular, we propose syntactic and shallow semantic feature spaces, and compare them to standard ones. Extensive experiments demonstrate that our proposed spaces learn first-order derivations, while standard ones are not expressive enough to do so.
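The sketch below conveys the flavor of a pair feature space: each feature couples a fragment of the text with a fragment of the hypothesis before training a supervised classifier. Using word pairs instead of the paper's syntactic and shallow-semantic structures is an assumption made for brevity, and the data are toy examples.

```python
# Minimal sketch of a pair feature space with a linear classifier.
# Word-pair features stand in for the paper's syntactic/semantic structures.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def pair_features(text, hypothesis):
    """One binary feature per (text word, hypothesis word) combination."""
    return {f"{t}__{h}": 1 for t in text.split() for h in hypothesis.split()}

pairs = [("a cat sleeps", "an animal sleeps"), ("a cat sleeps", "a dog barks")]
labels = [1, 0]                                   # 1 = entailment, 0 = none

vec = DictVectorizer()
X = vec.fit_transform([pair_features(t, h) for t, h in pairs])
clf = LogisticRegression().fit(X, labels)

X_new = vec.transform([pair_features("a cat sleeps", "an animal sleeps")])
print(clf.predict(X_new))
```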