102 research outputs found
Commonsense Metaphysics and Lexical Semantics
In the TACITUS project for using commonsense knowledge in the understanding of texts about mechanical devices and their failures, we have been developing various commonsense theories that are needed to mediate between the way we talk about the behavior of such devices and causal models of their operation. Of central importance in this effort is the axiomatization of what might be called commonsense metaphysics. This includes a number of areas that figure in virtually every domain of discourse, such as granularity, scales, time, space, material, physical objects, shape, causality, functionality, and force. Our effort has been to construct core theories of each of these areas, and then to define, or at least characterize, a large number of lexical items in terms provided by the core theories. In this paper we discuss our methodological principles and describe the key ideas in the various domains we are investigating.
One model, two languages: training bilingual parsers with harmonized treebanks
We introduce an approach to train lexicalized parsers using bilingual corpora
obtained by merging harmonized treebanks of different languages, producing
parsers that can analyze sentences in either of the learned languages, or even
sentences that mix both. We test the approach on the Universal Dependency
Treebanks, training with MaltParser and MaltOptimizer. The results show that
these bilingual parsers are more than competitive, as most combinations not
only preserve accuracy, but some even achieve significant improvements over the
corresponding monolingual parsers. Preliminary experiments also show the
approach to be promising on texts with code-switching and when more languages
are added.
Comment: 7 pages, 4 tables, 1 figure
Towards Syntactic Iberian Polarity Classification
Lexicon-based methods using syntactic rules for polarity classification rely
on parsers that are dependent on the language and on treebank guidelines. Thus,
the rules are likewise language- and guideline-dependent and require adaptation, especially in multilingual
scenarios. We tackle this challenge in the context of the Iberian Peninsula,
releasing the first symbolic syntax-based Iberian system with rules shared
across five official languages: Basque, Catalan, Galician, Portuguese and
Spanish. The model is made available.
Comment: 7 pages, 5 tables. Contribution to the 8th Workshop on Computational
Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA-2017)
at EMNLP 2017
MAG: A Multilingual, Knowledge-base Agnostic and Deterministic Entity Linking Approach
Entity linking has recently been the subject of a significant body of
research. Currently, the best performing approaches rely on trained
monolingual models. Porting these approaches to other languages is
consequently a difficult endeavor as it requires corresponding training data
and retraining of the models. We address this drawback by presenting a novel
multilingual, knowledge-base agnostic and deterministic approach to entity
linking, dubbed MAG. MAG is based on a combination of context-based retrieval
on structured knowledge bases and graph algorithms. We evaluate MAG on 23 data
sets and in 7 languages. Our results show that the best approach trained on
English datasets (PBOH) achieves a micro F-measure that is up to 4 times worse
on datasets in other languages. MAG, on the other hand, achieves
state-of-the-art performance on English datasets and reaches a micro F-measure
that is up to 0.6 higher than that of PBOH on non-English languages.
Comment: Accepted in K-CAP 2017: Knowledge Capture Conference