18 research outputs found

    Kernel Methods for Minimally Supervised WSD

    We present a semi-supervised technique for word sense disambiguation that exploits external knowledge acquired in an unsupervised manner. In particular, we use a combination of basic kernel functions to independently estimate syntagmatic and domain similarity, building a set of word-expert classifiers that share a common domain model acquired from a large corpus of unlabeled data. The results show that the proposed approach achieves state-of-the-art performance on a wide range of lexical sample tasks and on the English all-words task of Senseval-3, although it uses a considerably smaller number of training examples than other methods.
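    The abstract's core idea, combining a syntagmatic kernel (local lexical context) with a domain kernel (topical similarity via a domain model learned from unlabeled data), can be sketched as below. This is a minimal illustration, not the authors' implementation: the tiny hand-made `DOMAIN_MODEL`, the example contexts, and the equal-weight combination are all assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors given as dicts."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def bow(tokens):
    """Bag-of-words vector of a context window."""
    vec = {}
    for t in tokens:
        vec[t] = vec.get(t, 0) + 1
    return vec

# Toy domain model (an assumption standing in for one acquired from a
# large unlabeled corpus): each word gets a weight per domain.
DOMAIN_MODEL = {
    "bank": {"finance": 0.9, "geography": 0.1},
    "money": {"finance": 1.0},
    "river": {"geography": 1.0},
    "water": {"geography": 0.9},
}

def domain_vector(tokens):
    """Project a context into domain space via the domain model."""
    vec = {}
    for t in tokens:
        for dom, w in DOMAIN_MODEL.get(t, {}).items():
            vec[dom] = vec.get(dom, 0.0) + w
    return vec

def syntagmatic_kernel(a, b):
    """Similarity of the local lexical contexts."""
    return cosine(bow(a), bow(b))

def domain_kernel(a, b):
    """Similarity of the domains evoked by the contexts."""
    return cosine(domain_vector(a), domain_vector(b))

def combined_kernel(a, b, alpha=0.5):
    """A convex combination of valid kernels is itself a valid kernel."""
    return alpha * syntagmatic_kernel(a, b) + (1 - alpha) * domain_kernel(a, b)
```

    In a real word-expert classifier the combined kernel would be handed to an SVM (e.g. as a precomputed Gram matrix); here it simply scores a finance context as closer to another finance context than to a geography one.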

    Cross language text categorization by acquiring multilingual domain models from comparable corpora

    In a multilingual scenario, the classical monolingual text categorization problem can be reformulated as a cross-language TC task, in which we have to cope with two or more languages (e.g. English and Italian). In this setting, the system is trained using labeled examples in a source language (e.g. English), and it classifies documents in a different target language (e.g. Italian). In this paper we propose a novel approach to the cross-language text categorization problem based on acquiring Multilingual Domain Models from comparable corpora in a totally unsupervised way, without using any external knowledge source (e.g. bilingual dictionaries). These Multilingual Domain Models are exploited to define a generalized similarity function (i.e. a kernel function) among documents in different languages, which is used inside a Support Vector Machines classification framework. The results show that our approach is a feasible and cheap solution that largely outperforms a baseline.
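    The generalized similarity function described above can be sketched as follows: documents in either language are projected into a shared domain space, where an ordinary inner product compares them. The tiny hand-made multilingual domain model below is an illustrative assumption, standing in for one acquired automatically from comparable corpora.

```python
# Toy multilingual domain model mapping English and Italian words into
# the same domain dimensions (an assumption for illustration only).
MDM = {
    # English terms
    "bank": {"finance": 1.0},
    "loan": {"finance": 1.0},
    "match": {"sport": 1.0},
    # Italian terms
    "banca": {"finance": 1.0},
    "prestito": {"finance": 1.0},
    "partita": {"sport": 1.0},
}

def project(tokens):
    """Map a document (token list, any language) into domain space."""
    vec = {}
    for t in tokens:
        for dom, w in MDM.get(t, {}).items():
            vec[dom] = vec.get(dom, 0.0) + w
    return vec

def cross_language_kernel(doc_a, doc_b):
    """Inner product in the shared domain space (a valid kernel)."""
    va, vb = project(doc_a), project(doc_b)
    return sum(va[d] * vb[d] for d in va if d in vb)
```

    An English finance document thus scores high against an Italian finance document and zero against an Italian sports document, even with no shared surface vocabulary; in the paper this kernel is plugged into an SVM classifier.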

    Instance Based Lexical Entailment for Ontology Population

    In this paper we propose an instance-based method for lexical entailment and apply it to automatic ontology population from text. The approach is fully unsupervised and based on kernel methods. We demonstrate the effectiveness of our technique, which largely surpasses both the random and most-frequent baselines and outperforms current state-of-the-art unsupervised approaches on a benchmark ontology available in the literature.

    Instance Pruning by Filtering Uninformative Words: an Information Extraction Case Study

    In this paper we present a novel instance pruning technique for Information Extraction (IE). In particular, our technique filters out uninformative words from texts, based on the assumption that very frequent words in the language provide no specific information about the text in which they appear; their expectation of being (part of) relevant entities is therefore very low. The experiments on two benchmark datasets show that the computation time can be significantly reduced without any significant decrease in prediction accuracy. We also report an improvement in accuracy for one task.
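    The filtering step described above amounts to dropping tokens whose corpus frequency exceeds a threshold before the (expensive) extraction model sees them. A minimal sketch, in which the frequency table, cutoff value, and example sentence are all toy assumptions:

```python
def prune_instance(tokens, corpus_freq, cutoff):
    """Keep only tokens rare enough to plausibly be (part of) an entity.

    Very frequent words are assumed uninformative and are dropped;
    unseen words (frequency 0) are always kept.
    """
    return [t for t in tokens if corpus_freq.get(t, 0) <= cutoff]

# Toy corpus frequencies (an assumption; a real system would count
# them over a large reference corpus).
CORPUS_FREQ = {"the": 120000, "of": 95000, "was": 80000,
               "acquired": 310, "Pixar": 12, "Disney": 45}

pruned = prune_instance(
    ["the", "Disney", "company", "acquired", "Pixar"],
    CORPUS_FREQ, cutoff=1000)
```

    The cutoff would in practice be tuned on held-out data to balance the speed gain against any loss in recall.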

    Domain kernels for word sense disambiguation

    In this paper we present a supervised Word Sense Disambiguation methodology that exploits kernel methods to model sense distinctions. In particular, a combination of kernel functions is adopted to estimate syntagmatic and domain similarity independently. We defined a kernel function, namely the Domain Kernel, that allowed us to plug "external knowledge" into the supervised learning process. External knowledge is acquired from unlabeled data in a totally unsupervised way, and it is represented by means of Domain Models. We evaluated our methodology on several lexical sample tasks in different languages, significantly outperforming the state of the art for each of them while reducing the amount of labeled training data required for learning.
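    The Domain Models mentioned above are acquired from unlabeled data; the paper derives them automatically, while the sketch below approximates the idea with a deliberately simple co-occurrence count against hand-picked seed terms per domain. The seed sets and example sentences are assumptions for illustration, not the authors' actual acquisition procedure.

```python
from collections import defaultdict

# Assumed seed terms per domain (a real Domain Model is induced fully
# automatically from a large unlabeled corpus, e.g. via LSA).
SEEDS = {"finance": {"bank", "money"}, "sport": {"ball", "team"}}

def acquire_domain_model(sentences):
    """Score each word by how often it co-occurs with domain seeds."""
    model = defaultdict(lambda: defaultdict(float))
    for sent in sentences:
        words = set(sent)
        for domain, seeds in SEEDS.items():
            hits = len(words & seeds)
            if hits:
                for w in words - seeds:
                    model[w][domain] += hits
    return {w: dict(doms) for w, doms in model.items()}

unlabeled = [
    ["the", "bank", "approved", "the", "loan"],
    ["money", "in", "the", "bank"],
    ["the", "team", "kicked", "the", "ball"],
]
dm = acquire_domain_model(unlabeled)
```

    The resulting word-to-domain weights are exactly the kind of table a Domain Kernel consumes: topical words ("loan", "kicked") pick up a clear domain signal, while function words ("the") spread across domains and contribute little contrast.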

    Bridging Languages by SuperSense Entity Tagging

    This paper explores a very basic linguistic phenomenon in multilingualism: the lexicalizations of entities are very often identical across different languages, while concepts are usually lexicalized differently. Since entities are commonly referred to by proper names in natural language, we measured their distribution in the lexical overlap of the terminologies extracted from comparable corpora. Results show that the lexical overlap is mostly composed of unambiguous words, which can be regarded as anchors to bridge languages: most terms with the same spelling refer to exactly the same entities. Thanks to this important feature of Named Entities, we developed a multilingual supersense tagging system capable of distinguishing between concepts and individuals. The individuals adopted for training were extracted both from YAGO and by a heuristic procedure. The overall F1 of the English tagger is over 76%, in line with the state of the art on supersense tagging despite the larger number of classes. Performance for Italian is slightly lower, but still accurate enough to yield effective results for knowledge acquisition.
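    The overlap measurement described above reduces to intersecting the two extracted terminologies and treating shared spellings (mostly proper names) as candidate cross-language anchors. A minimal sketch with toy term lists (the terms and the choice of the Dice coefficient as the overlap ratio are illustrative assumptions):

```python
def lexical_overlap(terms_a, terms_b):
    """Return the shared terms and their Dice overlap ratio."""
    shared = terms_a & terms_b
    dice = 2 * len(shared) / (len(terms_a) + len(terms_b))
    return shared, dice

# Toy terminologies extracted from comparable English/Italian corpora.
english = {"Rome", "inflation", "Barack Obama", "football"}
italian = {"Rome", "inflazione", "Barack Obama", "calcio"}

anchors, ratio = lexical_overlap(english, italian)
```

    As the paper observes, the shared portion is dominated by named entities ("Rome", "Barack Obama"), whereas common concepts ("inflation"/"inflazione", "football"/"calcio") are lexicalized differently and fall outside the overlap.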

    Unsupervised Part-Of-Speech Tagging Supporting Supervised Methods

    This paper investigates the utility of an unsupervised part-of-speech (PoS) system in a task-oriented way. We use PoS labels as features for different supervised NLP tasks: Word Sense Disambiguation, Named Entity Recognition and Chunking. Further, we explore how much supervised tagging can gain from unsupervised tagging. A comparative evaluation between variants of systems using standard PoS, unsupervised PoS and no PoS at all reveals that supervised tagging gains substantially from unsupervised tagging. In particular, unsupervised PoS tagging behaves similarly to supervised PoS in Word Sense Disambiguation and Named Entity Recognition, while only Chunking still benefits more from supervised PoS. Overall, the results indicate that unsupervised PoS tagging is useful for many applications and a viable low-cost alternative when little or no PoS training data is available for the target language or domain.
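    Using unsupervised PoS labels as features, as described above, simply means extending each token's feature set with the cluster ID an unsupervised tagger assigned to it. A minimal sketch; the tiny cluster map and feature names are assumptions (a real system induces the clusters from raw text):

```python
# Assumed output of an unsupervised tagger: word -> induced cluster ID.
UNSUP_POS = {"the": "C17", "dog": "C3", "barks": "C8"}

def token_features(tokens, i):
    """Feature dict for token i, extended with an unsupervised PoS label."""
    feats = {
        "word": tokens[i],
        "prev": tokens[i - 1] if i > 0 else "<s>",
    }
    # The unsupervised cluster ID acts as a categorical PoS-like feature;
    # unseen words fall back to "UNK".
    feats["upos"] = UNSUP_POS.get(tokens[i], "UNK")
    return feats
```

    Any downstream supervised learner (for WSD, NER or chunking) can consume these dicts unchanged, which is what makes the unsupervised labels a drop-in, low-cost substitute for gold PoS tags.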