15 research outputs found
Decorrelation and shallow semantic patterns for distributional clustering of nouns and verbs
Distributional approximations to lexical semantics are very useful not only in supporting the creation of lexical semantic resources (Kilgarriff et al., 2004; Snow et al., 2006), but also when directly applied in tasks that can benefit from large-coverage semantic knowledge, such as coreference resolution (Poesio et al., 1998; Gasperin and Vieira, 2004; Versley, 2007), word sense disambiguation (McCarthy et al., 2004) or semantic role labeling (Gordon and Swanson, 2007). We present a model that is built from Web-based corpora using both shallow patterns for grammatical and semantic relations and a window-based approach, using singular value decomposition to decorrelate the feature space, which is otherwise too heavily influenced by the skewed topic distribution of Web corpora.
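The decorrelation step can be sketched with a truncated SVD on a toy word-by-context count matrix (the counts and the choice of k below are invented for illustration; the paper applies this to much larger Web-derived feature spaces):

```python
import numpy as np

# Toy word-by-context count matrix (invented): rows 0-1 share contexts,
# rows 2-3 share different ones.
counts = np.array([
    [4., 3., 0., 1.],
    [3., 4., 1., 0.],
    [0., 1., 5., 4.],
    [1., 0., 4., 5.],
])

# SVD rotates the features into decorrelated directions; truncating to the
# top k dimensions discards the noisy, topic-skewed remainder.
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2
reduced = U[:, :k] * s[:k]  # decorrelated word representations

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Words that shared contexts stay close in the reduced space.
assert cos(reduced[0], reduced[1]) > cos(reduced[0], reduced[2])
```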
Text Classification Using Association Rules, Dependency Pruning and Hyperonymization
We present new methods for pruning and enhancing itemsets for text
classification via association rule mining. Pruning methods are based on
dependency syntax, and enhancing methods are based on replacing words by their
hyperonyms of various orders. We discuss the impact of these methods compared
to pruning based on the tf-idf rank of words. (Comment: 16 pages, 2 figures, presented at DMNLP 201)
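The tf-idf ranking used as the pruning baseline can be sketched on a toy corpus (the documents and the raw-count weighting are invented for illustration):

```python
import math
from collections import Counter

# Toy document collection (invented) for the tf-idf pruning baseline:
# words with a low tf-idf rank in a document are pruned from its itemsets.
docs = [
    "association rules mine frequent itemsets",
    "dependency syntax prunes noisy itemsets",
    "rules and syntax improve text classification",
]
tokenized = [d.split() for d in docs]
n_docs = len(tokenized)
df = Counter(w for doc in tokenized for w in set(doc))  # document frequency

def tfidf(doc_tokens):
    """Raw-count term frequency times log inverse document frequency."""
    tf = Counter(doc_tokens)
    return {w: tf[w] * math.log(n_docs / df[w]) for w in tf}

scores = tfidf(tokenized[0])
# "association" is document-specific, "itemsets" appears in two documents,
# so the former outranks the latter.
assert scores["association"] > scores["itemsets"]
```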
Towards Terascale Knowledge Acquisition
Although vast amounts of textual data are freely available, many NLP algorithms exploit only a minute percentage of it. In this paper, we study the challenges of working at the terascale. We present an algorithm, designed for the terascale, for mining is-a relations that achieves similar performance to a state-of-the-art linguistically-rich method. We focus on the accuracy of these two systems as a function of processing time and corpus size.
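Pattern-based is-a mining of this kind can be sketched with Hearst-style lexical patterns (the pattern inventory below is illustrative, not the paper's actual pattern set, which is engineered for terascale throughput):

```python
import re

# Hypothetical Hearst-style surface patterns with single-word hypernyms.
PATTERNS = [
    re.compile(r"(\w+) such as ((?:\w+, )*\w+(?: and \w+)?)"),
    re.compile(r"(\w+) including ((?:\w+, )*\w+(?: and \w+)?)"),
]

def extract_isa(text):
    """Return a set of (hyponym, hypernym) pairs found by the patterns."""
    pairs = set()
    for pat in PATTERNS:
        for hyper, hypos in pat.findall(text):
            for hypo in re.split(r", | and ", hypos):
                pairs.add((hypo, hyper))
    return pairs

sentence = "We sampled fruits such as apples, pears and plums at the market."
assert extract_isa(sentence) == {
    ("apples", "fruits"), ("pears", "fruits"), ("plums", "fruits")
}
```

Running such patterns over terabytes of text is cheap per sentence, which is why this family of methods scales where linguistically rich pipelines struggle.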
Discovering multiword expressions
In this paper, we provide an overview of research on multiword expressions (MWEs) from a natural language processing perspective. We examine methods developed for modelling MWEs that capture some of their linguistic properties, discussing their use for MWE discovery and for idiomaticity detection. We concentrate on their collocational and contextual preferences, along with their fixedness in terms of canonical forms and their lack of word-for-word translatability. We also discuss a sample of the MWE resources that have been used in intrinsic evaluation setups for these methods.
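One standard collocational association measure used in MWE discovery, pointwise mutual information (PMI), can be sketched on a toy corpus (the corpus is invented; the survey covers many more measures):

```python
import math
from collections import Counter

# Toy corpus (invented) in which "strong tea" recurs as a collocation.
corpus = ("strong tea please , strong tea again , strong tea now , "
          "hot tea , strong arm").split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
n = len(corpus)

def pmi(w1, w2):
    """Log of the observed bigram probability over the independence estimate."""
    p_bigram = bigrams[(w1, w2)] / (n - 1)
    return math.log(p_bigram / ((unigrams[w1] / n) * (unigrams[w2] / n)))

# "strong tea" co-occurs far above chance; "tea ," is frequent but incidental.
assert pmi("strong", "tea") > pmi("tea", ",")
assert pmi("strong", "tea") > 0
```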
Distributional similarity for Chinese: Exploiting characters and radicals
Distributional similarity has attracted considerable attention in the field of natural language processing as an automatic means of countering the ubiquitous problem of sparse data. Chinese is a logographic language: words consist of characters, and each character is composed of one or more radicals. The meanings of characters are usually highly related to the words that contain them. Likewise, radicals often make a predictable contribution to the meaning of a character: characters that share the same components tend to have similar or related meanings. In this paper, we exploit these properties of the Chinese language to improve Chinese word similarity computation. Given a content word, we first extract similar words from a large corpus together with a similarity score for ranking. This ranking is then adjusted according to the characters and components shared between each similar word and the target word. Experiments on two gold-standard datasets show that the adjusted ranking is superior and closer to human judgments than the original one. In addition to the quantitative evaluation, we examine the reasons behind errors, drawing on linguistic phenomena for our explanations. (Peer Reviewed)
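The character-based rank adjustment can be sketched as follows (the linear weighting and the candidate scores are invented for illustration; the paper's method also exploits radicals):

```python
# Hypothetical re-ranking of distributionally similar Chinese words by
# shared characters; the 0.3 weight and the candidate scores are invented.
def char_overlap(w1, w2):
    """Fraction of shared characters between two words."""
    return len(set(w1) & set(w2)) / max(len(set(w1)), len(set(w2)))

def rerank(target, candidates, weight=0.3):
    """candidates: (word, distributional score) pairs; boost shared characters."""
    adjusted = [(w, s + weight * char_overlap(target, w)) for w, s in candidates]
    return sorted(adjusted, key=lambda ws: ws[1], reverse=True)

# "列车" (train) shares the character "车" (vehicle) with "火车" (train)
# and "汽车" (car), but no character with "轮船" (ship).
candidates = [("火车", 0.52), ("轮船", 0.51), ("汽车", 0.50)]
assert rerank("列车", candidates)[0][0] == "火车"
```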
A Markovian approach to distributional semantics with application to semantic compositionality
In this article, we describe a new approach to distributional semantics. This approach relies on a generative model of sentences with latent variables, which takes syntax into account by using syntactic dependency trees. Words are then represented as posterior distributions over those latent classes, and the model allows us to naturally obtain in-context and out-of-context word representations that are comparable. We train our model on a large corpus and demonstrate the compositionality capabilities of our approach on different datasets.
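The idea of representing words as posterior distributions over latent classes can be sketched with a toy Bayes computation (all probabilities below are invented; the paper infers them from a generative model over dependency trees, which is not reproduced here):

```python
import numpy as np

# Invented latent-class model: two classes, three words.
p_class = np.array([0.5, 0.5])          # prior over latent classes
p_word_given_class = {                  # emission probabilities p(w | c)
    "cat":  np.array([0.20, 0.01]),
    "dog":  np.array([0.18, 0.02]),
    "bank": np.array([0.01, 0.15]),
}

def posterior(word):
    """Out-of-context representation: p(c | w) by Bayes' rule."""
    joint = p_class * p_word_given_class[word]
    return joint / joint.sum()

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Words mostly emitted by the same latent class get similar posteriors.
assert cos(posterior("cat"), posterior("dog")) > cos(posterior("cat"), posterior("bank"))
```

Because the representations are probability distributions of the same dimensionality, in-context and out-of-context variants can be compared directly, which is the property the abstract highlights.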
Analysis and study on text representation to improve the accuracy of the Normalized Compression Distance
The huge amount of information stored in text form makes methods that deal
with texts really interesting. This thesis focuses on dealing with texts using
compression distances. More specifically, the thesis takes a small step towards
understanding both the nature of texts and the nature of compression distances.
Broadly speaking, the way in which this is done is exploring the effects that
several distortion techniques have on one of the most successful distances in
the family of compression distances, the Normalized Compression Distance (NCD). (Comment: PhD Thesis; 202 pages)
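The NCD has a standard closed form, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)) for a compressor C; a minimal sketch using zlib as the compressor, with invented example strings:

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance with zlib as the compressor C:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Near-duplicate texts compress well together, so their distance is smaller.
s1 = b"the quick brown fox jumps over the lazy dog " * 20
s2 = b"the quick brown fox leaps over the lazy dog " * 20
s3 = b"colourless green ideas sleep furiously tonight " * 20
assert ncd(s1, s2) < ncd(s1, s3)
```

Values near 0 indicate highly similar inputs; the distortion techniques studied in the thesis alter the texts before this computation to probe how the distance behaves.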