29 research outputs found

Co-occurrence Vectors from Corpora vs. Distance Vectors from Dictionaries

Authors: Nitta Yoshihiko, Niwa Yoshiki
Publication date: 01/01/1994

A comparison was made of vectors derived by using ordinary co-occurrence statistics from large text corpora and of vectors derived by measuring the inter-word distances in dictionary definitions. The precision of word sense disambiguation by using co-occurrence vectors from the 1987 Wall Street Journal (20M total words) was higher than that by using distance vectors from the Collins English Dictionary (60K head words + 1.6M definition words). However, other experimental results suggest that distance vectors contain some different semantic information from co-occurrence vectors.

Comment: 6 pages, appeared in the Proc. of COLING94 (pp. 304-309)

arXiv.org e-Print Archive

CiteSeerX

Crossref

SEWordSim: Software-Specific Word Similarity Database

Authors: Lawall Julia, Lo David, Tian Yuan
Publication venue: Association for Computing Machinery (ACM)
Publication date: 31/05/2014

Measuring the similarity of words is important in accurately representing and comparing documents, and thus improves the results of many natural language processing (NLP) tasks. The NLP community has proposed various measurements based on WordNet, a lexical database that contains relationships between many pairs of words. Recently, a number of techniques have been proposed to address software engineering issues such as code search and fault localization that require understanding natural language documents, and a measure of word similarity could improve their results. However, WordNet only contains information about word senses in general-purpose conversation, which often differ from word senses in a software-engineering context, and the software-specific word similarity resources that have been developed rely on data sources containing only a limited range of words and word uses. In recent work, we have proposed a word similarity resource based on information collected automatically from StackOverflow. We have found that the results of this resource are given scores on a 3-point Likert scale that are over 50% higher than the results of a resource based on WordNet. In this demo paper, we review our data collection methodology and propose a Java API to make the resulting word similarity resource useful in practice. The SEWordSim database and related information can be found at http://goo.gl/BVEAs8. A demo video is available at http://goo.gl/dyNwyb.

Crossref

Institutional Knowledge at Singapore Management University

INRIA a CCSD electronic archive server

ItEM: A Vector Space Model to Bootstrap an Italian Emotive Lexicon

Authors: Lenci Alessandro, Passaro Lucia, Pollacci Laura
Publication venue: Academia University Press
Publication date: 01/01/2015

In recent years, computational linguistics has seen a rising interest in subjectivity, opinions, feelings and emotions. Even though great attention has been given to polarity recognition, research in emotion detection has had to rely on small emotion resources. In this paper, we present a methodology to build emotive lexicons by jointly exploiting vector space models and human annotation, and we provide the first results of the evaluation with a crowdsourcing experiment.

Crossref

Archivio della Ricerca - Università di Pisa

The Distributional Hypothesis

Author: Sahlgren Magnus
Publication date: 01/01/2008

CiteSeerX

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database