UH-PRHLT at SemEval-2016 Task 3: Combining Lexical and Semantic-based Features for Community Question Answering
In this work we describe the system built for the three English subtasks of
the SemEval 2016 Task 3 by the Department of Computer Science of the University
of Houston (UH) and the Pattern Recognition and Human Language Technology
(PRHLT) research center - Universitat Politècnica de València: UH-PRHLT. Our
system represents instances by using both lexical and semantic-based similarity
measures between text pairs. Our semantic features include the use of
distributed representations of words, knowledge graphs generated with the
BabelNet multilingual semantic network, and the FrameNet lexical database.
Experimental results outperform the random and Google search engine baselines
in the three English subtasks. Our approach obtained the highest results on
subtask B among all task participants. Comment: Top system for
question-question similarity in SemEval 2016 Task 3.
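The combination of lexical and semantic similarity features can be illustrated with a minimal sketch (not the authors' implementation; the toy word vectors and texts below are invented):

```python
import math

# Toy word vectors standing in for distributed representations;
# a real system would load pretrained embeddings.
VECS = {
    "how": [0.1, 0.9], "do": [0.2, 0.8], "i": [0.3, 0.7],
    "reset": [0.9, 0.1], "restore": [0.85, 0.15],
    "password": [0.7, 0.3], "my": [0.4, 0.6],
}

def jaccard(a, b):
    """Lexical similarity: token-set overlap between two texts."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def avg_vec(tokens):
    """Average the vectors of the in-vocabulary tokens."""
    vs = [VECS[t] for t in tokens if t in VECS]
    if not vs:
        return [0.0, 0.0]
    return [sum(dim) / len(vs) for dim in zip(*vs)]

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0

def features(q1, q2):
    """One instance = [lexical_sim, semantic_sim] for a text pair."""
    t1, t2 = q1.lower().split(), q2.lower().split()
    return [jaccard(t1, t2), cosine(avg_vec(t1), avg_vec(t2))]

f = features("how do i reset my password", "how do i restore my password")
```

A classifier or ranker would then be trained on such feature vectors per question pair.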
Bridging Semantic Gaps between Natural Languages and APIs with Word Embedding
Developers increasingly rely on text matching tools to analyze the relation
between natural language words and APIs. However, semantic gaps, namely textual
mismatches between words and APIs, negatively affect these tools. Previous
studies have transformed words or APIs into low-dimensional vectors for
matching; however, inaccurate results were obtained due to the failure of
modeling words and APIs simultaneously. To resolve this problem, two main
challenges are to be addressed: the acquisition of massive words and APIs for
mining and the alignment of words and APIs for modeling. Therefore, this study
proposes Word2API to effectively estimate relatedness of words and APIs.
Word2API collects millions of commonly used words and APIs from code
repositories to address the acquisition challenge. Then, a shuffling strategy
is used to transform related words and APIs into tuples to address the
alignment challenge. Using these tuples, Word2API models words and APIs
simultaneously. Word2API outperforms baselines by 10%-49.6% of relatedness
estimation in terms of precision and NDCG. Word2API is also effective on
solving typical software tasks, e.g., query expansion and API documents
linking. A simple system with Word2API-expanded queries recommends up to 21.4%
more related APIs for developers. Meanwhile, Word2API improves comparison
algorithms by 7.9%-17.4% in linking questions in Question&Answer communities to
API documents. Comment: accepted by IEEE Transactions on Software Engineering.
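The shuffling-based alignment idea can be sketched as follows (an invented, simplified stand-in: real Word2API shuffles mined word+API tuples into training sequences for an embedding model, whereas this sketch only counts co-occurrence within each shuffled tuple):

```python
import random
from collections import Counter
from itertools import combinations

# Hypothetical (description words, API calls) tuples, standing in for
# the pairs Word2API mines from code repositories.
PAIRS = [
    (["read", "file"], ["java.io.FileReader", "java.io.BufferedReader"]),
    (["write", "file"], ["java.io.FileWriter"]),
    (["read", "line"], ["java.io.BufferedReader"]),
]

def make_training_docs(pairs, seed=0):
    """The alignment trick: shuffle each word+API tuple into one
    document so words and APIs share a context window."""
    rng = random.Random(seed)
    docs = []
    for words, apis in pairs:
        doc = list(words) + list(apis)
        rng.shuffle(doc)
        docs.append(doc)
    return docs

def cooccurrence(docs):
    """Stand-in for embedding training: count token co-occurrence
    within each shuffled document."""
    co = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            co[(a, b)] += 1
    return co

def relatedness(co, word, api):
    """Relatedness of a word and an API under this toy model."""
    return co.get(tuple(sorted((word, api))), 0)

co = cooccurrence(make_training_docs(PAIRS))
```

In the paper's setting, a skip-gram-style model trained over such shuffled sequences would replace the raw co-occurrence counts.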
Enriching Word Vectors with Subword Information
Continuous word representations, trained on large unlabeled corpora, are
useful for many natural language processing tasks. Popular models that learn
such representations ignore the morphology of words, by assigning a distinct
vector to each word. This is a limitation, especially for languages with large
vocabularies and many rare words. In this paper, we propose a new approach
based on the skipgram model, where each word is represented as a bag of
character n-grams. A vector representation is associated with each character
n-gram, and words are represented as the sum of these representations. Our
method is fast, allowing models to be trained quickly on large corpora, and allows us
to compute word representations for words that did not appear in the training
data. We evaluate our word representations on nine different languages, both on
word similarity and analogy tasks. By comparing to recently proposed
morphological word representations, we show that our vectors achieve
state-of-the-art performance on these tasks. Comment: Accepted to TACL. The first two authors contributed equally.
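The subword idea is easy to sketch: a word is decomposed into boundary-marked character n-grams, and its vector is the sum of the n-gram vectors (the lookup table below is a hypothetical deterministic stand-in for learned embeddings):

```python
import random

DIM = 8  # toy embedding dimensionality

def char_ngrams(word, n=3):
    """Character n-grams of a word wrapped in boundary markers < and >."""
    w = "<" + word + ">"
    return [w[i:i + n] for i in range(len(w) - n + 1)]

def ngram_vec(gram):
    # Hypothetical lookup: deterministic pseudo-random vectors stand in
    # for the learned n-gram embeddings of a trained model.
    rng = random.Random(sum(ord(c) for c in gram))
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]

def word_vec(word):
    """A word's vector is the sum of its character n-gram vectors, so
    out-of-vocabulary words still receive a representation."""
    vec = [0.0] * DIM
    for gram in char_ngrams(word):
        for i, x in enumerate(ngram_vec(gram)):
            vec[i] += x
    return vec
```

Because the vector is built from subword pieces, a word never seen in training still maps to a meaningful point in the space, which is the property the abstract highlights.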
Pitfall of Google Tri-Grams Word Similarity Measure
This paper describes and examines the Google Tri-grams Measure (GTM), an unsupervised word similarity measurement technique based on the Google n-gram dataset. We investigate GTM's word similarity measure, reported as state of the art, and eventually reveal its pitfall. We test the measure on the MC-30 word-pair dataset and compare the results against other word similarity measures. In this evaluation, GTM's word similarity measure is found to fall significantly behind the alternatives. The pitfall of GTM word similarity is detailed and supported with evidence.
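The evaluation protocol behind such comparisons is Spearman rank correlation between a measure's scores and human judgments. A minimal sketch (the human and system scores below are invented illustrations in MC-30's 0-4 style, not the published values):

```python
def rankdata(xs):
    """Ranks with ties averaged (minimal helper for Spearman)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        # extend over a run of tied values
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(a, b):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    ra, rb = rankdata(a), rankdata(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra) ** 0.5
    vb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (va * vb)

# Invented scores for four word pairs: human judgments vs. a
# hypothetical similarity measure under test.
human = [3.92, 3.84, 0.42, 0.11]
system = [0.91, 0.88, 0.30, 0.05]
rho = spearman(human, system)
```

A measure that "falls behind" in this protocol is one whose rho against the human judgments is markedly lower than that of competing measures.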
An Ensemble Method to Produce High-Quality Word Embeddings (2016)
A currently successful approach to computational semantics is to represent
words as embeddings in a machine-learned vector space. We present an ensemble
method that combines embeddings produced by GloVe (Pennington et al., 2014) and
word2vec (Mikolov et al., 2013) with structured knowledge from the semantic
networks ConceptNet (Speer and Havasi, 2012) and PPDB (Ganitkevitch et al.,
2013), merging their information into a common representation with a large,
multilingual vocabulary. The embeddings it produces achieve state-of-the-art
performance on many word-similarity evaluations. Its score on an evaluation
of rare words (Luong et al., 2013) is 16% higher than that of the previous
best known system. Comment: Corrected author name; revised reproducibility
instructions that no longer worked. 12 pages, 3 figures.
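One simple way to merge embedding spaces with graph knowledge, sketched with invented toy data (the paper's actual method differs in detail; this combines normalize-and-concatenate with a Faruqui-style retrofitting step):

```python
import math

# Toy vectors from two hypothetical embedding models.
GLOVE = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]}
W2V = {"cat": [0.8, 0.2], "dog": [0.7, 0.3], "car": [0.1, 0.9]}
# Toy knowledge-graph relatedness edges (ConceptNet-style).
GRAPH = {"cat": ["dog"], "dog": ["cat"], "car": []}

def l2norm(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def cosine(u, v):
    d = sum(a * b for a, b in zip(u, v))
    return d / (math.sqrt(sum(a * a for a in u)) *
                math.sqrt(sum(b * b for b in v)))

def combine(word):
    """Concatenate L2-normalized vectors from both models."""
    return l2norm(GLOVE[word]) + l2norm(W2V[word])

def retrofit(vecs, graph, alpha=0.5, iters=10):
    """Pull each vector toward the average of its graph neighbors,
    injecting the structured knowledge into the embedding space."""
    cur = {w: list(v) for w, v in vecs.items()}
    for _ in range(iters):
        nxt = {}
        for w, v0 in vecs.items():
            nbrs = [cur[n] for n in graph.get(w, []) if n in cur]
            if not nbrs:
                nxt[w] = list(cur[w])
                continue
            avg = [sum(c) / len(nbrs) for c in zip(*nbrs)]
            nxt[w] = [alpha * o + (1 - alpha) * a
                      for o, a in zip(v0, avg)]
        cur = nxt
    return cur

vecs = {w: combine(w) for w in GLOVE}
fitted = retrofit(vecs, GRAPH)
```

After retrofitting, graph-related words sit closer together while unconnected words are untouched.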
Evaluation of sentence embeddings in downstream and linguistic probing tasks
Despite the fast developmental pace of new sentence embedding methods, it is
still challenging to find comprehensive evaluations of these different
techniques. In the past years, we saw significant improvements in the field of
sentence embeddings and especially towards the development of universal
sentence encoders that could provide inductive transfer to a wide variety of
downstream tasks. In this work, we perform a comprehensive evaluation of recent
methods using a wide variety of downstream and linguistic feature probing
tasks. We show that a simple approach using bag-of-words with a recently
introduced language model for deep context-dependent word embeddings proved to
yield better results in many tasks when compared to sentence encoders trained
on entailment datasets. We also show, however, that we are still far away from
a universal encoder that can perform consistently across several downstream
tasks. Comment: 15 pages, 3 figures, 11 tables.
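The surprisingly strong bag-of-words baseline amounts to mean-pooling word vectors into a sentence vector. A minimal sketch with invented static vectors (the paper's variant pools deep contextualized embeddings instead):

```python
import math

# Toy static word vectors; a contextualized model would produce
# different vectors per occurrence.
VECS = {
    "the": [0.0, 0.1], "movie": [0.9, 0.3], "film": [0.85, 0.35],
    "was": [0.1, 0.1], "great": [0.2, 0.9], "terrible": [0.2, -0.9],
}

def sentence_embedding(sentence):
    """Bag-of-words sentence embedding: mean of word vectors."""
    toks = [t for t in sentence.lower().split() if t in VECS]
    if not toks:
        return [0.0, 0.0]
    return [sum(VECS[t][i] for t in toks) / len(toks) for i in range(2)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

s1 = sentence_embedding("the movie was great")
s2 = sentence_embedding("the film was great")
s3 = sentence_embedding("the movie was terrible")
```

Downstream tasks then train a lightweight classifier on these fixed sentence vectors, which is exactly the transfer setup such evaluations probe.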
Learning Word Relatedness over Time
Search systems are often focused on providing relevant results for the "now",
assuming both corpora and user needs that focus on the present. However, many
corpora today reflect significant longitudinal collections ranging from 20
years of the Web to hundreds of years of digitized newspapers and books.
Understanding the temporal intent of the user and retrieving the most relevant
historical content has become a significant challenge. Common search features,
such as query expansion, leverage the relationship between terms but cannot
function well across all times when relationships vary temporally. In this
work, we introduce a temporal relationship model that is extracted from
longitudinal data collections. The model supports the task of identifying,
given two words, when they relate to each other. We present an algorithmic
framework for this task and show its application for the task of query
expansion, achieving high gains. Comment: 11 pages, EMNLP 2017.
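The core task ("given two words, when do they relate?") can be sketched over per-slice corpora (the corpora and threshold below are invented; the paper builds a richer model over longitudinal embeddings):

```python
# Hypothetical per-decade document collections; a real system would
# train one representation per time slice of a longitudinal corpus.
SLICES = {
    1990: ["cell biology lab", "prison cell block", "cell membrane"],
    2010: ["cell phone call", "smart phone cell network",
           "cell phone app"],
}

def relatedness_in_slice(docs, w1, w2):
    """Fraction of documents in a slice containing both words."""
    both = sum(1 for d in docs
               if w1 in d.split() and w2 in d.split())
    return both / len(docs)

def related_periods(w1, w2, threshold=0.5):
    """Identify the time slices in which two words relate."""
    return [year for year, docs in sorted(SLICES.items())
            if relatedness_in_slice(docs, w1, w2) >= threshold]

periods = related_periods("cell", "phone")
```

Temporally scoped query expansion would then only expand "cell" with "phone" for queries whose intent falls in the periods returned.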
Better Automatic Evaluation of Open-Domain Dialogue Systems with Contextualized Embeddings
Despite advances in open-domain dialogue systems, automatic evaluation of
such systems is still a challenging problem. Traditional reference-based
metrics such as BLEU are ineffective because there could be many valid
responses for a given context that share no common words with reference
responses. A recent work proposed Referenced metric and Unreferenced metric
Blended Evaluation Routine (RUBER) to combine a learning-based metric, which
predicts relatedness between a generated response and a given query, with
reference-based metric; it showed high correlation with human judgments. In
this paper, we explore using contextualized word embeddings to compute more
accurate relatedness scores, thus better evaluation metrics. Experiments show
that our evaluation metrics outperform RUBER, which is trained on static
embeddings. Comment: 8 pages, 2 figures, NAACL 2019 Methods for Optimizing
and Evaluating Neural Language Generation (NeuralGen workshop).
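The blended-metric idea can be sketched with toy vectors (invented data; RUBER's unreferenced score is a trained network over pooled encoder states, for which plain cosine over max-pooled embeddings stands in here):

```python
import math

# Toy word vectors; the paper swaps static embeddings for
# contextualized ones at this point in the pipeline.
VECS = {
    "coffee": [0.9, 0.1], "tea": [0.85, 0.15], "i": [0.1, 0.2],
    "love": [0.3, 0.9], "like": [0.35, 0.85], "rain": [0.1, 0.9],
}

def pool(text):
    """Max-pool word vectors over each dimension."""
    toks = [t for t in text.lower().split() if t in VECS]
    if not toks:
        return [0.0, 0.0]
    return [max(VECS[t][i] for t in toks) for i in range(2)]

def cosine(u, v):
    d = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return d / (nu * nv) if nu and nv else 0.0

def unreferenced(query, response):
    """Relatedness between the query and the generated response."""
    return cosine(pool(query), pool(response))

def referenced(reference, response):
    """Similarity between the generated and reference responses."""
    return cosine(pool(reference), pool(response))

def blended(query, reference, response):
    """RUBER-style blend: here, the mean of the two scores."""
    return 0.5 * (unreferenced(query, response) +
                  referenced(reference, response))
```

The point of the unreferenced half is visible even in this sketch: a response can score well by relating to the query without sharing words with the reference.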
Cultural Shift or Linguistic Drift? Comparing Two Computational Measures of Semantic Change
Words shift in meaning for many reasons, including cultural factors like new
technologies and regular linguistic processes like subjectification.
Understanding the evolution of language and culture requires disentangling
these underlying causes. Here we show how two different distributional measures
can be used to detect two different types of semantic change. The first
measure, which has been used in many previous works, analyzes global shifts in
a word's distributional semantics; it is sensitive to changes due to regular
processes of linguistic drift, such as the semantic generalization of promise
("I promise." -> "It promised to be exciting."). The second measure, which we
develop here, focuses on local changes to a word's nearest semantic neighbors;
it is more sensitive to cultural shifts, such as the change in the meaning of
cell ("prison cell" -> "cell phone"). Comparing measurements made by these two
methods allows researchers to determine whether changes are more cultural or
linguistic in nature, a distinction that is essential for work in the digital
humanities and historical linguistics. Comment: 5 pages, 3 figures, EMNLP 2016.
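The two measures can be sketched side by side (toy aligned vectors, invented for illustration; the real measures operate on embeddings trained per period and aligned across periods):

```python
import math

# Toy aligned embeddings for two time periods (alignment assumed done).
T1 = {"cell": [0.9, 0.1], "prison": [0.85, 0.2],
      "phone": [0.1, 0.9], "gay": [0.5, 0.5]}
T2 = {"cell": [0.2, 0.9], "prison": [0.85, 0.2],
      "phone": [0.15, 0.95], "gay": [0.5, 0.5]}

def cosine(u, v):
    d = sum(a * b for a, b in zip(u, v))
    return d / (math.sqrt(sum(a * a for a in u)) *
                math.sqrt(sum(b * b for b in v)))

def global_change(word):
    """Global measure: cosine distance between the word's aligned
    vectors in the two periods (sensitive to linguistic drift)."""
    return 1.0 - cosine(T1[word], T2[word])

def local_change(word, neighbors):
    """Local measure: distance between the word's similarity profile
    to a fixed neighbor set in each period (sensitive to cultural
    shifts in the word's nearest neighbors)."""
    s1 = [cosine(T1[word], T1[n]) for n in neighbors]
    s2 = [cosine(T2[word], T2[n]) for n in neighbors]
    return 1.0 - cosine(s1, s2)

nbrs = ["prison", "phone"]
```

Comparing the two scores for a word is what lets one attribute its change to drift or to cultural shift.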
ExpertSeer: a Keyphrase Based Expert Recommender for Digital Libraries
We describe ExpertSeer, a generic framework for expert recommendation based
on the contents of a digital library. Given a query term q, ExpertSeer
recommends experts of q by retrieving authors who published relevant papers
determined by related keyphrases and the quality of papers. The system is based
on a simple yet effective keyphrase extractor and the Bayes' rule for expert
recommendation. ExpertSeer is domain independent and can be applied to
different disciplines and applications since the system is automated and not
tailored to a specific discipline. Digital library providers can employ the
system to enrich their services and organizations can discover experts of
interest within an organization. To demonstrate the power of ExpertSeer, we
apply the framework to build two expert recommender systems. The first, CSSeer,
utilizes the CiteSeerX digital library to recommend experts primarily in
computer science. The second, ChemSeer, uses publicly available documents from
the Royal Society of Chemistry (RSC) to recommend experts in chemistry. Using
one thousand computer science terms as benchmark queries, we compared the top-n
experts (n=3, 5, 10) returned by CSSeer to two other expert recommenders --
Microsoft Academic Search and ArnetMiner -- and a simulator that imitates the
ranking function of Google Scholar. Although CSSeer, Microsoft Academic Search,
and ArnetMiner mostly return prestigious researchers who published several
papers related to the query term, it was found that different expert
recommenders return moderately different recommendations. To further study
their performance, we obtained a widely used benchmark dataset as the ground
truth for comparison. The results show that our system outperforms Microsoft
Academic Search and ArnetMiner in terms of Precision-at-k (P@k) for k=3, 5, 10.
We also conducted several case studies to validate the usefulness of our
system.
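The Bayes'-rule ranking at the heart of the framework can be sketched as follows (the corpus and estimators below are invented simplifications of the keyphrase-based pipeline):

```python
from collections import Counter

# Hypothetical corpus: author -> keyphrases extracted from their papers.
CORPUS = {
    "alice": ["neural networks", "deep learning", "neural networks"],
    "bob": ["information retrieval", "neural networks"],
    "carol": ["query expansion", "information retrieval",
              "information retrieval"],
}

def rank_experts(query, top_n=3):
    """Rank authors by P(author|q), which by Bayes' rule is
    proportional to P(q|author) * P(author); here P(author) is
    estimated from the author's share of all extracted keyphrases."""
    total = sum(len(ks) for ks in CORPUS.values())
    scores = {}
    for author, ks in CORPUS.items():
        counts = Counter(ks)
        p_q_given_a = counts[query] / len(ks)
        p_a = len(ks) / total
        scores[author] = p_q_given_a * p_a
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

experts = rank_experts("information retrieval")
```

A production system would additionally weight by paper quality and expand the query with related keyphrases, as the abstract describes.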