455,356 research outputs found
Finding similar research papers using language models
The task of assessing the similarity of research papers is of interest in a variety of application contexts. It is a challenging task, however, as the full text of the papers is often not available, and similarity needs to be determined based on the papers' abstract, and some additional features such as authors, keywords, and journal. Our work explores the possibility of adapting language modeling techniques to this end. The basic strategy we pursue is to augment the information contained in the abstract by interpolating the corresponding language model with language models for the authors, keywords and journal of the paper. This strategy is then extended by finding topics and additionally interpolating with the resulting topic models. These topics are found using an adaptation of Latent Dirichlet Allocation (LDA), in which the keywords that were provided by the authors are used to guide the process
Learning to Rank Academic Experts in the DBLP Dataset
Expert finding is an information retrieval task that is concerned with the
search for the most knowledgeable people with respect to a specific topic, and
the search is based on documents that describe people's activities. The task
involves taking a user query as input and returning a list of people who are
sorted by their level of expertise with respect to the user query. Despite
recent interest in the area, the current state-of-the-art techniques lack in
principled approaches for optimally combining different sources of evidence.
This article proposes two frameworks for combining multiple estimators of
expertise. These estimators are derived from textual contents, from
graph-structure of the citation patterns for the community of experts, and from
profile information about the experts. More specifically, this article explores
the use of supervised learning to rank methods, as well as rank aggregation
approaches, for combing all of the estimators of expertise. Several supervised
learning algorithms, which are representative of the pointwise, pairwise and
listwise approaches, were tested, and various state-of-the-art data fusion
techniques were also explored for the rank aggregation framework. Experiments
that were performed on a dataset of academic publications from the Computer
Science domain attest the adequacy of the proposed approaches.Comment: Expert Systems, 2013. arXiv admin note: text overlap with
arXiv:1302.041
Metamodel Instance Generation: A systematic literature review
Modelling and thus metamodelling have become increasingly important in
Software Engineering through the use of Model Driven Engineering. In this paper
we present a systematic literature review of instance generation techniques for
metamodels, i.e. the process of automatically generating models from a given
metamodel. We start by presenting a set of research questions that our review
is intended to answer. We then identify the main topics that are related to
metamodel instance generation techniques, and use these to initiate our
literature search. This search resulted in the identification of 34 key papers
in the area, and each of these is reviewed here and discussed in detail. The
outcome is that we are able to identify a knowledge gap in this field, and we
offer suggestions as to some potential directions for future research.Comment: 25 page
Language classification from bilingual word embedding graphs
We study the role of the second language in bilingual word embeddings in
monolingual semantic evaluation tasks. We find strongly and weakly positive
correlations between down-stream task performance and second language
similarity to the target language. Additionally, we show how bilingual word
embeddings can be employed for the task of semantic language classification and
that joint semantic spaces vary in meaningful ways across second languages. Our
results support the hypothesis that semantic language similarity is influenced
by both structural similarity as well as geography/contact.Comment: To be published at Coling 201
- …