1,219 research outputs found
The Most Influential Paper Gerard Salton Never Wrote
Gerard Salton is often credited with developing the vector space model
(VSM) for information retrieval (IR). Citations to Salton give the impression
that the VSM must have been articulated as an IR model sometime between
1970 and 1975. However, the VSM as it is understood today evolved over a
longer time period than is usually acknowledged, and an articulation of the
model and its assumptions did not appear in print until several years after
those assumptions had been criticized and alternative models proposed. An
often cited overview paper titled ???A Vector Space Model for Information
Retrieval??? (alleged to have been published in 1975) does not exist, and
citations to it represent a confusion of two 1975 articles, neither of which
were overviews of the VSM as a model of information retrieval. Until the
late 1970s, Salton did not present vector spaces as models of IR generally
but rather as models of specifi c computations. Citations to the phantom
paper refl ect an apparently widely held misconception that the operational
features and explanatory devices now associated with the VSM must have
been introduced at the same time it was fi rst proposed as an IR model.published or submitted for publicatio
Joint Modeling of Topics, Citations, and Topical Authority in Academic Corpora
Much of scientific progress stems from previously published findings, but
searching through the vast sea of scientific publications is difficult. We
often rely on metrics of scholarly authority to find the prominent authors but
these authority indices do not differentiate authority based on research
topics. We present Latent Topical-Authority Indexing (LTAI) for jointly
modeling the topics, citations, and topical authority in a corpus of academic
papers. Compared to previous models, LTAI differs in two main aspects. First,
it explicitly models the generative process of the citations, rather than
treating the citations as given. Second, it models each author's influence on
citations of a paper based on the topics of the cited papers, as well as the
citing papers. We fit LTAI to four academic corpora: CORA, Arxiv Physics, PNAS,
and Citeseer. We compare the performance of LTAI against various baselines,
starting with the latent Dirichlet allocation, to the more advanced models
including author-link topic model and dynamic author citation topic model. The
results show that LTAI achieves improved accuracy over other similar models
when predicting words, citations and authors of publications.Comment: Accepted by Transactions of the Association for Computational
Linguistics (TACL); to appea
SciRecSys: A Recommendation System for Scientific Publication by Discovering Keyword Relationships
In this work, we propose a new approach for discovering various relationships
among keywords over the scientific publications based on a Markov Chain model.
It is an important problem since keywords are the basic elements for
representing abstract objects such as documents, user profiles, topics and many
things else. Our model is very effective since it combines four important
factors in scientific publications: content, publicity, impact and randomness.
Particularly, a recommendation system (called SciRecSys) has been presented to
support users to efficiently find out relevant articles
Towards an Information Retrieval Theory of Everything
I present three well-known probabilistic models of information retrieval in tutorial style: The binary independence probabilistic model, the language modeling approach, and Google's page rank. Although all three models are based on probability theory, they are very different in nature. Each model seems well-suited for solving certain information retrieval problems, but not so useful for solving others. So, essentially each model solves part of a bigger puzzle, and a unified view on these models might be a first step towards an Information Retrieval Theory of Everything
The Structure and Dynamics of Co-Citation Clusters: A Multiple-Perspective Co-Citation Analysis
A multiple-perspective co-citation analysis method is introduced for
characterizing and interpreting the structure and dynamics of co-citation
clusters. The method facilitates analytic and sense making tasks by integrating
network visualization, spectral clustering, automatic cluster labeling, and
text summarization. Co-citation networks are decomposed into co-citation
clusters. The interpretation of these clusters is augmented by automatic
cluster labeling and summarization. The method focuses on the interrelations
between a co-citation cluster's members and their citers. The generic method is
applied to a three-part analysis of the field of Information Science as defined
by 12 journals published between 1996 and 2008: 1) a comparative author
co-citation analysis (ACA), 2) a progressive ACA of a time series of
co-citation networks, and 3) a progressive document co-citation analysis (DCA).
Results show that the multiple-perspective method increases the
interpretability and accountability of both ACA and DCA networks.Comment: 33 pages, 11 figures, 10 tables. To appear in the Journal of the
American Society for Information Science and Technolog
Measuring Author Research Relatedness: A Comparison of Word-based,Topic-based and Author Cocitation Approaches
Relationships between authors based on characteristics of published literature have been studied for decades. Author cocitation analysis using mapping techniques has been most frequently used to study how closely two authors are thought to be in intellectual space based on how members of the research community co-cite their works. Other approaches exist to study author relatedness based more directly on the text of their published works. In this study we present static and dynamic word-based approaches using vector space modeling, as well as a topic-based approach based on Latent Dirichlet Allocation for mapping author research relatedness. Vector space modeling is used to define an author space consisting of works by a given author. Outcomes for the two word-based approaches and a topic-based approach for 50 prolific authors in library and information science are compared with more traditional author cocitation analysis using multidimensional scaling and hierarchical cluster analysis. The two word-based approaches produced similar outcomes except where two authors were frequent co-authors for the majority of their articles. The topic-based approach produced the most distinctive map
- …