1,219 research outputs found

The Most Influential Paper Gerard Salton Never Wrote

Author: Dubin David
Publication venue: Graduate School of Library and Information Science. University of Illinois at Urbana-Champaign.
Publication date: 01/01/2004

Gerard Salton is often credited with developing the vector space model (VSM) for information retrieval (IR). Citations to Salton give the impression that the VSM must have been articulated as an IR model sometime between 1970 and 1975. However, the VSM as it is understood today evolved over a longer time period than is usually acknowledged, and an articulation of the model and its assumptions did not appear in print until several years after those assumptions had been criticized and alternative models proposed. An often cited overview paper titled "A Vector Space Model for Information Retrieval" (alleged to have been published in 1975) does not exist, and citations to it represent a confusion of two 1975 articles, neither of which were overviews of the VSM as a model of information retrieval. Until the late 1970s, Salton did not present vector spaces as models of IR generally but rather as models of specific computations. Citations to the phantom paper reflect an apparently widely held misconception that the operational features and explanatory devices now associated with the VSM must have been introduced at the same time it was first proposed as an IR model.

Illinois Digital Environment for Access to Learning and Scholarship Repository

Joint Modeling of Topics, Citations, and Topical Authority in Academic Corpora

Author: Kim Dongwoo
Kim Jooyeon
Oh Alice
Publication date: 01/02/2017

Much of scientific progress stems from previously published findings, but searching through the vast sea of scientific publications is difficult. We often rely on metrics of scholarly authority to find the prominent authors, but these authority indices do not differentiate authority based on research topics. We present Latent Topical-Authority Indexing (LTAI) for jointly modeling the topics, citations, and topical authority in a corpus of academic papers. Compared to previous models, LTAI differs in two main aspects. First, it explicitly models the generative process of the citations, rather than treating the citations as given. Second, it models each author's influence on citations of a paper based on the topics of the cited papers, as well as the citing papers. We fit LTAI to four academic corpora: CORA, arXiv Physics, PNAS, and Citeseer. We compare the performance of LTAI against various baselines, ranging from latent Dirichlet allocation to more advanced models, including the author-link topic model and the dynamic author citation topic model. The results show that LTAI achieves improved accuracy over other similar models when predicting words, citations and authors of publications. Comment: Accepted by Transactions of the Association for Computational Linguistics (TACL); to appear.

arXiv.org e-Print Archive

ScholarWorks@UNIST

SciRecSys: A Recommendation System for Scientific Publication by Discovering Keyword Relationships

Author: D. Sánchez
F. Fouss
I. Dagan
J. Singthongchai
J.P. Keener
L.T. Kien
P. Lops
S.H. Cha
V. Anh Le
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

In this work, we propose a new approach, based on a Markov chain model, for discovering various relationships among keywords in scientific publications. This is an important problem because keywords are the basic elements for representing abstract objects such as documents, user profiles, topics and many other things. Our model is very effective because it combines four important factors in scientific publications: content, publicity, impact and randomness. In particular, a recommendation system (called SciRecSys) is presented to help users efficiently find relevant articles.

arXiv.org e-Print Archive

Crossref

Towards an Information Retrieval Theory of Everything

Author: Hiemstra Djoerd
Publication venue: NVTI
Publication date: 01/01/2009

I present three well-known probabilistic models of information retrieval in tutorial style: the binary independence probabilistic model, the language modeling approach, and Google's PageRank. Although all three models are based on probability theory, they are very different in nature. Each model seems well-suited for solving certain information retrieval problems, but not so useful for solving others. So, essentially each model solves part of a bigger puzzle, and a unified view of these models might be a first step towards an Information Retrieval Theory of Everything.

CiteSeerX

Radboud Repository

University of Twente Research Information

The Structure and Dynamics of Co-Citation Clusters: A Multiple-Perspective Co-Citation Analysis

Author: Bar-Ilan
Batagelj
Ben-Hur
Bonacich
Brandes
Brin
Carmel
Chen
Chen
Chen
Chen
Chen
Chen
Chen
Cronin
Deerwester
Dunning
Fernandez
Fiszman
Freeman
Garfield
Garfield
Jaccard
Janssens
Kiss
Klavans
Kleinberg
Kumar
Lane
Leydesdorff
Meho
Mihalcea
Morris
Morris
Morris
Newman
Ng
Persson
Radev
Rousseeuw
Salton
Schmid
Schneider
Schneider
Schneider
Shi
Shibata
Small
Small
Small
Small
Small
Small
Small
Small
Sparck Jones
Tabah
Teufel
White
White
White
White
Witten
Zhao
Zhao
Zins
Zins
Zins
Zins
Zuccala
Åström
Publication venue: 'Wiley'
Publication date: 09/02/2010

A multiple-perspective co-citation analysis method is introduced for characterizing and interpreting the structure and dynamics of co-citation clusters. The method facilitates analytic and sense-making tasks by integrating network visualization, spectral clustering, automatic cluster labeling, and text summarization. Co-citation networks are decomposed into co-citation clusters. The interpretation of these clusters is augmented by automatic cluster labeling and summarization. The method focuses on the interrelations between a co-citation cluster's members and their citers. The generic method is applied to a three-part analysis of the field of Information Science as defined by 12 journals published between 1996 and 2008: 1) a comparative author co-citation analysis (ACA), 2) a progressive ACA of a time series of co-citation networks, and 3) a progressive document co-citation analysis (DCA). Results show that the multiple-perspective method increases the interpretability and accountability of both ACA and DCA networks. Comment: 33 pages, 11 figures, 10 tables. To appear in the Journal of the American Society for Information Science and Technology.

arXiv.org e-Print Archive

Crossref

HAL

HAL-Lyon 3

Measuring Author Research Relatedness: A Comparison of Word-based, Topic-based and Author Cocitation Approaches

Author: Lu Kun
Wolfram Dietmar
Publication venue: UWM Digital Commons
Publication date: 01/10/2012

Relationships between authors based on characteristics of published literature have been studied for decades. Author cocitation analysis using mapping techniques has been most frequently used to study how closely two authors are thought to be in intellectual space based on how members of the research community co-cite their works. Other approaches exist to study author relatedness based more directly on the text of their published works. In this study we present static and dynamic word-based approaches using vector space modeling, as well as a topic-based approach based on Latent Dirichlet Allocation for mapping author research relatedness. Vector space modeling is used to define an author space consisting of works by a given author. Outcomes for the two word-based approaches and a topic-based approach for 50 prolific authors in library and information science are compared with more traditional author cocitation analysis using multidimensional scaling and hierarchical cluster analysis. The two word-based approaches produced similar outcomes except where two authors were frequent co-authors for the majority of their articles. The topic-based approach produced the most distinctive map.

University of Wisconsin-Milwaukee