Large scale citation matching using Apache Hadoop
During citation matching, links from bibliography entries to referenced
publications are created. Such links indicate topical similarity between the
linked texts, are used in assessing the impact of the referenced document, and
improve navigation in the user interfaces of digital
libraries. In this paper we present a citation matching method and show how to
scale it up to handle large amounts of data using appropriate indexing and the
MapReduce paradigm in the Hadoop environment.
Comment: 11 pages, 4 figures
Taming the zoo - about algorithms implementation in the ecosystem of Apache Hadoop
Content Analysis System (CoAnSys) is a research framework for mining
scientific publications using Apache Hadoop. This article describes the
algorithms currently implemented in CoAnSys including classification,
categorization and citation matching of scientific publications. The size of
the input data places these algorithms among big data problems,
which can be solved efficiently on Hadoop clusters.
Comment: This paper (with changed content) appeared under the title "Content
Analysis of Scientific Articles in Apache Hadoop Ecosystem" in "Intelligent
Tools for Building a Scientific Information Platform: From Research to
Implementation", "Studies in Computational Intelligence", Volume 541, 2014,
http://link.springer.com/book/10.1007/978-3-319-04714-