Can we do better than co-citations? Bringing Citation Proximity Analysis from idea to practice in research articles recommendation
In this paper, we build on the idea of Citation Proximity Analysis (CPA), originally introduced in [1], by developing a step-by-step, scalable approach for building CPA-based recommender systems. As part of this approach, we introduce three new proximity functions that extend the basic assumption of co-citation analysis (the more often two articles are co-cited in a document, the more likely they are related) to take the distance between the co-cited documents into account. Asking whether CPA can outperform co-citation analysis in recommender systems, we built a CPA-based recommender system from a corpus of 368,385 full-text articles and conducted a user survey as an initial evaluation. Two of our three proximity functions used within CPA outperform co-citations on our evaluation dataset.
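The distance-weighted extension of co-citation counting that CPA rests on can be sketched as follows. Note this is a minimal illustration: the inverse-distance weight and the function name `cpa_scores` are assumptions for demonstration, not the paper's actual three proximity functions, which are not given in the abstract.

```python
from collections import defaultdict

def cpa_scores(citing_docs):
    """Compute distance-weighted co-citation (CPA-style) scores.

    citing_docs: one citation sequence per citing document; each entry
    is a (cited_id, position) pair in reading order. Plain co-citation
    would add 1 per co-cited pair; here the contribution decays with
    the distance between the two citations (an illustrative choice).
    """
    scores = defaultdict(float)
    for citations in citing_docs:
        for i in range(len(citations)):
            for j in range(i + 1, len(citations)):
                (a, pos_a), (b, pos_b) = citations[i], citations[j]
                if a == b:
                    continue
                scores[frozenset((a, b))] += 1.0 / (1 + abs(pos_a - pos_b))
    return dict(scores)

# A and B are co-cited in both documents, but much closer together in
# the first one, so that document contributes a larger share of the score.
docs = [[("A", 0), ("B", 1)], [("A", 0), ("B", 10)]]
print(cpa_scores(docs))
```

Under plain co-citation analysis both documents would contribute equally (a count of 2); the proximity weighting is what distinguishes CPA.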
BigExcel: A Web-Based Framework for Exploring Big Data in Social Sciences
This paper argues that three fundamental challenges need to be overcome in order to foster the adoption of big data technologies in disciplines outside computer science: making such technologies accessible to non-computer scientists, supporting the ad hoc exploration of large data sets with minimal effort, and providing lightweight web-based frameworks for quick and easy analytics. We address these three challenges through the development of 'BigExcel', a three-tier web-based framework for exploring big data that facilitates the management of user interactions with large data sets, the construction of queries to explore the data set, and the management of the underlying infrastructure. The feasibility of BigExcel is demonstrated through two Yahoo Sandbox datasets: the Yahoo Buzz Score dataset, which we use to quantitatively predict trending technologies, and the Yahoo n-gram corpus, which we use to qualitatively infer the coverage of important events. A demonstration of the BigExcel framework and its source code is available at http://bigdata.cs.st-andrews.ac.uk/projects/bigexcel-exploring-big-data-for-social-sciences/.
Handling Large-Scale Document Collections using Information Retrieval in the Age of Big Data
This paper's primary goal is to present an overview of big data and its analysis using various methodologies, particularly evolutionary computing techniques, which improve information retrieval over standard search methods. Big data is defined as a huge, diverse collection of data that is difficult to handle with conventional computational methods and necessitates more sophisticated statistical approaches to extract pertinent information. Along with an overview of evolutionary computational approaches, this study also discusses some of the main models used for information retrieval.
ArchiveSpark: Efficient Web Archive Access, Extraction and Derivation
Web archives are a valuable resource for researchers of various disciplines. However, to use them as a scholarly source, researchers require a tool that provides efficient access to Web archive data for the extraction and derivation of smaller datasets. Besides efficient access, we identify five other objectives based on practical researcher needs, such as ease of use, extensibility, and reusability.

Towards these objectives we propose ArchiveSpark, a framework for efficient, distributed Web archive processing that builds a research corpus by working on existing, standardized data formats commonly held by Web archiving institutions. Performance optimizations in ArchiveSpark, facilitated by the use of a widely available metadata index, result in significant speed-ups of data processing. Our benchmarks show that ArchiveSpark is faster than alternative approaches without depending on any additional data stores, while improving usability by seamlessly integrating queries and derivations with external tools. (JCDL 2016, Newark, NJ, US)
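The core access pattern the abstract attributes ArchiveSpark's speed-ups to, filtering on a lightweight metadata index first and touching the heavy archive payloads only for matching records, can be sketched generically. The field names (`status`, `offset`) and the in-memory record store are assumptions for illustration and do not reflect ArchiveSpark's actual API or data layout.

```python
def select_records(index, fetch_record, predicate):
    """Yield payloads only for index entries matching the predicate.

    index: iterable of small metadata dicts (cheap to scan).
    fetch_record: callable resolving an offset to the full, expensive
    archive record; it is invoked only for entries that pass the filter.
    """
    for meta in index:
        if predicate(meta):
            yield fetch_record(meta["offset"])

# Toy stand-ins: a two-entry metadata index and a payload store.
index = [
    {"url": "http://example.org/a", "status": 200, "offset": 0},
    {"url": "http://example.org/b", "status": 404, "offset": 1},
]
payloads = {0: "<html>a</html>", 1: "<html>b</html>"}

# Only the successful (status 200) capture's payload is ever fetched.
ok = list(select_records(index, payloads.get, lambda m: m["status"] == 200))
print(ok)
```

The design point is that the filter runs over compact metadata rather than raw archive records, so records that are filtered out cost almost nothing.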
A New Framework for Securing, Extracting and Analyzing Big Forensic Data
Finding new methods to investigate criminal activities, behaviors, and responsibilities has always been a challenge for forensic research. Advances in big data and technology, together with the increased capabilities of smartphones, have contributed to the demand for modern techniques of examination. Smartphones are ubiquitous, transformative, and have become a goldmine for forensics research. Given the right tools and research methods, investigating agencies can help crack almost any illegal activity involving smartphones. This paper focuses on conducting forensic analysis to expose a terrorist or criminal network and introduces a new Big Forensic Data Framework model in which the Hadoop and EnCase software technologies are combined to promote more effective and efficient processing of massive Big Forensic Data. The research propositions this model postulates could lead investigating agencies to the heads of terrorist networks. Results indicate the Big Forensic Data Framework model is capable of processing Big Forensic Data.