Can we do better than co-citations? Bringing Citation Proximity Analysis from idea to practice in research articles recommendation
In this paper, we build on the idea of Citation Proximity Analysis (CPA), originally introduced in [1], by developing a step-by-step, scalable approach for building CPA-based recommender systems. As part of this approach, we introduce three new proximity functions that extend the basic assumption of co-citation analysis (the more often two articles are co-cited in a document, the more likely they are related) to take the distance between the co-cited documents into account. Asking whether CPA can outperform co-citation analysis in recommender systems, we built a CPA-based recommender system from a corpus of 368,385 full-text articles and conducted a user survey as an initial evaluation. Two of our three proximity functions used within CPA outperform co-citations on our evaluation dataset.
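The distance-weighted extension of co-citation counting that CPA rests on can be sketched as follows. Note this is a minimal illustration: the inverse-distance weight and the function name `cpa_scores` are assumptions for demonstration, not the paper's actual three proximity functions, which are not given in the abstract.

```python
from collections import defaultdict

def cpa_scores(citing_docs):
    """Compute distance-weighted co-citation (CPA-style) scores.

    citing_docs: one citation sequence per citing document; each entry
    is a (cited_id, position) pair in reading order. Plain co-citation
    would add 1 per co-cited pair; here the contribution decays with
    the distance between the two citations (an illustrative choice).
    """
    scores = defaultdict(float)
    for citations in citing_docs:
        for i in range(len(citations)):
            for j in range(i + 1, len(citations)):
                (a, pos_a), (b, pos_b) = citations[i], citations[j]
                if a == b:
                    continue
                scores[frozenset((a, b))] += 1.0 / (1 + abs(pos_a - pos_b))
    return dict(scores)

# A and B are co-cited in both documents, but much closer together in
# the first one, so that document contributes a larger share of the score.
docs = [[("A", 0), ("B", 1)], [("A", 0), ("B", 10)]]
print(cpa_scores(docs))
```

Under plain co-citation analysis both documents would contribute equally (a count of 2); the proximity weighting is what distinguishes CPA.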
BigExcel: A Web-Based Framework for Exploring Big Data in Social Sciences
This paper argues that three fundamental challenges need to be overcome in order to foster the adoption of big data technologies in disciplines outside computer science: making such technologies accessible to non-computer scientists, supporting the ad hoc exploration of large data sets with minimal effort, and providing lightweight web-based frameworks for quick and easy analytics. We address these three challenges through the development of 'BigExcel', a three-tier web-based framework for exploring big data that facilitates the management of user interactions with large data sets, the construction of queries to explore the data set, and the management of the underlying infrastructure. The feasibility of BigExcel is demonstrated through two Yahoo Sandbox datasets: the Yahoo Buzz Score dataset, which we use to quantitatively predict trending technologies, and the Yahoo n-gram corpus, which we use to qualitatively infer the coverage of important events. A demonstration of the BigExcel framework and its source code is available at http://bigdata.cs.st-andrews.ac.uk/projects/bigexcel-exploring-big-data-for-social-sciences/.
Handling Large-Scale Document Collections using Information Retrieval in the Age of Big Data
This paper's primary goal is to present an overview of big data and its analysis using various methodologies, particularly evolutionary computing techniques, which improve information retrieval over standard search methods. Big data is defined as a huge, diverse collection of data that is difficult to handle with conventional computational methods and necessitates more sophisticated statistical approaches to extract pertinent information. Along with an overview of evolutionary computational approaches, this study also discusses some of the main models used for information retrieval.
ArchiveSpark: Efficient Web Archive Access, Extraction and Derivation
Web archives are a valuable resource for researchers of various disciplines. However, to use them as a scholarly source, researchers require a tool that provides efficient access to Web archive data for the extraction and derivation of smaller datasets. Besides efficient access, we identify five other objectives based on practical researcher needs, such as ease of use, extensibility, and reusability.

Towards these objectives we propose ArchiveSpark, a framework for efficient, distributed Web archive processing that builds a research corpus by working on existing, standardized data formats commonly held by Web archiving institutions. Performance optimizations in ArchiveSpark, facilitated by the use of a widely available metadata index, result in significant speed-ups of data processing. Our benchmarks show that ArchiveSpark is faster than alternative approaches without depending on any additional data stores, while improving usability by seamlessly integrating queries and derivations with external tools. (JCDL 2016, Newark, NJ, US)
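The core access pattern the abstract attributes ArchiveSpark's speed-ups to, filtering on a lightweight metadata index first and touching the heavy archive payloads only for matching records, can be sketched generically. The field names (`status`, `offset`) and the in-memory record store are assumptions for illustration and do not reflect ArchiveSpark's actual API or data layout.

```python
def select_records(index, fetch_record, predicate):
    """Yield payloads only for index entries matching the predicate.

    index: iterable of small metadata dicts (cheap to scan).
    fetch_record: callable resolving an offset to the full, expensive
    archive record; it is invoked only for entries that pass the filter.
    """
    for meta in index:
        if predicate(meta):
            yield fetch_record(meta["offset"])

# Toy stand-ins: a two-entry metadata index and a payload store.
index = [
    {"url": "http://example.org/a", "status": 200, "offset": 0},
    {"url": "http://example.org/b", "status": 404, "offset": 1},
]
payloads = {0: "<html>a</html>", 1: "<html>b</html>"}

# Only the successful (status 200) capture's payload is ever fetched.
ok = list(select_records(index, payloads.get, lambda m: m["status"] == 200))
print(ok)
```

The design point is that the filter runs over compact metadata rather than raw archive records, so records that are filtered out cost almost nothing.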
A New Framework for Securing, Extracting and Analyzing Big Forensic Data
Finding new methods to investigate criminal activities, behaviors, and responsibilities has always been a challenge for forensic research. Advances in big data and technology, together with the increased capabilities of smartphones, have contributed to the demand for modern techniques of examination. Smartphones are ubiquitous, transformative, and have become a goldmine for forensics research. Given the right tools and research methods, investigating agencies can help crack almost any illegal activity involving smartphones. This paper focuses on conducting forensic analysis to expose a terrorist or criminal network and introduces a new Big Forensic Data Framework model in which the Hadoop and EnCase software technologies are combined to promote more effective and efficient processing of massive Big Forensic Data. The research propositions this model postulates could lead investigating agencies to the heads of terrorist networks. Results indicate the Big Forensic Data Framework model is capable of processing Big Forensic Data.