
    BigExcel: A Web-Based Framework for Exploring Big Data in Social Sciences

    This paper argues that three fundamental challenges must be overcome to foster the adoption of big data technologies in disciplines outside computer science: making such technologies accessible to non-computer scientists, supporting ad hoc exploration of large data sets with minimal effort, and providing lightweight web-based frameworks for quick and easy analytics. We address these three challenges through the development of 'BigExcel', a three-tier web-based framework for exploring big data that manages user interactions with large data sets, the construction of queries to explore the data set, and the underlying infrastructure. The feasibility of BigExcel is demonstrated through two Yahoo Sandbox datasets: the Yahoo Buzz Score data set, which we use to quantitatively predict trending technologies, and the Yahoo n-gram corpus, which we use to qualitatively infer the coverage of important events. A demonstration of the BigExcel framework and the source code are available at http://bigdata.cs.st-andrews.ac.uk/projects/bigexcel-exploring-big-data-for-social-sciences/. Comment: 8 pages.
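
    As a rough illustration of the kind of ad hoc exploration such a web tier supports, the following Python sketch accepts a user query over HTTP and aggregates a large tabular extract in chunks. The Flask endpoint, file path, and column names are hypothetical stand-ins, not part of BigExcel or the Yahoo Buzz Score schema.

    # Minimal sketch of a web tier that accepts an ad hoc query and
    # aggregates a large tabular dataset in chunks. Column names
    # ("keyword", "score") and the file path are hypothetical.
    from flask import Flask, request, jsonify
    import pandas as pd

    app = Flask(__name__)
    DATA_PATH = "buzz_scores.csv"  # hypothetical local extract

    @app.route("/aggregate")
    def aggregate():
        keyword = request.args.get("keyword", "")
        total = 0.0
        # Stream the file in chunks so the server never loads it all at once.
        for chunk in pd.read_csv(DATA_PATH, chunksize=100_000):
            matched = chunk[chunk["keyword"] == keyword]
            total += float(matched["score"].sum())
        return jsonify({"keyword": keyword, "total_score": total})

    if __name__ == "__main__":
        app.run()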

    Handling Large-Scale Document Collections using Information Retrieval in the Age of Big Data

    This paper's primary goal is to present an overview of big data and its analysis using various methodologies, particularly evolutionary computing techniques, which improve information retrieval over standard search methods. Big data is defined as a huge, diverse collection of data that is difficult to handle with conventional computational methods and necessitates more sophisticated statistical approaches to extract pertinent information. Along with an overview of evolutionary computational approaches, the study also discusses some of the main models used for information retrieval.
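
    To make the idea of evolutionary computing for retrieval concrete, the following Python sketch evolves a population of query-term weight vectors against toy relevance judgments. The documents, labels, and fitness function are illustrative stand-ins, not a method taken from the surveyed literature.

    # Toy evolutionary search over query-term weights: candidates that
    # score relevant documents higher than non-relevant ones survive.
    import random

    TERMS = ["big", "data", "retrieval"]
    # Hypothetical bag-of-words documents with binary relevance labels.
    DOCS = [({"big": 2, "data": 3}, 1),
            ({"retrieval": 1, "data": 1}, 1),
            ({"big": 1}, 0)]

    def fitness(weights):
        # Reward weight vectors that score relevant documents highly.
        score = 0.0
        for doc, relevant in DOCS:
            s = sum(w * doc.get(t, 0) for w, t in zip(weights, TERMS))
            score += s if relevant else -s
        return score

    def evolve(generations=50, pop_size=20):
        pop = [[random.random() for _ in TERMS] for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=fitness, reverse=True)
            parents = pop[: pop_size // 2]           # keep the fittest half
            children = [
                [(x + y) / 2 + random.gauss(0, 0.1)  # crossover + mutation
                 for x, y in zip(*random.sample(parents, 2))]
                for _ in range(pop_size - len(parents))
            ]
            pop = parents + children
        return max(pop, key=fitness)

    print(evolve())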

    ArchiveSpark: Efficient Web Archive Access, Extraction and Derivation

    Web archives are a valuable resource for researchers of various disciplines. However, to use them as a scholarly source, researchers require a tool that provides efficient access to Web archive data for the extraction and derivation of smaller datasets. Besides efficient access, we identify five other objectives based on practical researcher needs, such as ease of use, extensibility and reusability. Towards these objectives we propose ArchiveSpark, a framework for efficient, distributed Web archive processing that builds a research corpus by working on existing and standardized data formats commonly held by Web archiving institutions. Performance optimizations in ArchiveSpark, facilitated by the use of a widely available metadata index, result in significant speed-ups of data processing. Our benchmarks show that ArchiveSpark is faster than alternative approaches without depending on any additional data stores, while improving usability by seamlessly integrating queries and derivations with external tools. Comment: JCDL 2016, Newark, NJ, US.
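
    The reported speed-ups come from filtering on a lightweight metadata (CDX) index before touching the full WARC payloads. The following PySpark sketch illustrates that general idea only; it is not ArchiveSpark's actual API (ArchiveSpark is Scala-based), and the path and assumed CDX field order (URL key, timestamp, original URL, MIME type, status code, ...) are illustrative.

    # Select records from a CDX metadata index without reading any WARC
    # payloads; only the surviving records would later be resolved to
    # full archive content.
    from pyspark import SparkContext

    sc = SparkContext(appName="cdx-filter-sketch")

    cdx = sc.textFile("hdfs:///archive/index.cdx")  # hypothetical path

    def parse(line):
        fields = line.split(" ")
        return {"timestamp": fields[1], "url": fields[2],
                "mime": fields[3], "status": fields[4]}

    selected = (cdx.filter(lambda l: len(l.split(" ")) >= 5)
                   .map(parse)
                   .filter(lambda r: r["mime"] == "text/html"
                           and r["status"] == "200"))

    print(selected.count())
    sc.stop()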

    A New Framework for Securing, Extracting and Analyzing Big Forensic Data

    Finding new methods to investigate criminal activities, behaviors, and responsibilities has always been a challenge for forensic research. Advances in big data and technology, together with the increased capabilities of smartphones, have contributed to the demand for modern examination techniques. Smartphones are ubiquitous, transformative, and have become a goldmine for forensics research. Given the right tools and research methods, investigating agencies can help crack almost any illegal activity involving smartphones. This paper focuses on conducting forensic analysis to expose a terrorist or criminal network and introduces a new Big Forensic Data Framework model in which Hadoop and EnCase software are combined to promote more effective and efficient processing of massive Big Forensic Data. The research propositions this model postulates could lead investigating agencies to the heads of terrorist networks. Results indicate that the Big Forensic Data Framework model is capable of processing Big Forensic Data.
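
    As a hedged sketch of one aggregation step such a pipeline might perform, the following Python snippet merges call records extracted from many devices into a contact map and ranks phone numbers by how many distinct devices they connect to. The CSV layout (device_id, caller, callee) is hypothetical and not drawn from EnCase output or the paper's framework.

    # Rank phone numbers by the number of distinct seized devices they
    # appear on; numbers reached from many devices are candidate hubs.
    import csv
    from collections import defaultdict

    contacts = defaultdict(set)  # phone number -> devices it appears on

    with open("call_records.csv", newline="") as f:  # hypothetical, header-free
        for device_id, caller, callee in csv.reader(f):
            contacts[caller].add(device_id)
            contacts[callee].add(device_id)

    hubs = sorted(contacts.items(), key=lambda kv: len(kv[1]), reverse=True)
    for number, devices in hubs[:10]:
        print(number, len(devices))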