
158 research outputs found

Performance Evaluation of Distributed Computing Environments with Hadoop and Spark Frameworks

Authors: Alienin Oleg; Gordienko Yuri; Rojbi A.; Stirenko Sergii; Taran Vladyslav
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 16/07/2017

Recently, owing to the rapid development of information and communication technologies, data are being created and consumed at an avalanche-like rate. Distributed computing creates the preconditions for analyzing and processing such Big Data by distributing the computations among a number of compute nodes. In this work, the performance of distributed computing environments based on the Hadoop and Spark frameworks is estimated for real and virtual versions of clusters. As a test task, we chose the classic use case of word counting in texts of various sizes. It was found that the running times grow very fast with the dataset size, faster even than a power function. This tendency is similar for both the Hadoop and Spark frameworks in the real and virtual cluster implementations. Moreover, speedup values decrease significantly as the dataset size grows, especially for the virtual cluster configuration. The problem of growing data generated by IoT and multimodal (visual, sound, tactile, neuro and brain-computing, muscle and eye tracking, etc.) interaction channels is presented. In the context of this problem, the current observations on running times and speedup for the Hadoop and Spark frameworks in real and virtual cluster configurations can be very useful for proper scaling-up and efficient job management, especially for machine learning and Deep Learning applications, where Big Data are widely present. Comment: 5 pages, 1 table, 2017 IEEE International Young Scientists Forum on Applied Physics and Engineering (YSF-2017) (Lviv, Ukraine
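The word-counting test task and the speedup metric named in this abstract can be illustrated with a minimal plain-Python sketch. This is not the authors' benchmark code and does not use Hadoop or Spark; it only mimics the map and reduce steps on in-memory strings. The function names (`map_count`, `reduce_counts`, `word_count`, `speedup`) are hypothetical, and the speedup formula assumes the conventional definition (single-node running time divided by multi-node running time), which the abstract does not spell out.

```python
from collections import Counter
from functools import reduce


def map_count(chunk: str) -> Counter:
    """Map step: count the words in one chunk of text."""
    return Counter(chunk.lower().split())


def reduce_counts(a: Counter, b: Counter) -> Counter:
    """Reduce step: merge two partial word counts."""
    return a + b


def word_count(chunks) -> Counter:
    """Count words over all chunks by mapping, then reducing."""
    return reduce(reduce_counts, map(map_count, chunks), Counter())


def speedup(t_single: float, t_parallel: float) -> float:
    """Conventional speedup (assumed definition): T_1 / T_n."""
    return t_single / t_parallel


# Each string stands in for the share of the text handled by one node.
counts = word_count(["big data big", "data spark"])
```

In a real cluster the chunks would be file splits processed on separate nodes, and the running times passed to `speedup` would come from measured job durations.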

arXiv.org e-Print Archive

Crossref

BigExcel: A Web-Based Framework for Exploring Big Data in Social Sciences

Authors: Barker Adam; Saleem Muhammed Asif; Varghese Blesson
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 05/11/2014

This paper argues that three fundamental challenges need to be overcome in order to foster the adoption of big data technologies in disciplines outside computer science: making such technologies accessible to non-computer scientists, supporting the ad hoc exploration of large data sets with minimal effort, and providing lightweight web-based frameworks for quick and easy analytics. In this paper, we address these three challenges through the development of 'BigExcel', a three-tier web-based framework for exploring big data that facilitates the management of user interactions with large data sets, the construction of queries to explore the data set and the management of the infrastructure. The feasibility of BigExcel is demonstrated through two Yahoo Sandbox datasets. The first is the Yahoo Buzz Score data set, which we use for quantitatively predicting trending technologies, and the second is the Yahoo n-gram corpus, which we use for qualitatively inferring the coverage of important events. A demonstration of the BigExcel framework and source code is available at http://bigdata.cs.st-andrews.ac.uk/projects/bigexcel-exploring-big-data-for-social-sciences/. Comment: 8 page

arXiv.org e-Print Archive

CiteSeerX

Crossref

University of St. Andrews - Pure

St Andrews Research Repository

Meter data management for smart monitoring power networks

Authors: Garbajosa Sopeña Juan; López-Perea Santos Mercedes; Yagüe Panadero Agustín
Publication venue: E.U. de Informática (UPM)
Publication date: 01/01/2013

The electrical power distribution and commercialization scenario is evolving worldwide, and electricity companies, faced with the challenge of new information requirements, are demanding IT solutions to deal with the smart monitoring of power networks. Two main challenges arise from data management and smart monitoring of power networks: real-time data acquisition and big data processing over short time periods. We present a solution in the form of a system architecture that addresses real-time issues and has the capacity for big data management

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

Intelligent distributed processing methods for big data

Authors: Badica Costin; Camacho David; Jung Jason J.
Publication venue: Graz University of Technology, Institute for Information Systems and Computer Media
Publication date: 01/01/2015

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Biblos-e Archivo

mARC: Memory by Association and Reinforcement of Contexts

Authors: Descourt Patrice; Rimoux Norbert
Publication date: 10/12/2013

This paper introduces Memory by Association and Reinforcement of Contexts (mARC). mARC is a novel data modeling technology rooted in the second quantization formulation of quantum mechanics. It is an all-purpose incremental and unsupervised data storage and retrieval system which can be applied to all types of signal or data, structured or unstructured, textual or not. mARC can be applied to a wide range of information classification and retrieval problems such as e-Discovery or contextual navigation. It can also be formulated in the artificial life framework, also known as Conway's "Game of Life" theory. In contrast to Conway's approach, the objects evolve in a massively multidimensional space. In order to start evaluating the potential of mARC, we have built a mARC-based Internet search engine demonstrator with contextual functionality. We compare the behavior of the mARC demonstrator with Google search in terms of both performance and relevance. In the study we find that the mARC search engine demonstrator outperforms Google search by an order of magnitude in response time while providing more relevant results for some classes of queries

arXiv.org e-Print Archive

CiteSeerX

Review of performance of various Big Databases

Author: Mallika Wadhwa, Er. Amrit Kaur
Publication venue: Auricle Technologies, Pvt., Ltd.
Publication date: 30/06/2017

Relational databases have been the main model for data storage, retrieval and administration. A relational database is a table-based data system that offers no scalability, little data duplication, computationally costly table joins and difficulty in managing complex information. The main motivation for NoSQL is adaptability. NoSQL data stores are widely used to store and retrieve potentially large amounts of information. In this paper, we assess four of the most popular NoSQL databases: Cassandra, MongoDB, and CouchDB

International Journal on Recent and Innovation Trends in Computing and Communication

Working with Newer Data Management Technologies

Authors: Isitor Emmanuel; Stanier Clare
Publication date: 19/06/2014

Data management technologies are changing rapidly, and this presents a significant challenge for database teaching. There is a requirement to teach traditional relational database concepts and to ensure that students are equipped with the advanced skills expected by employers. There is also a requirement to prepare students to work with newer data models and NoSQL, and to understand and be able to leverage concepts such as Big Data analytics. This paper discusses the experience of working with MongoDB and MapReduce, and of starting to work with Hadoop, in undergraduate and postgraduate teaching at Staffordshire University. It is suggested that while the amount of time that can be given to newer technologies in the undergraduate curriculum is limited, this is a subject area which has the power to capture students’ imaginations and provides a good basis for undergraduate projects and Master's-level dissertations

STORE - Staffordshire Online Repository