Performance Evaluation of Distributed Computing Environments with Hadoop and Spark Frameworks
Recently, due to the rapid development of information and communication
technologies, data are being created and consumed at an avalanche-like rate.
Distributed computing creates the preconditions for analyzing and processing such
Big Data by distributing the computations among a number of compute nodes. In
this work, performance of distributed computing environments on the basis of
Hadoop and Spark frameworks is estimated for real and virtual versions of
clusters. As a test task, we chose the classic use case of word counting in
texts of various sizes. It was found that the running times grow very fast with
the dataset size, faster even than a power function. For the real and
virtual versions of the cluster implementations, this tendency is similar for
both the Hadoop and Spark frameworks. Moreover, speedup values decrease
significantly with the growth of dataset size, especially for the virtual version
of the cluster configuration. The problem of growing data generated by IoT and
multimodal (visual, sound, tactile, neuro and brain-computing, muscle and eye
tracking, etc.) interaction channels is presented. In the context of this
problem, the current observations as to the running times and speedup on Hadoop
and Spark frameworks in real and virtual cluster configurations can be very
useful for the proper scaling-up and efficient job management, especially for
machine learning and Deep Learning applications, where Big Data are widely
present.
Comment: 5 pages, 1 table, 2017 IEEE International Young Scientists Forum on
Applied Physics and Engineering (YSF-2017) (Lviv, Ukraine)
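The word-counting test task mentioned in the abstract above can be illustrated with a minimal single-machine sketch in plain Python. It is not the paper's Hadoop or Spark code; the function names (map_count, reduce_counts, word_count) and the tokenization rule are assumptions chosen for illustration only.

```python
import re
from collections import Counter
from functools import reduce


def map_count(chunk: str) -> Counter:
    """Map step: count the words in one chunk of text.

    Assumption: a word is a run of lowercase letters, digits, or apostrophes.
    """
    return Counter(re.findall(r"[a-z0-9']+", chunk.lower()))


def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge two partial word counts into one."""
    return left + right


def word_count(chunks) -> Counter:
    """Count words over all chunks by mapping each chunk, then reducing."""
    return reduce(reduce_counts, map(map_count, chunks), Counter())


if __name__ == "__main__":
    # Each string stands in for a block of text handled by one worker.
    print(word_count(["big data big", "Data nodes"]))
```

In a distributed setting the map step would run on separate compute nodes and the reduce step would merge their partial results; here both steps run in one process.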
BigExcel: A Web-Based Framework for Exploring Big Data in Social Sciences
This paper argues that there are three fundamental challenges that need to be
overcome in order to foster the adoption of big data technologies in
non-computer science related disciplines: addressing issues of accessibility of
such technologies for non-computer scientists, supporting the ad hoc
exploration of large data sets with minimal effort and the availability of
lightweight web-based frameworks for quick and easy analytics. In this paper,
we address the above three challenges through the development of 'BigExcel', a
three tier web-based framework for exploring big data to facilitate the
management of user interactions with large data sets, the construction of
queries to explore the data set and the management of the infrastructure. The
feasibility of BigExcel is demonstrated through two Yahoo Sandbox datasets. The
first dataset is the Yahoo Buzz Score data set we use for quantitatively
predicting trending technologies and the second is the Yahoo n-gram corpus we
use for qualitatively inferring the coverage of important events. A
demonstration of the BigExcel framework and source code is available at
http://bigdata.cs.st-andrews.ac.uk/projects/bigexcel-exploring-big-data-for-social-sciences/.
Comment: 8 page
Meter data management for smart monitoring power networks
The electrical power distribution and commercialization scenario is evolving worldwide, and electricity companies, faced with the challenge of new information requirements, are demanding IT solutions to deal with the smart monitoring of power networks. Two main challenges arise from data management and smart monitoring of power networks: real-time data acquisition and big data processing over short time periods. We present a solution in the form of a system architecture that addresses real-time issues and has the capacity for big data management
mARC: Memory by Association and Reinforcement of Contexts
This paper introduces Memory by Association and Reinforcement of Contexts
(mARC). mARC is a novel data modeling technology rooted in the second
quantization formulation of quantum mechanics. It is an all-purpose incremental
and unsupervised data storage and retrieval system which can be applied to all
types of signal or data, structured or unstructured, textual or not. mARC can
be applied to a wide range of information classification and retrieval
problems like e-Discovery or contextual navigation. It can also be formulated in
the artificial life framework, a.k.a. Conway's "Game of Life" theory. In contrast
to Conway's approach, the objects evolve in a massively multidimensional space.
In order to start evaluating the potential of mARC we have built a mARC-based
Internet search engine demonstrator with contextual functionality. We compare
the behavior of the mARC demonstrator with Google search both in terms of
performance and relevance. In the study we find that the mARC search engine
demonstrator outperforms Google search by an order of magnitude in response
time while providing more relevant results for some classes of queries
Review of performance of various Big Databases
Relational databases have been the main model for information data storage, retrieval and administration. A relational database is a table-based data system where there is no scalability, insignificant information duplication, computationally costly table joins and trouble in managing complex information. The greatest inspiration of NoSQL is adaptability. NoSQL information stores are broadly used to store and recover potentially a lot of information. In this paper, we assess the four most famous NoSQL databases: Cassandra, MongoDB, and CouchDB
Working with Newer Data Management Technologies
Data management technologies are changing rapidly and this presents a significant challenge for database teaching. There is a requirement to teach traditional relational database concepts and to ensure that students are equipped with the advanced skills expected by employers. There is also a requirement to prepare students to work with newer data models and NoSQL and to understand and be able to leverage concepts such as Big Data analytics. This paper discusses the experience of working with MongoDB and MapReduce and starting to work with Hadoop in undergraduate and postgraduate teaching at Staffordshire University. It is suggested that while the amount of time that can be given to newer technologies in the undergraduate curriculum is limited, this is a subject area which has the power to capture students’ imaginations and provides a good basis for undergraduate projects and Masters level dissertations