2,406 research outputs found
Real-Time Data Processing With Lambda Architecture
Data has evolved immensely in recent years, in type, volume and velocity. There are several frameworks to handle the big data applications. The project focuses on the Lambda Architecture proposed by Marz and its application to obtain real-time data processing. The architecture is a solution that unites the benefits of the batch and stream processing techniques. Data can be historically processed with high precision and involved algorithms without loss of short-term information, alerts and insights. Lambda Architecture has an ability to serve a wide range of use cases and workloads that withstands hardware and human mistakes. The layered architecture enhances loose coupling and flexibility in the system. This a huge benefit that allows understanding the trade-offs and application of various tools and technologies across the layers. There has been an advancement in the approach of building the LA due to improvements in the underlying tools. The project demonstrates a simplified architecture for the LA that is maintainable
Performance Evaluation of Distributed Computing Environments with Hadoop and Spark Frameworks
Recently, due to rapid development of information and communication
technologies, the data are created and consumed in the avalanche way.
Distributed computing create preconditions for analyzing and processing such
Big Data by distributing the computations among a number of compute nodes. In
this work, performance of distributed computing environments on the basis of
Hadoop and Spark frameworks is estimated for real and virtual versions of
clusters. As a test task, we chose the classic use case of word counting in
texts of various sizes. It was found that the running times grow very fast with
the dataset size and faster than a power function even. As to the real and
virtual versions of cluster implementations, this tendency is the similar for
both Hadoop and Spark frameworks. Moreover, speedup values decrease
significantly with the growth of dataset size, especially for virtual version
of cluster configuration. The problem of growing data generated by IoT and
multimodal (visual, sound, tactile, neuro and brain-computing, muscle and eye
tracking, etc.) interaction channels is presented. In the context of this
problem, the current observations as to the running times and speedup on Hadoop
and Spark frameworks in real and virtual cluster configurations can be very
useful for the proper scaling-up and efficient job management, especially for
machine learning and Deep Learning applications, where Big Data are widely
present.Comment: 5 pages, 1 table, 2017 IEEE International Young Scientists Forum on
Applied Physics and Engineering (YSF-2017) (Lviv, Ukraine
- …