20,667 research outputs found
The Family of MapReduce and Large Scale Data Processing Systems
In the last two decades, the continuous increase of computational power has
produced an overwhelming flow of data which has called for a paradigm shift in
the computing architecture and large scale data processing mechanisms.
MapReduce is a simple and powerful programming model that enables easy
development of scalable parallel applications to process vast amounts of data
on large clusters of commodity machines. It isolates the application from the
details of running a distributed program such as issues on data distribution,
scheduling and fault tolerance. However, the original implementation of the
MapReduce framework had some limitations that have been tackled by many
research efforts in several followup works after its introduction. This article
provides a comprehensive survey for a family of approaches and mechanisms of
large scale data processing mechanisms that have been implemented based on the
original idea of the MapReduce framework and are currently gaining a lot of
momentum in both research and industrial communities. We also cover a set of
introduced systems that have been implemented to provide declarative
programming interfaces on top of the MapReduce framework. In addition, we
review several large scale data processing systems that resemble some of the
ideas of the MapReduce framework for different purposes and application
scenarios. Finally, we discuss some of the future research directions for
implementing the next generation of MapReduce-like solutions.Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author
Many-Task Computing and Blue Waters
This report discusses many-task computing (MTC) generically and in the
context of the proposed Blue Waters systems, which is planned to be the largest
NSF-funded supercomputer when it begins production use in 2012. The aim of this
report is to inform the BW project about MTC, including understanding aspects
of MTC applications that can be used to characterize the domain and
understanding the implications of these aspects to middleware and policies.
Many MTC applications do not neatly fit the stereotypes of high-performance
computing (HPC) or high-throughput computing (HTC) applications. Like HTC
applications, by definition MTC applications are structured as graphs of
discrete tasks, with explicit input and output dependencies forming the graph
edges. However, MTC applications have significant features that distinguish
them from typical HTC applications. In particular, different engineering
constraints for hardware and software must be met in order to support these
applications. HTC applications have traditionally run on platforms such as
grids and clusters, through either workflow systems or parallel programming
systems. MTC applications, in contrast, will often demand a short time to
solution, may be communication intensive or data intensive, and may comprise
very short tasks. Therefore, hardware and software for MTC must be engineered
to support the additional communication and I/O and must minimize task dispatch
overheads. The hardware of large-scale HPC systems, with its high degree of
parallelism and support for intensive communication, is well suited for MTC
applications. However, HPC systems often lack a dynamic resource-provisioning
feature, are not ideal for task communication via the file system, and have an
I/O system that is not optimized for MTC-style applications. Hence, additional
software support is likely to be required to gain full benefit from the HPC
hardware
High Performance Composition Operators in Component Models
International audienceScientific numerical applications are always expecting more computing and storage capabilities to compute at finer grain and/or to integrate more phenomena in their computations. Even though, they are getting more complex to develop. However, the continual growth of computing and storage capabilities is achieved with an increase complexity of infrastructures. Thus, there is an important challenge to define programming abstractions able to deal with software and hardware complexity. An interesting approach is represented by software component models. This chapter first analyzes how high performance interactions are only partially supported by specialized component models. Then, it introduces HLCM, a component model that aims at efficiently supporting all kinds of static compositions
Worker-robot cooperation and integration into the manufacturing workcell via the holonic control architecture
Cooperative manufacturing is a new field of research, which addresses new challenges beyond the physical safety of the worker. Those new challenges appear due to the need to connect the worker and the cobot from the informatics point of view in one cooperative workcell. This requires developing an appropriate manufacturing control system, which fits the nature of both the worker and the cobot. Furthermore, the manufacturing control system must be able to understand the production variations, to guide the cooperation between worker and the cobot and adapt with the production variations.Die kooperative Fertigung ist ein neues Forschungsgebiet, das sich neuen Herausforderungen stellt. Diese neuen Herausforderungen ergeben sich aus der Notwendigkeit, den Arbeiter und den Cobot aus der Sicht der Informatik in einem kooperativen Arbeitsplatz zu verbinden. Dies erfordert die Entwicklung eines geeigneten Produktionskontrollsystems, das sowohl der Natur des Arbeiters als auch der des Cobots entspricht. DarĂĽber hinaus muss die Fertigungssteuerung in der Lage sein, die Produktionsschwankungen zu verstehen, um die Zusammenarbeit zwischen Arbeiter und Cobot zu steuern
Middleware Technologies for Cloud of Things - a survey
The next wave of communication and applications rely on the new services
provided by Internet of Things which is becoming an important aspect in human
and machines future. The IoT services are a key solution for providing smart
environments in homes, buildings and cities. In the era of a massive number of
connected things and objects with a high grow rate, several challenges have
been raised such as management, aggregation and storage for big produced data.
In order to tackle some of these issues, cloud computing emerged to IoT as
Cloud of Things (CoT) which provides virtually unlimited cloud services to
enhance the large scale IoT platforms. There are several factors to be
considered in design and implementation of a CoT platform. One of the most
important and challenging problems is the heterogeneity of different objects.
This problem can be addressed by deploying suitable "Middleware". Middleware
sits between things and applications that make a reliable platform for
communication among things with different interfaces, operating systems, and
architectures. The main aim of this paper is to study the middleware
technologies for CoT. Toward this end, we first present the main features and
characteristics of middlewares. Next we study different architecture styles and
service domains. Then we presents several middlewares that are suitable for CoT
based platforms and lastly a list of current challenges and issues in design of
CoT based middlewares is discussed.Comment: http://www.sciencedirect.com/science/article/pii/S2352864817301268,
Digital Communications and Networks, Elsevier (2017
Middleware Technologies for Cloud of Things - a survey
The next wave of communication and applications rely on the new services
provided by Internet of Things which is becoming an important aspect in human
and machines future. The IoT services are a key solution for providing smart
environments in homes, buildings and cities. In the era of a massive number of
connected things and objects with a high grow rate, several challenges have
been raised such as management, aggregation and storage for big produced data.
In order to tackle some of these issues, cloud computing emerged to IoT as
Cloud of Things (CoT) which provides virtually unlimited cloud services to
enhance the large scale IoT platforms. There are several factors to be
considered in design and implementation of a CoT platform. One of the most
important and challenging problems is the heterogeneity of different objects.
This problem can be addressed by deploying suitable "Middleware". Middleware
sits between things and applications that make a reliable platform for
communication among things with different interfaces, operating systems, and
architectures. The main aim of this paper is to study the middleware
technologies for CoT. Toward this end, we first present the main features and
characteristics of middlewares. Next we study different architecture styles and
service domains. Then we presents several middlewares that are suitable for CoT
based platforms and lastly a list of current challenges and issues in design of
CoT based middlewares is discussed.Comment: http://www.sciencedirect.com/science/article/pii/S2352864817301268,
Digital Communications and Networks, Elsevier (2017
Spark deployment and performance evaluation on the MareNostrum supercomputer
In this paper we present a framework to enable data-intensive Spark workloads on MareNostrum, a petascale supercomputer designed mainly for compute-intensive applications. As far as we know, this is the first attempt to investigate optimized deployment configurations of Spark on a petascale HPC setup. We detail the design of the framework and present some benchmark data to provide insights into the scalability of the system. We examine the impact of different configurations including parallelism, storage and networking alternatives, and we discuss several aspects in executing Big Data workloads on a computing system that is based on the compute-centric paradigm. Further, we derive conclusions aiming to pave the way towards systematic and optimized methodologies for fine-tuning data-intensive application on large clusters emphasizing on parallelism configurations.Peer ReviewedPostprint (author's final draft
- …