20,667 research outputs found

    The Family of MapReduce and Large Scale Data Processing Systems

    Full text link
    In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data which has called for a paradigm shift in the computing architecture and large scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program such as issues on data distribution, scheduling and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled by many research efforts in several followup works after its introduction. This article provides a comprehensive survey for a family of approaches and mechanisms of large scale data processing mechanisms that have been implemented based on the original idea of the MapReduce framework and are currently gaining a lot of momentum in both research and industrial communities. We also cover a set of introduced systems that have been implemented to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large scale data processing systems that resemble some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some of the future research directions for implementing the next generation of MapReduce-like solutions.Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author

    Many-Task Computing and Blue Waters

    Full text link
    This report discusses many-task computing (MTC) generically and in the context of the proposed Blue Waters systems, which is planned to be the largest NSF-funded supercomputer when it begins production use in 2012. The aim of this report is to inform the BW project about MTC, including understanding aspects of MTC applications that can be used to characterize the domain and understanding the implications of these aspects to middleware and policies. Many MTC applications do not neatly fit the stereotypes of high-performance computing (HPC) or high-throughput computing (HTC) applications. Like HTC applications, by definition MTC applications are structured as graphs of discrete tasks, with explicit input and output dependencies forming the graph edges. However, MTC applications have significant features that distinguish them from typical HTC applications. In particular, different engineering constraints for hardware and software must be met in order to support these applications. HTC applications have traditionally run on platforms such as grids and clusters, through either workflow systems or parallel programming systems. MTC applications, in contrast, will often demand a short time to solution, may be communication intensive or data intensive, and may comprise very short tasks. Therefore, hardware and software for MTC must be engineered to support the additional communication and I/O and must minimize task dispatch overheads. The hardware of large-scale HPC systems, with its high degree of parallelism and support for intensive communication, is well suited for MTC applications. However, HPC systems often lack a dynamic resource-provisioning feature, are not ideal for task communication via the file system, and have an I/O system that is not optimized for MTC-style applications. Hence, additional software support is likely to be required to gain full benefit from the HPC hardware

    High Performance Composition Operators in Component Models

    Get PDF
    International audienceScientific numerical applications are always expecting more computing and storage capabilities to compute at finer grain and/or to integrate more phenomena in their computations. Even though, they are getting more complex to develop. However, the continual growth of computing and storage capabilities is achieved with an increase complexity of infrastructures. Thus, there is an important challenge to define programming abstractions able to deal with software and hardware complexity. An interesting approach is represented by software component models. This chapter first analyzes how high performance interactions are only partially supported by specialized component models. Then, it introduces HLCM, a component model that aims at efficiently supporting all kinds of static compositions

    Worker-robot cooperation and integration into the manufacturing workcell via the holonic control architecture

    Get PDF
    Cooperative manufacturing is a new field of research, which addresses new challenges beyond the physical safety of the worker. Those new challenges appear due to the need to connect the worker and the cobot from the informatics point of view in one cooperative workcell. This requires developing an appropriate manufacturing control system, which fits the nature of both the worker and the cobot. Furthermore, the manufacturing control system must be able to understand the production variations, to guide the cooperation between worker and the cobot and adapt with the production variations.Die kooperative Fertigung ist ein neues Forschungsgebiet, das sich neuen Herausforderungen stellt. Diese neuen Herausforderungen ergeben sich aus der Notwendigkeit, den Arbeiter und den Cobot aus der Sicht der Informatik in einem kooperativen Arbeitsplatz zu verbinden. Dies erfordert die Entwicklung eines geeigneten Produktionskontrollsystems, das sowohl der Natur des Arbeiters als auch der des Cobots entspricht. DarĂĽber hinaus muss die Fertigungssteuerung in der Lage sein, die Produktionsschwankungen zu verstehen, um die Zusammenarbeit zwischen Arbeiter und Cobot zu steuern

    Middleware Technologies for Cloud of Things - a survey

    Get PDF
    The next wave of communication and applications rely on the new services provided by Internet of Things which is becoming an important aspect in human and machines future. The IoT services are a key solution for providing smart environments in homes, buildings and cities. In the era of a massive number of connected things and objects with a high grow rate, several challenges have been raised such as management, aggregation and storage for big produced data. In order to tackle some of these issues, cloud computing emerged to IoT as Cloud of Things (CoT) which provides virtually unlimited cloud services to enhance the large scale IoT platforms. There are several factors to be considered in design and implementation of a CoT platform. One of the most important and challenging problems is the heterogeneity of different objects. This problem can be addressed by deploying suitable "Middleware". Middleware sits between things and applications that make a reliable platform for communication among things with different interfaces, operating systems, and architectures. The main aim of this paper is to study the middleware technologies for CoT. Toward this end, we first present the main features and characteristics of middlewares. Next we study different architecture styles and service domains. Then we presents several middlewares that are suitable for CoT based platforms and lastly a list of current challenges and issues in design of CoT based middlewares is discussed.Comment: http://www.sciencedirect.com/science/article/pii/S2352864817301268, Digital Communications and Networks, Elsevier (2017

    Middleware Technologies for Cloud of Things - a survey

    Full text link
    The next wave of communication and applications rely on the new services provided by Internet of Things which is becoming an important aspect in human and machines future. The IoT services are a key solution for providing smart environments in homes, buildings and cities. In the era of a massive number of connected things and objects with a high grow rate, several challenges have been raised such as management, aggregation and storage for big produced data. In order to tackle some of these issues, cloud computing emerged to IoT as Cloud of Things (CoT) which provides virtually unlimited cloud services to enhance the large scale IoT platforms. There are several factors to be considered in design and implementation of a CoT platform. One of the most important and challenging problems is the heterogeneity of different objects. This problem can be addressed by deploying suitable "Middleware". Middleware sits between things and applications that make a reliable platform for communication among things with different interfaces, operating systems, and architectures. The main aim of this paper is to study the middleware technologies for CoT. Toward this end, we first present the main features and characteristics of middlewares. Next we study different architecture styles and service domains. Then we presents several middlewares that are suitable for CoT based platforms and lastly a list of current challenges and issues in design of CoT based middlewares is discussed.Comment: http://www.sciencedirect.com/science/article/pii/S2352864817301268, Digital Communications and Networks, Elsevier (2017

    Spark deployment and performance evaluation on the MareNostrum supercomputer

    Get PDF
    In this paper we present a framework to enable data-intensive Spark workloads on MareNostrum, a petascale supercomputer designed mainly for compute-intensive applications. As far as we know, this is the first attempt to investigate optimized deployment configurations of Spark on a petascale HPC setup. We detail the design of the framework and present some benchmark data to provide insights into the scalability of the system. We examine the impact of different configurations including parallelism, storage and networking alternatives, and we discuss several aspects in executing Big Data workloads on a computing system that is based on the compute-centric paradigm. Further, we derive conclusions aiming to pave the way towards systematic and optimized methodologies for fine-tuning data-intensive application on large clusters emphasizing on parallelism configurations.Peer ReviewedPostprint (author's final draft
    • …
    corecore