Dynamic re-optimization techniques for stream processing engines and object stores
Large scale data storage and processing systems are strongly motivated by the need to store and analyze massive datasets. The complexity of a large class of these systems is rooted in their distributed nature, extreme scale, need for real-time response, and streaming nature. The use of these systems on multi-tenant, cloud environments with potential resource interference necessitates fine-grained monitoring and control. In this dissertation, we present efficient, dynamic techniques for re-optimizing stream-processing systems and transactional object-storage systems. In the context of stream-processing systems, we present VAYU, a per-topology controller. VAYU uses novel methods and protocols for dynamic, network-aware tuple-routing in the dataflow. We show that the feedback-driven controller in VAYU helps achieve high pipeline throughput over long execution periods, as it dynamically detects and diagnoses any pipeline-bottlenecks. We present novel heuristics to optimize overlays for group communication operations in the streaming model. In the context of object-storage systems, we present M-Lock, a novel lock-localization service for distributed transaction protocols on scale-out object stores to increase transaction throughput. Lock localization refers to dynamic migration and partitioning of locks across nodes in the scale-out store to reduce cross-partition acquisition of locks. The service leverages the observed object-access patterns to achieve lock-clustering and deliver high performance. We also present TransMR, a framework that uses distributed, transactional object stores to orchestrate and execute asynchronous components in amorphous data-parallel applications on scale-out architectures
Next-Generation Battery Management Systems: Dynamic Reconfiguration
Batteries are widely applied to the energy storage and power supply in portable electronics, transportation, power systems, communication networks, etc. They are particularly demanded in the emerging technologies of vehicle electrification and renewable energy integration for a green and sustainable society. To meet various voltage, power, and energy requirements in large-scale applications, multiple battery cells have to be connected in series and/or parallel. While battery technology has advanced significantly in the past decade, existing battery management systems (BMSs) mainly focus on state monitoring and control of battery systems packed in fixed configurations. In fixed configurations, though, the battery system performance is in principle limited by the weakest cells, which can leave large parts severely underutilized. Allowing dynamic reconfiguration of battery cells, on the other hand, allows individual and flexible manipulation of the battery system at cell, module, and pack levels, which may open up a new paradigm for battery management. Following this trend, this paper provides an overview of next-generation BMSs featuring dynamic reconfiguration. Motivated by numerous potential benefits of reconfigurable battery systems (RBSs), the hardware designs, management principles, and optimization algorithms for RBSs are sequentially and systematically discussed. Theoretical and practical challenges during the design and implementation of RBSs are highlighted in the end to stimulate future research and development
RELEASE: A High-level Paradigm for Reliable Large-scale Server Software
Erlang is a functional language with a much-emulated model for building reliable distributed systems. This paper outlines the RELEASE project, and describes the progress in the first six months. The project aim is to scale Erlang's radical concurrency-oriented programming paradigm to build reliable general-purpose software, such as server-based systems, on massively parallel machines. Currently Erlang has inherently scalable computation and reliability models, but in practice scalability is constrained by aspects of the language and virtual machine. We are working at three levels to address these challenges: evolving the Erlang virtual machine so that it can work effectively on large scale multicore systems; evolving the language to Scalable Distributed (SD) Erlang; developing a scalable Erlang infrastructure to integrate multiple, heterogeneous clusters. We are also developing state of the art tools that allow programmers to understand the behaviour of massively parallel SD Erlang programs. We will demonstrate the effectiveness of the RELEASE approach using demonstrators and two large case studies on a Blue Gene
C2MS: Dynamic Monitoring and Management of Cloud Infrastructures
Server clustering is a common design principle employed by many organisations
who require high availability, scalability and easier management of their
infrastructure. Servers are typically clustered according to the service they
provide whether it be the application(s) installed, the role of the server or
server accessibility for example. In order to optimize performance, manage load
and maintain availability, servers may migrate from one cluster group to
another making it difficult for server monitoring tools to continuously monitor
these dynamically changing groups. Server monitoring tools are usually
statically configured, and any change of group membership requires manual
reconfiguration, an unreasonable task to undertake on large-scale cloud
infrastructures.
In this paper we present the Cloudlet Control and Management System (C2MS), a
system for monitoring and controlling dynamic groups of physical or virtual
servers within cloud infrastructures. The C2MS extends Ganglia - an open source
scalable system performance monitoring tool - by allowing system administrators
to define, monitor and modify server groups without the need for server
reconfiguration. In turn administrators can easily monitor group and individual
server metrics on large-scale dynamic cloud infrastructures where roles of
servers may change frequently. Furthermore, we complement group monitoring with
a control element allowing administrator-specified actions to be performed over
servers within service groups as well as introduce further customized
monitoring metrics. This paper outlines the design, implementation and
evaluation of the C2MS.
Comment: Proceedings of the 5th IEEE International Conference on Cloud
Computing Technology and Science (CloudCom 2013), 8 pages
The Design and Demonstration of the Ultralight Testbed
In this paper we present the motivation, the design, and a recent demonstration of the UltraLight testbed at SC|05. The goal of the Ultralight testbed is to help meet the data-intensive computing challenges of the next generation of particle physics experiments with a comprehensive, network-focused approach. UltraLight adopts a new approach to networking: instead of treating it traditionally, as a static, unchanging and unmanaged set of inter-computer links, we are developing and using it as a dynamic, configurable, and closely monitored resource that is managed from end-to-end. To achieve its goal we are constructing a next-generation global system that is able to meet the data processing, distribution, access and analysis needs of the particle physics community. In this paper we will first present early results in the various working areas of the project. We then describe our experiences of the network architecture, kernel setup, application tuning and configuration used during the bandwidth challenge event at SC|05. During this Challenge, we achieved a record-breaking aggregate data rate in excess of 150 Gbps while moving physics datasets between many Grid computing sites
The Motivation, Architecture and Demonstration of Ultralight Network Testbed
In this paper we describe progress in the NSF-funded Ultralight project and a recent demonstration of Ultralight technologies at SuperComputing 2005 (SC|05). The goal of the Ultralight project is to help meet the data-intensive computing challenges of the next generation of particle physics experiments with a comprehensive, network-focused approach. Ultralight adopts a new approach to networking: instead of treating it traditionally, as a static, unchanging and unmanaged set of inter-computer links, we are developing and using it as a dynamic, configurable, and closely monitored resource that is managed from end-to-end. Thus we are constructing a next-generation global system that is able to meet the data processing, distribution, access and analysis needs of the particle physics community. In this paper we present the motivation for, and an overview of, the Ultralight project. We then cover early results in the various working areas of the project. The remainder of the paper describes our experiences of the Ultralight network architecture, kernel setup, application tuning and configuration used during the bandwidth challenge event at SC|05. During this Challenge, we achieved a record-breaking aggregate data rate in excess of 150 Gbps while moving physics datasets between many sites interconnected by the Ultralight backbone network. The exercise highlighted the benefits of Ultralight's research and development efforts that are enabling new and advanced methods of distributed scientific data analysis
Distributed simulation and the grid: Position statements
The Grid provides a new and unrivaled technology for large scale distributed simulation as it enables collaboration and the use of distributed computing resources. This panel paper presents the views of four researchers in the area of Distributed Simulation and the Grid. Together we try to identify the main research issues involved in applying Grid technology to distributed simulation and the key future challenges that need to be solved to achieve this goal. Such challenges include not only technical challenges, but also political ones such as management methodology for the Grid and the development of standards. The benefits of the Grid to end-user simulation modelers also are discussed