812 research outputs found
Resilient application co-scheduling with processor redistribution
Recently, the benefits of co-scheduling several applications have been demonstrated in a fault-free context, both in terms of performance and energy savings. However, large-scale computer systems are confronted to frequent failures, and resilience techniques must be employed to ensure the completion of large applications. Indeed, failures may create severe imbalance between applications, and significantly degrade performance. In this paper, we propose to redistribute the resources assigned to each application upon the striking of failures, in order to minimize the expected completion time of a set of co-scheduled applications. First we introduce a formal model and establish complexity results. When no redistribution is allowed, we can minimize the expected completion time in polynomial time, while the problem becomes NP-complete with redistributions, even in a fault-free context. Therefore, we design polynomial-time heuristics that perform redistributions and account for processor failures. A fault simulator is used to perform extensive simulations that demonstrate the usefulness of redistribution and the performance of the proposed heuristics
Atomic-SDN: Is Synchronous Flooding the Solution to Software-Defined Networking in IoT?
The adoption of Software Defined Networking (SDN) within traditional networks
has provided operators the ability to manage diverse resources and easily
reconfigure networks as requirements change. Recent research has extended this
concept to IEEE 802.15.4 low-power wireless networks, which form a key
component of the Internet of Things (IoT). However, the multiple traffic
patterns necessary for SDN control makes it difficult to apply this approach to
these highly challenging environments. This paper presents Atomic-SDN, a highly
reliable and low-latency solution for SDN in low-power wireless. Atomic-SDN
introduces a novel Synchronous Flooding (SF) architecture capable of
dynamically configuring SF protocols to satisfy complex SDN control
requirements, and draws from the authors' previous experiences in the IEEE EWSN
Dependability Competition: where SF solutions have consistently outperformed
other entries. Using this approach, Atomic-SDN presents considerable
performance gains over other SDN implementations for low-power IoT networks. We
evaluate Atomic-SDN through simulation and experimentation, and show how
utilizing SF techniques provides latency and reliability guarantees to SDN
control operations as the local mesh scales. We compare Atomic-SDN against
other SDN implementations based on the IEEE 802.15.4 network stack, and
establish that Atomic-SDN improves SDN control by orders-of-magnitude across
latency, reliability, and energy-efficiency metrics
Performance Characterization of Spark Workloads on Shared NUMA Systems
As the adoption of Big Data technologies becomes the norm in an increasing number of scenarios, there is also a growing need to optimize them for modern processors. Spark has gained momentum over the last few years among companies looking for high performance solutions that can scale out across different cluster sizes. At the same time, modern processors can be connected to large amounts of physical memory, in the range of up to few terabytes. This opens an enormous range of opportunities for runtimes and applications that aim to improve their performance by leveraging low latencies and high bandwidth provided by RAM. The result is that there are several examples today of applications that have started pushing the in-memory computing paradigm to accelerate tasks. To deliver such a large physical memory capacity, hardware vendors have leveraged Non-Uniform Memory Architectures (NUMA). This paper explores how Spark-based workloads are impacted by the effects of NUMA-placement decisions, how different Spark configurations result in changes in delivered performance, how the characteristics of the applications can be used to predict workload collocation conflicts, and how to improve performance by collocating workloads in scale-up nodes. We explore several workloads run on top of the IBM Power8 processor, and provide manual strategies that can leverage performance improvements up to 40% on Spark workloads when using smart processor-pinning and workload collocation strategies.This work is partially supported by the European Research Council (ERC) under the EU Horizon 2020 programme (GA 639595), the Spanish Ministry of Economy, Industry and
Competitiveness (TIN2015-65316-P) and the Generalitat de Catalunya (2014-SGR-1051).Postprint (author's final draft
Architecting Time-Critical Big-Data Systems
Current infrastructures for developing big-data applications are able to process –via big-data analytics- huge amounts of data, using clusters of machines that collaborate to perform parallel computations. However, current infrastructures were not designed to work with the requirements of time-critical applications; they are more focused on general-purpose applications rather than time-critical ones. Addressing this issue from the perspective of the real-time systems community, this paper considers time-critical big-data. It deals with the definition of a time-critical big-data system from the point of view of requirements, analyzing the specific characteristics of some popular big-data applications. This analysis is complemented by the challenges stemmed from the infrastructures that support the applications, proposing an architecture and offering initial performance patterns that connect application costs with infrastructure performance
Pregelix: Big(ger) Graph Analytics on A Dataflow Engine
There is a growing need for distributed graph processing systems that are
capable of gracefully scaling to very large graph datasets. Unfortunately, this
challenge has not been easily met due to the intense memory pressure imposed by
process-centric, message passing designs that many graph processing systems
follow. Pregelix is a new open source distributed graph processing system that
is based on an iterative dataflow design that is better tuned to handle both
in-memory and out-of-core workloads. As such, Pregelix offers improved
performance characteristics and scaling properties over current open source
systems (e.g., we have seen up to 15x speedup compared to Apache Giraph and up
to 35x speedup compared to distributed GraphLab), and makes more effective use
of available machine resources to support Big(ger) Graph Analytics
Spark deployment and performance evaluation on the MareNostrum supercomputer
In this paper we present a framework to enable data-intensive Spark workloads on MareNostrum, a petascale supercomputer designed mainly for compute-intensive applications. As far as we know, this is the first attempt to investigate optimized deployment configurations of Spark on a petascale HPC setup. We detail the design of the framework and present some benchmark data to provide insights into the scalability of the system. We examine the impact of different configurations including parallelism, storage and networking alternatives, and we discuss several aspects in executing Big Data workloads on a computing system that is based on the compute-centric paradigm. Further, we derive conclusions aiming to pave the way towards systematic and optimized methodologies for fine-tuning data-intensive application on large clusters emphasizing on parallelism configurations.Peer ReviewedPostprint (author's final draft
- …