The "MIND" Scalable PIM Architecture
MIND (Memory, Intelligence, and Network Device) is an advanced parallel computer architecture for high-performance computing and scalable embedded processing. It is a Processor-in-Memory (PIM) architecture integrating both DRAM bit cells and CMOS logic devices on the same silicon die. MIND is multicore, with multiple memory/processor nodes on each chip, and supports global shared memory across systems of MIND components. MIND is distinguished from other PIM architectures in that it incorporates mechanisms for efficient support of a global parallel execution model based on the semantics of message-driven multithreaded split-transaction processing. MIND is designed to operate either in conjunction with other conventional microprocessors or in standalone arrays of like devices. It also incorporates mechanisms for fault tolerance, real-time execution, and active power management. This paper describes the major elements and operational methods of the MIND architecture.
Tolerating Correlated Failures in Massively Parallel Stream Processing Engines
Fault-tolerance techniques for stream processing engines can be categorized
into passive and active approaches. A typical passive approach periodically
checkpoints a processing task's runtime states and can recover a failed task by
restoring its runtime state using its latest checkpoint. On the other hand, an
active approach usually employs backup nodes to run replicated tasks. Upon
failure, the active replica can take over the processing of the failed task
with minimal latency. However, both approaches have their own inadequacies in
Massively Parallel Stream Processing Engines (MPSPE). The passive approach
incurs a long recovery latency especially when a number of correlated nodes
fail simultaneously, while the active approach requires extra replication
resources. In this paper, we propose a new fault-tolerance framework, which is
Passive and Partially Active (PPA). In a PPA scheme, the passive approach is
applied to all tasks while only a selected set of tasks will be actively
replicated. The number of actively replicated tasks depends on the available
resources. If tasks without active replicas fail, tentative outputs will be
generated before the completion of the recovery process. We also propose
effective and efficient algorithms to optimize a partially active replication
plan to maximize the quality of tentative outputs. We implemented PPA on top of
Storm, an open-source MPSPE, and conducted extensive experiments using both real
and synthetic datasets to verify the effectiveness of our approach.
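The idea of replicating only a resource-bounded subset of tasks can be illustrated with a minimal sketch. This is not the optimization algorithm proposed in the paper, which the abstract does not detail; it is a simple greedy stand-in. The `Task` fields `replica_cost` and `output_weight`, and the functions `plan_active_replicas` and `recovery_mode`, are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    replica_cost: int     # hypothetical: resource units an active replica needs (> 0)
    output_weight: float  # hypothetical: contribution to tentative-output quality


def plan_active_replicas(tasks, budget):
    """Greedy stand-in: replicate tasks with the best weight-per-cost ratio
    until the replication budget is exhausted."""
    ranked = sorted(tasks, key=lambda t: t.output_weight / t.replica_cost,
                    reverse=True)
    chosen, remaining = [], budget
    for task in ranked:
        if task.replica_cost <= remaining:
            chosen.append(task.name)
            remaining -= task.replica_cost
    return chosen


def recovery_mode(task_name, active):
    """Every task is checkpointed (passive); only some also have a replica."""
    if task_name in active:
        return "fail over to active replica"
    return "restore from checkpoint, emit tentative outputs meanwhile"


tasks = [Task("a", 2, 10.0), Task("b", 1, 3.0), Task("c", 3, 6.0)]
active = plan_active_replicas(tasks, budget=3)
```

With a budget of 3 units, tasks `a` and `b` receive active replicas, while `c` relies on checkpoint recovery alone.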
Integrative Dynamic Reconfiguration in a Parallel Stream Processing Engine
Load balancing, operator instance collocation, and horizontal scaling are
critical issues in Parallel Stream Processing Engines for achieving low data
processing latency, optimized cluster utilization, and minimized communication
cost, respectively. In previous work, these issues are typically tackled
separately and independently. We argue that these problems are tightly coupled
in the sense that they all need to determine the allocations of workloads and
migrate computational states at runtime. Optimizing them independently would
result in suboptimal solutions. Therefore, in this paper, we investigate how
these three issues can be modeled as one integrated optimization problem. In
particular, we first consider jobs where workload allocations have little
effect on the communication cost, and model the problem of load balance as a
Mixed-Integer Linear Program. Afterwards, we present an extended solution
called ALBIC, which supports general jobs. We implement the proposed techniques
on top of Apache Storm, an open-source Parallel Stream Processing Engine. The
extensive experimental results over both synthetic and real datasets show that
our techniques clearly outperform existing approaches.
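To make the load-balancing subproblem concrete, here is a small sketch of workload allocation across nodes. It is neither the paper's Mixed-Integer Linear Program nor ALBIC; it swaps in the classic longest-processing-time-first greedy heuristic, which ignores communication cost and state migration. The names `lpt_assign` and `node_totals` and the sample loads are assumptions made for this example.

```python
import heapq


def lpt_assign(loads, num_nodes):
    """Longest-processing-time-first heuristic: place the heaviest workload
    first, always on the node with the smallest total load so far."""
    heap = [(0.0, node) for node in range(num_nodes)]
    heapq.heapify(heap)
    assignment = {}
    for key, load in sorted(loads.items(), key=lambda kv: kv[1], reverse=True):
        total, node = heapq.heappop(heap)
        assignment[key] = node
        heapq.heappush(heap, (total + load, node))
    return assignment


def node_totals(loads, assignment, num_nodes):
    """Sum the assigned load per node to inspect the resulting balance."""
    totals = [0.0] * num_nodes
    for key, node in assignment.items():
        totals[node] += loads[key]
    return totals


loads = {"k1": 5, "k2": 4, "k3": 3, "k4": 2}  # hypothetical per-key loads
assignment = lpt_assign(loads, num_nodes=2)
totals = node_totals(loads, assignment, num_nodes=2)
```

An integrated formulation such as the one the abstract describes would additionally account for collocation and scaling decisions, which this heuristic deliberately leaves out.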
Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes
The ongoing hardware evolution exhibits an escalation in the number, as well
as in the heterogeneity, of computing resources. The pressure to maintain
reasonable levels of performance and portability forces application developers
to leave the traditional programming paradigms and explore alternative
solutions. PaStiX is a parallel sparse direct solver, based on a dynamic
scheduler for modern hierarchical manycore architectures. In this paper, we
study the benefits and limits of replacing the highly specialized internal
scheduler of the PaStiX solver with two generic runtime systems: PaRSEC and
StarPU. The task graph of the factorization step is made available to the two
runtimes, providing them the opportunity to process and optimize its traversal
in order to maximize the algorithm efficiency for the targeted hardware
platform. A comparative study of the performance of the PaStiX solver on top of
its native internal scheduler, PaRSEC, and StarPU frameworks, on different
execution environments, is performed. The analysis highlights that these
generic task-based runtimes achieve comparable results to the
application-optimized embedded scheduler on homogeneous platforms. Furthermore,
they are able to significantly speed up the solver on heterogeneous
environments by taking advantage of the accelerators while hiding the
complexity of their efficient manipulation from the programmer.
Comment: Heterogeneity in Computing Workshop (2014)
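As a rough illustration of what handing a task graph to a generic runtime involves, the sketch below traverses a small dependency graph in waves of mutually independent tasks. It uses Python's standard `graphlib` module (Python 3.9+) rather than PaRSEC or StarPU, whose APIs are not shown in the abstract. The task names and the `parallel_waves` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def parallel_waves(dependencies):
    """Group tasks into waves; all tasks in one wave have their
    predecessors completed and could run concurrently."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)                 # mark the whole wave as finished
    return waves


# Hypothetical factorization-like graph: each task maps to its predecessors.
dependencies = {
    "update_B": {"factor_A"},
    "update_C": {"factor_A"},
    "factor_B": {"update_B"},
    "factor_C": {"update_C", "factor_B"},
}
waves = parallel_waves(dependencies)
```

A real runtime would dispatch ready tasks to CPU cores or accelerators as soon as their inputs are available instead of waiting for a whole wave to finish.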
AstroGrid-D: Grid Technology for Astronomical Science
We present status and results of AstroGrid-D, a joint effort of
astrophysicists and computer scientists to employ grid technology for
scientific applications. AstroGrid-D provides access to a network of
distributed machines with a set of commands as well as software interfaces. It
allows simple use of computer and storage facilities as well as scheduling and
monitoring of compute tasks and data management. It is based on the Globus Toolkit middleware
(GT4). Chapter 1 describes the context which led to the demand for advanced
software solutions in Astrophysics, and we state the goals of the project. We
then present characteristic astrophysical applications that have been
implemented on AstroGrid-D in chapter 2. We describe simulations of different
complexity, compute-intensive calculations running on multiple sites, and
advanced applications for specific scientific purposes, such as a connection to
robotic telescopes. These examples show how grid execution improves, for example,
the scientific workflow. Chapter 3 explains the software tools and
services that we adapted or newly developed. Section 3.1 is focused on the
administrative aspects of the infrastructure, to manage users and monitor
activity. Section 3.2 characterises the central components of our architecture:
the AstroGrid-D information service to collect and store metadata, a file
management system, the data management system, and a job manager for automatic
submission of compute tasks. We summarise the successfully established
infrastructure in chapter 4, concluding with our future plans to establish
AstroGrid-D as a platform of modern e-Astronomy.
Comment: 14 pages, 12 figures. Subjects: data analysis, image processing,
robotic telescopes, simulations, grid. Accepted for publication in New
Astronomy.