STAPL-RTS: A Runtime System for Massive Parallelism
Modern High Performance Computing (HPC) systems are complex, with deep memory hierarchies and increasing use of computational heterogeneity via accelerators. When developing applications for these platforms, programmers face two unappealing choices. On one hand, they can explicitly manage machine resources, writing programs with low-level primitives from multiple APIs (e.g., MPI+OpenMP), creating efficient but rigid, difficult-to-extend, and non-portable implementations. Alternatively, they can adopt higher-level programming environments, often at the cost of lost performance. Our approach maintains the high-level nature of the application without sacrificing performance by transferring high-level, application-semantic knowledge between layers of the software stack at an appropriate level of abstraction and performing optimizations on a per-layer basis. In this dissertation, we present the STAPL Runtime System (STAPL-RTS), a runtime system built for portable performance on massively parallel machines. While the STAPL-RTS abstracts and virtualizes the underlying platform for portability, it uses information from the upper layers to perform the low-level optimizations that restore the platform's performance characteristics.
We outline the fundamental ideas behind the design of the STAPL-RTS, such as its always-distributed communication model and its asynchronous operations. Through code examples and benchmarks, we show that high-level information allows applications written on top of the STAPL-RTS to match the performance of optimized but ad hoc solutions. Using the STAPL library, we demonstrate how this information guides important decisions in the STAPL-RTS, such as multi-protocol communication coordination and request aggregation using established C++ programming idioms.
Recognizing that nested parallelism is of increasing interest for both expressivity and performance, we present a parallel model that combines asynchronous, one-sided operations with isolated nested parallel sections. Previous approaches to nested parallelism targeted either static applications, through the use of blocking, isolated sections, or dynamic applications, by using asynchronous mechanisms (i.e., recursive task spawning) that come at the expense of isolation. We combine the flexibility of dynamic task creation with the isolation guarantees of the static models by allowing the creation of asynchronous, one-sided nested parallel sections that work in tandem with the more traditional, synchronous, collective nested parallelism. This allows selective, run-time customizable use of parallelism in an application, based on the input and the algorithm.
Scalable and responsive real time event processing using cloud computing
PhD Thesis
Cloud computing offers scalability and adaptability in a cost-effective manner. However, achieving scalability for real-time applications is constrained by the need for low response times: many applications require good performance and low response time, which must be matched by dynamic resource allocation. Real-time processing workloads are further characterized by unpredictable rates of incoming data streams and dynamic bursts of data, which raises the issue of processing the data streams across multiple cloud computing nodes. This research analyzes methodologies for processing real-time data in which applications are structured as multiple event processing networks and partitioned over the set of available cloud nodes. The approach applies queuing theory principles to cloud computing. The transformation of raw data into useful outputs occurs in successive stages of the processing networks, which are distributed across multiple computing nodes in a cloud. A set of valid options is created to understand the response time requirements of each application. Under a given valid set of conditions that meet the response time criteria, multiple instances of the event processing networks are distributed across the cloud nodes. A generic methodology is defined to scale the event processing networks up and down in accordance with the response time criteria. Real-time applications that support sophisticated decision-support mechanisms must comply with response time criteria involving interdependent dataflow paradigms, which makes performance harder to improve. Consideration is given to ways of reducing latency and improving the response time and throughput of real-time applications by distributing the event processing networks across multiple computing nodes.
XXV Congreso Argentino de Ciencias de la Computación - CACIC 2019: proceedings
Papers presented at the XXV Congreso Argentino de Ciencias de la Computación (CACIC), held in the city of Río Cuarto from 14 to 18 October 2019, organized by the Red de Universidades con Carreras en Informática (RedUNCI) and the Facultad de Ciencias Exactas, Físico-Químicas y Naturales, Universidad Nacional de Río Cuarto.