An Order-Aware Dataflow Model for Parallel Unix Pipelines
We present a dataflow model for parallel Unix shell pipelines. To
accurately capture the semantics of complex Unix pipelines, the dataflow model
is order-aware, i.e., the order in which a node in the dataflow graph consumes
inputs from different edges plays a central role in the semantics of the
computation and therefore in the resulting parallelization. We use this model
to capture the semantics of transformations that exploit data parallelism
available in Unix shell computations and prove their correctness. We
additionally formalize the translations from the Unix shell to the dataflow
model and from the dataflow model back to a parallel shell script. We implement
our model and transformations as the compiler and optimization passes of a
system parallelizing shell pipelines, and use it to evaluate the speedup
achieved on 47 pipelines.
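To make order-awareness concrete, the sketch below (an illustration of the general idea in Python, not the paper's formalism) contrasts a node that drains its input edges strictly in order, as `cat f1 f2` does, with one that interleaves them; whether a recombining node respects the split order is exactly what decides if a data-parallel rewrite is sound.

```python
from itertools import chain
from typing import Iterable, Iterator, List
import heapq

def cat_node(inputs: List[Iterable[str]]) -> Iterator[str]:
    """Order-aware node: fully drains edge i before edge i+1,
    mirroring how `cat f1 f2` consumes its inputs."""
    return chain.from_iterable(inputs)

def merge_node(inputs: List[Iterable[str]]) -> Iterator[str]:
    """Order-sensitive merge: interleaves inputs by comparing heads,
    as `sort -m` does on pre-sorted inputs."""
    return heapq.merge(*inputs)

# Splitting a stream and recombining it with cat_node preserves the
# original order, so a data-parallel rewrite of a `cat`-like stage is
# only sound if the recombining node consumes edges in split order.
chunks = [["a\n", "b\n"], ["c\n", "d\n"]]
assert list(cat_node(chunks)) == ["a\n", "b\n", "c\n", "d\n"]
```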
Faster Algorithms for Weighted Recursive State Machines
Pushdown systems (PDSs) and recursive state machines (RSMs), which are
linearly equivalent, are standard models for interprocedural analysis. Yet RSMs
are more convenient as they (a) explicitly model function calls and returns,
and (b) specify many natural parameters for algorithmic analysis, e.g., the
number of entries and exits. We consider a general framework where RSM
transitions are labeled from a semiring and path properties are algebraic with
semiring operations, which can model, e.g., interprocedural reachability and
dataflow analysis problems.
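To make the algebraic setting concrete, here is a minimal sketch (not the paper's algorithm) of semiring-valued path properties over a labeled graph; instantiating the semiring with Boolean or tropical operations recovers reachability and shortest distances, respectively.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Semiring:
    zero: object
    one: object
    plus: Callable   # combines alternative paths
    times: Callable  # extends a path by an edge

# Boolean semiring: "plus" is OR, "times" is AND -> reachability.
BOOL = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)
# Tropical semiring: "plus" is min, "times" is + -> shortest distances.
TROP = Semiring(float("inf"), 0.0, min, lambda a, b: a + b)

def path_values(edges, source, sr: Semiring):
    """Kleene-style fixed point; assumes a finite-height semiring so the
    iteration terminates (the case the complexity bounds above target)."""
    val = {source: sr.one}
    changed = True
    while changed:
        changed = False
        for u, w, v in edges:
            new = sr.plus(val.get(v, sr.zero),
                          sr.times(val.get(u, sr.zero), w))
            if new != val.get(v, sr.zero):
                val[v] = new
                changed = True
    return val

edges = [("entry", 2.0, "a"), ("a", 3.0, "exit"), ("entry", 7.0, "exit")]
print(path_values(edges, "entry", TROP))  # {'entry': 0.0, 'a': 2.0, 'exit': 5.0}
```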
Our main contributions are new algorithms for several fundamental problems.
Compared to a direct translation of RSMs to PDSs and the best-known existing
bounds for PDSs, our analysis algorithm improves the complexity for
finite-height semirings (which subsume reachability and standard dataflow
properties). We further consider the problem of extracting distance values from
the representation structures computed by our algorithm, and give efficient
algorithms that distinguish the complexity of a one-time preprocessing from the
complexity of each individual query. Another advantage of our algorithm is that
our improvements carry over to the concurrent setting, where we improve the
best-known complexity for the context-bounded analysis of concurrent RSMs.
Finally, we provide a prototype implementation that gives a significant
speed-up on several benchmarks from the SLAM/SDV project.
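The preprocessing/query split mentioned above can be pictured as follows. This sketch (hypothetical names, reusing `path_values`, `TROP`, and `edges` from the previous sketch) pays a one-time cost to build per-source summaries and then answers each distance query by lookup.

```python
class DistanceOracle:
    """Hypothetical illustration of the one-time-preprocessing vs
    per-query split: build a summary once, answer many cheap queries."""
    def __init__(self, edges, nodes, sr):
        # One-time preprocessing: solve from every source node.
        self.table = {s: path_values(edges, s, sr) for s in nodes}
        self.sr = sr

    def query(self, s, t):
        # Each individual query is a cheap table lookup.
        return self.table[s].get(t, self.sr.zero)

oracle = DistanceOracle(edges, ["entry", "a", "exit"], TROP)
print(oracle.query("entry", "exit"))  # 5.0
```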
Exploiting Multi-Level Parallelism in Streaming Applications for Heterogeneous Platforms with GPUs
Heterogeneous computing platforms support the traditional types of
parallelism, such as instruction-level, data, task, and pipeline
parallelism, and provide the opportunity to exploit a combination of
different types of parallelism at different platform levels. The
architectural diversity of platform components makes tapping into the
platform potential a challenging programming task. This thesis makes an
important step towards meeting this challenge by introducing a novel methodology for
automatic generation of structured, multi-level parallel programs from
sequential applications. We introduce a novel hierarchical intermediate
program representation (HiPRDG) that captures the notions of structure
and hierarchy in the polyhedral model used for compile-time program
transformation and code generation. Using the HiPRDG as the starting
point, we present a novel method for generation of multi-level programs
(MLPs) featuring different types of parallelism, such as task, data, and
pipeline parallelism. Moreover, we introduce concepts and techniques for
data parallelism identification, GPU code generation, and asynchronous
data-driven execution on heterogeneous platforms with efficient
overlapping of host-accelerator communication and computation. By
enabling the modular, hybrid parallelization of program model components
via HiPRDG, this thesis opens the door for highly efficient tailor-made
parallel program generation and auto-tuning for next generations of
multi-level heterogeneous platforms with diverse accelerators.
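As a rough picture of what a multi-level parallel program looks like (a Python sketch of combined pipeline and data parallelism under illustrative names, not the thesis's HiPRDG-based generation), the code below runs two pipeline stages concurrently and applies a data-parallel kernel inside the second stage.

```python
from concurrent.futures import ThreadPoolExecutor
import queue, threading

def kernel(x):
    return x * x  # stand-in for a computation that could be GPU-offloaded

def pipeline(data, workers=4):
    q = queue.Queue(maxsize=8)  # bounded buffer between pipeline stages
    results = []

    def producer():  # stage 1 of the pipeline
        for block in data:
            q.put(block)
        q.put(None)  # end-of-stream marker

    def consumer():  # stage 2: data parallelism within a pipeline stage
        with ThreadPoolExecutor(max_workers=workers) as pool:
            while (block := q.get()) is not None:
                results.append(list(pool.map(kernel, block)))

    t = threading.Thread(target=producer)
    t.start()
    consumer()  # both stages run concurrently, overlapping their work
    t.join()
    return results

print(pipeline([[1, 2, 3], [4, 5, 6]]))  # [[1, 4, 9], [16, 25, 36]]
```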
Data stream processing meets the Advanced Metering Infrastructure: possibilities, challenges and applications
Distribution of electricity is changing. Energy production is increasingly distributed, weather-dependent, and located in the distribution network, close to consumers. Energy consumption is increasing throughout society, and the electrification of transportation is driving distribution networks closer to their limits. Operating the networks closer to their limits also increases the risk of faults. Continuous monitoring of the part of the distribution network closest to the customers is needed in order to mitigate this risk. The Advanced Metering Infrastructure (AMI) introduced smart meters throughout the distribution network. Data stream processing is a computing paradigm that offers low-latency results from analysis of large volumes of data. This thesis investigates the possibilities and challenges for continuous monitoring that arise when the Advanced Metering Infrastructure and data stream processing meet.

The challenges addressed in the thesis are efficient processing of unordered (also called out-of-order) data and efficient usage of the computational resources present in the Advanced Metering Infrastructure. Contributions towards more efficient processing of out-of-order data are made with eChIDNA and TinTiN. Both systems use knowledge about smart meter data to produce results directly where possible, storing only the data that is relevant for late arrivals so that updated results can be produced when such late data arrives. eChIDNA is integrated in the streaming query itself, while TinTiN is a streaming middleware that can be applied to streaming queries in order to make them resilient against out-of-order data. Eventual determinism is defined in order to formally investigate the deterministic properties of the output produced by such systems.

Contributions towards efficient usage of the computational resources of the Advanced Metering Infrastructure are made with the application LoCoVolt. LoCoVolt implements a monitoring algorithm that can run on equipment localized in the communication infrastructure of the AMI and can take advantage of the overlap between the communication and distribution networks. All contributions are evaluated on hardware that is available in current AMI systems, using large-scale data obtained from a real production AMI.
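The idea of producing immediate results while remaining correct under late data can be sketched as follows (a hypothetical illustration in Python, not eChIDNA's or TinTiN's actual design): keep only the minimal per-meter state needed to emit a corrected result when a late reading arrives.

```python
from collections import defaultdict

class RunningAverage:
    """Illustrative handler for out-of-order meter readings: emit a
    provisional per-meter average immediately, keeping only the running
    sums needed to emit an updated result when a late reading arrives."""
    def __init__(self):
        self.state = defaultdict(lambda: [0.0, 0])  # meter -> [sum, count]

    def on_reading(self, meter, value, late=False):
        s = self.state[meter]
        s[0] += value
        s[1] += 1
        tag = "update" if late else "result"
        return (tag, meter, s[0] / s[1])

h = RunningAverage()
print(h.on_reading("m1", 10.0))             # ('result', 'm1', 10.0)
print(h.on_reading("m1", 20.0))             # ('result', 'm1', 15.0)
print(h.on_reading("m1", 12.0, late=True))  # ('update', 'm1', 14.0)
```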
Predictable embedded multiprocessor architecture for streaming applications
The focus of this thesis is on embedded media systems that execute applications from the car-infotainment application domain. These applications, which we refer to as jobs, typically fall in the class of streaming, i.e. they process a stream of data. The jobs are executed on heterogeneous multiprocessor platforms, for performance and power-efficiency reasons. Most of these jobs have firm real-time requirements, like throughput and end-to-end latency. Car-infotainment systems are becoming increasingly complex, due to an increase in the number of supported jobs and an increase in resource sharing. Therefore, it is hard to verify, for each job, that the real-time requirements are satisfied. To reduce the verification effort, we elaborate on an architecture for a predictable system for which we can verify, at design time, that a job's throughput and end-to-end latency requirements are satisfied.

This thesis introduces a network-based multiprocessor system that is predictable. This is achieved by starting with an architecture in which processors have private local memories and execute tasks in a static order, so that the uncertainty in the temporal behaviour is minimised. As an interconnect, we use a network that supports guaranteed communication services, so that data is guaranteed to be delivered in time. The architecture is extended with shared local memories, run-time scheduling of tasks, and a memory hierarchy.

Dataflow modelling and analysis techniques are used for verification, because they allow cyclic data dependencies that influence a job's performance. We show how to construct a dataflow model from a job that is mapped onto our predictable multiprocessor platform. This dataflow model takes into account: computation of tasks, communication between tasks, buffer capacities, and scheduling of shared resources. The job's throughput and end-to-end latency bounds are derived from a self-timed execution of the dataflow graph, making use of existing dataflow-analysis techniques. It is shown that the derived bounds are tight; for example, for our channel-equaliser job, the accuracy of the derived throughput bound is within 10.1%. Furthermore, it is shown that the dataflow modelling and analysis techniques can be used despite the use of shared memories, run-time scheduling of tasks, and caches.
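The kind of throughput bound such a dataflow model yields can be illustrated as follows (a minimal sketch with illustrative actor names and numbers, not the thesis's analysis): in a homogeneous dataflow graph, self-timed throughput is limited by the maximum, over all cycles, of the cycle's total execution time divided by its initial tokens.

```python
# Throughput bound of a homogeneous dataflow (HSDF) graph: for every cycle,
# (sum of actor execution times on the cycle) / (initial tokens on the cycle)
# bounds the firing period. A sketch over explicitly enumerated cycles,
# not a general maximum-cycle-ratio solver.

exec_time = {"src": 2.0, "eq": 5.0, "snk": 1.0}

# Each cycle: (actors on the cycle, initial tokens on its back edges).
cycles = [
    (["src", "eq", "snk"], 2),  # data path closed by a 2-token buffer edge
    (["eq"], 1),                # self-edge modelling sequential firings of eq
]

def throughput_bound(cycles, exec_time):
    ratios = [sum(exec_time[a] for a in actors) / tokens
              for actors, tokens in cycles]
    return 1.0 / max(ratios)  # firings per time unit

print(throughput_bound(cycles, exec_time))  # 0.2: eq's self-edge dominates
```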
Studying and Analysing Transactional Memory Using Interval Temporal Logic and AnaTempura
Transactional memory (TM) is a promising lock-free synchronisation technique which offers a high-level abstract parallel programming model for future chip multiprocessor (CMP) systems.
Moreover, it adapts the well-established and popular paradigm of transactions and thus provides a general and flexible way to allow programs to read and modify disparate memory locations atomically as a single operation. In this thesis, we propose a general framework for validating a TM design, starting from a formal specification and proceeding to a hardware implementation, with its underpinning theory and refinement. The methodology in this work starts with a high-level and executable specification model for an abstract TM, together with verification of various correctness conditions of concurrent transactions. This model is constructed within a flexible transition framework that allows the correctness of a TM system to be verified with animation. Then, we present a formal executable specification for a chip with dual single-cycle MIPS processors and a cache coherence protocol, and integrate the provable TM system. Finally, we transform the dual processors with the TM from a high-level description into a hardware description language (VHDL), using proposed refinement and restriction rules. Interval Temporal Logic (ITL) and its programming language subset AnaTempura are used to build, execute and test the model, since together they provide a powerful framework supporting logical reasoning about time intervals as well as programming and simulation.
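To illustrate the transactional abstraction being specified and verified (a toy optimistic STM in Python with illustrative names, unrelated to the ITL/AnaTempura model itself), the sketch below reads and modifies disparate locations atomically, validating its read set at commit and retrying on conflict.

```python
import threading

class TVar:
    """A transactional variable with a version number."""
    def __init__(self, value):
        self.value, self.version = value, 0

_lock = threading.Lock()  # global commit lock for this toy STM

class Transaction:
    def __init__(self):
        self.reads, self.writes = {}, {}

    def read(self, tv):
        if tv in self.writes:          # read-your-own-writes
            return self.writes[tv]
        self.reads.setdefault(tv, tv.version)  # record version at first read
        return tv.value

    def write(self, tv, value):
        self.writes[tv] = value        # buffered until commit

    def commit(self):
        with _lock:  # validate read set, then publish writes atomically
            if any(tv.version != v for tv, v in self.reads.items()):
                return False  # conflict: a read location changed underneath us
            for tv, value in self.writes.items():
                tv.value, tv.version = value, tv.version + 1
            return True

def atomically(fn, *tvars):
    while True:  # retry on conflict, as a TM runtime would
        tx = Transaction()
        fn(tx, *tvars)
        if tx.commit():
            return

a, b = TVar(10), TVar(0)
atomically(lambda tx, a, b: (tx.write(a, tx.read(a) - 5),
                             tx.write(b, tx.read(b) + 5)), a, b)
print(a.value, b.value)  # 5 5: both locations updated as a single operation
```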