Adaptive Energy-aware Scheduling of Dynamic Event Analytics across Edge and Cloud Resources
The growing deployment of sensors as part of Internet of Things (IoT) is
generating thousands of event streams. Complex Event Processing (CEP) queries
offer a useful paradigm for rapid decision-making over such data sources. While
often centralized in the Cloud, the deployment of capable edge devices on the
field motivates the need for cooperative event analytics that span Edge and
Cloud computing. Here, we identify a novel problem of query placement on edge
and Cloud resources for dynamically arriving and departing analytic dataflows.
We define this as an optimization problem to minimize the total makespan for
all event analytics, while meeting energy and compute constraints of the
resources. We propose 4 adaptive heuristics and 3 rebalancing strategies for
such dynamic dataflows, and validate them using detailed simulations for 100 -
1000 edge devices and VMs. The results show that our heuristics offer
O(seconds) planning time, give a valid and high quality solution in all cases,
and reduce the number of query migrations. Furthermore, rebalance strategies
when applied in these heuristics have significantly reduced the makespan by
around 20 - 25%.Comment: 11 pages, 7 figure
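The placement problem described above can be illustrated with a minimal greedy sketch: assign each arriving query to the feasible resource with the lowest resulting load, respecting compute and energy capacities. The function name, cost model, and resource fields are our own illustration, not the paper's actual heuristics.

```python
# Hypothetical greedy placement sketch: pick, for each query, the feasible
# resource (enough CPU and energy budget) with the lowest resulting load.
def place_queries(queries, resources):
    """queries: list of dicts with 'cpu' and 'energy' demands.
    resources: list of dicts with 'cpu_cap', 'energy_cap', 'load'.
    Returns {query index: resource index}, or None if any query is infeasible."""
    placement = {}
    for qi, q in enumerate(queries):
        best, best_load = None, None
        for ri, r in enumerate(resources):
            if q["cpu"] <= r["cpu_cap"] and q["energy"] <= r["energy_cap"]:
                new_load = r["load"] + q["cpu"]
                if best is None or new_load < best_load:
                    best, best_load = ri, new_load
        if best is None:
            return None  # no resource can host this query
        r = resources[best]
        r["cpu_cap"] -= q["cpu"]      # reserve compute capacity
        r["energy_cap"] -= q["energy"]  # reserve energy budget
        r["load"] += q["cpu"]
        placement[qi] = best
    return placement
```

A real heuristic would also model network latency between Edge and Cloud and trigger the rebalancing strategies when queries depart; this sketch only shows the feasibility-constrained greedy core.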
TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs
Sparse convolution plays a pivotal role in emerging workloads, including
point cloud processing in AR/VR, autonomous driving, and graph understanding in
recommendation systems. Since the computation pattern is sparse and irregular,
specialized high-performance kernels are required. Existing GPU libraries offer
two dataflow types for sparse convolution. The gather-GEMM-scatter dataflow is
easy to implement but not optimal in performance, while the dataflows with
overlapped computation and memory access (e.g. implicit GEMM) are highly
performant but have very high engineering costs. In this paper, we introduce
TorchSparse++, a new GPU library that achieves the best of both worlds. We
create a highly efficient Sparse Kernel Generator that generates performant
sparse convolution kernels at less than one-tenth of the engineering cost of
the current state-of-the-art system. On top of this, we design the Sparse
Autotuner, which extends the design space of existing sparse convolution
libraries and searches for the best dataflow configurations for training and
inference workloads. Consequently, TorchSparse++ achieves 2.9x, 3.3x, 2.2x and
1.7x measured end-to-end speedup on an NVIDIA A100 GPU over state-of-the-art
MinkowskiEngine, SpConv 1.2, TorchSparse and SpConv v2 in inference; and is
1.2-1.3x faster than SpConv v2 in mixed precision training across seven
representative autonomous driving benchmarks. It also seamlessly supports graph
convolutions, achieving 2.6-7.6x faster inference speed compared with
state-of-the-art graph deep learning libraries. Comment: MICRO 2023; Haotian Tang and Shang Yang contributed equally to this
project
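The gather-GEMM-scatter dataflow mentioned above can be sketched in a few lines of NumPy: for each kernel offset, gather the matched input features, multiply them by that offset's dense weight matrix (the GEMM), and scatter-add the partial results into the output rows. The kernel map is given here as precomputed index pairs; its construction, and everything else TorchSparse++ does on the GPU, is elided.

```python
# Minimal NumPy illustration of the gather-GEMM-scatter sparse-convolution
# dataflow (not TorchSparse++'s actual kernels).
import numpy as np

def sparse_conv_gather_gemm_scatter(feats, kmap, weights, n_out):
    """feats: (N_in, C_in) input point features.
    kmap: list over kernel offsets of (in_idx, out_idx) index-array pairs.
    weights: (K, C_in, C_out), one weight matrix per kernel offset.
    n_out: number of output points."""
    out = np.zeros((n_out, weights.shape[2]))
    for k, (in_idx, out_idx) in enumerate(kmap):
        if len(in_idx) == 0:
            continue
        gathered = feats[in_idx]           # gather matched inputs
        partial = gathered @ weights[k]    # dense GEMM for this offset
        np.add.at(out, out_idx, partial)   # scatter-add into outputs
    return out
```

The appeal of this dataflow is exactly what the abstract states: each step maps to a well-understood primitive (indexing, GEMM, scatter-add), which keeps engineering cost low at the price of extra memory traffic that overlapped dataflows like implicit GEMM avoid.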
Understanding the Design Space of Dataflows for Graph Neural Network Accelerators
Deep Neural Networks (DNNs) have enabled numerous applications like Image Classification, Speech Recognition, Natural Language Processing, Robotics, and Recommendation Systems. However, DNN algorithms like Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are not capable of learning arbitrary data representations. Graph Neural Networks (GNNs) are becoming popular due to their success at learning irregular data which can be represented by graphs. GNNs consist of two phases: (1) an irregular (memory-intensive) phase known as Aggregation, where information is aggregated from the neighbours, and (2) a regular (compute-intensive) phase known as Combination, for reduction of the feature-vector size. These operations cannot be handled efficiently by conventional CPUs and GPUs. Consequently, dedicated GNN accelerators have been proposed which use particular dataflows for each phase (intra-phase) and different communication strategies between the two phases (inter-phase). Prior works on GNN accelerators propose different optimizations in hardware and software in order to efficiently run Graph Neural Networks. These works propose specific intra-phase and inter-phase dataflows, microarchitectures, graph-partitioning techniques, and different ways to handle sparsity. However, since the number of applications and their underlying GNN algorithms are increasing, it is important to design future-proof GNN accelerators. It is necessary to understand the impact of different design choices on desired metrics like performance and energy, and to understand the design space of GNN accelerators, before making design decisions. This work aims at understanding the design space of GNN dataflows and proposes a systematic approach to classify the dataflows and quantitatively model the trade-offs. Design choices include hardware parameters, graph-partitioning techniques, dataflows, and sparsity handling.
In this work we specifically focus on the design space of GNN dataflows. We build our cycle-accurate simulation infrastructure, OMEGA, for modelling GNN accelerators by extending STONNE, a simulator that models DNN accelerators. We also build an analytical model around STONNE to obtain relevant statistics for different inter-phase dataflows. We propose a taxonomy to describe and classify GNN dataflows, and characterize the performance and energy of different GNN dataflows on different GNN workloads. We also describe the hardware capabilities required to support different GNN dataflows. We choose representative mappings from the search space, evaluate the design parameters affecting the performance and energy of dataflows using our simulation infrastructure, and report our insights and takeaways from the evaluations. We analyze the impact of different inter-phase and intra-phase dataflow design parameters on different workloads. A structured approach to understanding the impact of GNN dataflows will enable systematic design of future GNN accelerators, and systematic insight into the design space of GNN dataflows will also lead to the design of mapping optimizers for GNNs. M.S. thesis
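The two-phase structure described above (sparse Aggregation followed by dense Combination) can be captured in a few lines; this toy NumPy version only shows why the two phases have such different hardware characters, not any particular accelerator dataflow.

```python
# Toy illustration of one GNN layer's two phases. With an adjacency matrix A,
# features X, and weight W, a simple layer computes (A @ X) @ W:
# the first product is sparse/irregular, the second is a dense GEMM.
import numpy as np

def gnn_layer(adj, feats, weight):
    """adj: (N, N) adjacency matrix; feats: (N, C_in); weight: (C_in, C_out)."""
    aggregated = adj @ feats        # Aggregation: memory-intensive, irregular
    combined = aggregated @ weight  # Combination: compute-intensive GEMM
    return combined
```

The intra-phase dataflow question the thesis studies is how each of these two products is tiled and scheduled on the accelerator; the inter-phase question is how (and whether) the `aggregated` intermediate is staged between them.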
Veni Vidi Vici, A Three-Phase Scenario For Parameter Space Analysis in Image Analysis and Visualization
Automatic analysis of enormous sets of images is a critical task in the life
sciences. It faces many challenges: algorithms are highly parameterized,
significant human input is intertwined with the process, and a standard
meta-visualization approach is lacking. This paper proposes an alternative iterative
approach for optimizing input parameters, saving time by minimizing the user
involvement, and allowing for understanding the workflow of algorithms and
discovering new ones. The main focus is on developing an interactive
visualization technique that enables users to analyze the relationships between
sampled input parameters and corresponding output. This technique is
implemented as a prototype called Veni Vidi Vici, or "I came, I saw, I
conquered." This strategy is inspired by the mathematical formulas of numbering
computable functions and is developed atop ImageJ, a scientific image
processing program. A case study is presented to investigate the proposed
framework. Finally, the paper explores some potential future issues in the
application of the proposed approach in parameter space analysis in
visualization.
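The sampling step such parameter-space analysis rests on can be sketched as a plain grid sweep: enumerate combinations of algorithm parameters, run the analysis function on each, and keep the (parameters, output) pairs for later visualization. The parameter names here are purely illustrative and unrelated to the paper's ImageJ prototype.

```python
# Hypothetical parameter-space sweep: run `func` on every combination of the
# candidate parameter values and record the (parameters, output) samples.
import itertools

def sweep(func, param_grid):
    """param_grid: dict of parameter name -> list of candidate values.
    Returns a list of (params_dict, output) samples."""
    names = sorted(param_grid)
    samples = []
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        samples.append((params, func(**params)))
    return samples
```

An interactive technique like the one proposed would then let the user explore these samples visually instead of reading the raw table.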
DALiuGE: A Graph Execution Framework for Harnessing the Astronomical Data Deluge
The Data Activated Liu Graph Engine - DALiuGE - is an execution framework for
processing large astronomical datasets at a scale required by the Square
Kilometre Array Phase 1 (SKA1). It includes an interface for expressing complex
data reduction pipelines consisting of both data sets and algorithmic
components and an implementation run-time to execute such pipelines on
distributed resources. By mapping the logical view of a pipeline to its
physical realisation, DALiuGE separates the concerns of multiple stakeholders,
allowing them to collectively optimise large-scale data processing solutions in
a coherent manner. The execution in DALiuGE is data-activated, where each
individual data item autonomously triggers the processing on itself. Such
decentralisation also makes the execution framework very scalable and flexible,
supporting pipeline sizes ranging from less than ten tasks running on a laptop
to tens of millions of concurrent tasks on the second fastest supercomputer in
the world. DALiuGE has been used in production for reducing interferometry data
sets from the Karl E. Jansky Very Large Array and the Mingantu Ultrawide
Spectral Radioheliograph; and is being developed as the execution framework
prototype for the Science Data Processor (SDP) consortium of the Square
Kilometre Array (SKA) telescope. This paper presents a technical overview of
DALiuGE and discusses case studies from the CHILES and MUSER projects that use
DALiuGE to execute production pipelines. In a companion paper, we provide
in-depth analysis of DALiuGE's scalability to very large numbers of tasks on
two supercomputing facilities. Comment: 31 pages, 12 figures; currently under review by Astronomy and
Computing
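The data-activated idea described above, where each data item autonomously triggers the processing on itself rather than waiting for a central scheduler, can be shown with a toy sketch. The class and function names below are our own illustration, not DALiuGE's actual API.

```python
# Toy data-activated execution: when a data item ("drop") completes, it
# triggers its registered consumer tasks itself; no scheduler is involved.
class DataDrop:
    def __init__(self, name):
        self.name = name
        self.value = None
        self.consumers = []  # callables to trigger when this drop completes

    def complete(self, value):
        self.value = value
        for task in self.consumers:  # the data item drives the pipeline
            task(self)

def make_doubler(out_drop):
    """Hypothetical processing task: doubles its input into `out_drop`."""
    def task(in_drop):
        out_drop.complete(in_drop.value * 2)
    return task
```

Because every drop carries its own trigger, a pipeline built this way has no central bottleneck, which is the property the abstract credits for DALiuGE's scalability from a laptop to millions of concurrent tasks.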
The DS-Pnet modeling formalism for cyber-physical system development
This work presents the DS-Pnet modeling formalism (Dataflow, Signals and Petri nets), designed for the development of cyber-physical systems. It combines the characteristics of Petri nets and dataflows to support the modeling of mixed systems containing both reactive parts and data-processing operations. Inheriting the features of the parent IOPT Petri net class, including an external interface composed of input and output signals and events, the formalism adds dataflow operations that bring enhanced modeling capabilities to specify mathematical data transformations and to graphically express the dependencies between signals. Data-centric systems that do not require reactive controllers are designed using pure dataflow models.
Component-based model composition enables reusing existing components, creating libraries of previously tested components, and hierarchically decomposing complex systems into smaller sub-systems.
A precise execution semantics was defined, considering the relationship between dataflow and Petri net nodes, providing an abstraction to define the interface between reactive controllers and input and output signals, including analog sensors and actuators.
The new formalism is supported by the IOPT-Flow Web-based tool framework, offering tools to design and edit models, simulate model execution in the Web browser, plus model-checking and automatic software/hardware code generation tools to implement controllers running on embedded devices (C, VHDL and JavaScript).
A new communication protocol was created to permit the automatic implementation of distributed cyber-physical systems composed of networks of remote components communicating over the Internet. The editor tool connects directly to remote embedded devices running DS-Pnet models and may import remote components into new models, contributing to simplify the creation of distributed cyber-physical applications, where the communication between distributed components is specified just by drawing arcs.
Several application examples were designed to validate the proposed formalism and the associated framework, ranging from hardware solutions and industrial applications to distributed software applications.
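The Petri-net-plus-dataflow combination can be illustrated with a toy sketch: a transition is enabled when all of its input places hold tokens, and firing it both moves tokens and evaluates a dataflow expression over the signal values attached to its inputs. This is our own minimal illustration, not the DS-Pnet execution semantics.

```python
# Toy Petri-net transition with an attached dataflow expression.
class Transition:
    def __init__(self, inputs, outputs, expr):
        self.inputs = inputs    # names of input places
        self.outputs = outputs  # names of output places
        self.expr = expr        # dataflow operation over input signal values

    def enabled(self, marking):
        """Enabled iff every input place holds at least one token."""
        return all(marking.get(p, 0) > 0 for p in self.inputs)

    def fire(self, marking, values):
        """Consume input tokens, produce output tokens, evaluate the dataflow."""
        for p in self.inputs:
            marking[p] -= 1
        for p in self.outputs:
            marking[p] = marking.get(p, 0) + 1
        return self.expr(*(values[p] for p in self.inputs))
```

The point of coupling the two models, as the abstract explains, is that the Petri-net side captures the reactive control while the expression captures the data transformation, in one graphical formalism.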
Elastic Dataflow Processing on the Cloud
Clouds have become an attractive platform for the large-scale processing of
modern applications on Big Data, especially due to the concept of elasticity,
which characterizes them: resources can be leased on demand and used for as
much time as needed, offering the ability to create virtual infrastructures
that change dynamically over time. Such applications often require processing
of complex queries that are expressed in a high-level language and are
typically transformed into data processing flows (dataflows). A logical
question that arises is whether elasticity affects dataflow execution and in
which way. It seems reasonable that execution is faster when more resources
are used; however, the monetary cost is higher. This gives rise to the concept
of eco-elasticity, an additional kind of elasticity that comes from economics and
captures the trade-offs between the response time of the system and the amount
of money we pay for it as influenced by the use of different amounts of
resources.
In this thesis, we approach the elasticity of clouds in a unified way that
combines both the traditional notion and eco-elasticity. This unified
elasticity concept is essential for the development of auto-tuned systems in
cloud environments. First, we demonstrate that eco-elasticity exists in several
common tasks that appear in practice and that can be discovered using a simple,
yet highly scalable and efficient algorithm. Next, we present two cases of
auto-tuned algorithms that use the unified model of elasticity in order to
adapt to the query workload: 1) processing analytical queries in the form of
tree execution plans in order to maximize profit and 2) automated index
management taking into account compute and storage resources. Finally, we
describe EXAREME, a system for elastic data processing on the cloud that has
been used and extended in this work. The system offers declarative languages
that are based on SQL with user-defined functions (UDFs) extended with
parallelism primitives. EXAREME exploits both elasticities of clouds by
dynamically allocating and deallocating compute resources in order to adapt to
the query workload.
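The time/money trade-off behind eco-elasticity can be sketched with a deliberately simplified cost model: each candidate number of leased VMs yields a (runtime, monetary cost) point, and only the non-dominated points are worth offering to an auto-tuned system. The cost model (linear speedup with a per-VM startup overhead, flat hourly price) is our own toy assumption.

```python
# Toy eco-elasticity sketch: enumerate scale-out options and keep the
# Pareto-optimal (time, cost) trade-offs.
def scale_out_options(work_hours, startup_hours, price_per_vm_hour, max_vms):
    """Return (n_vms, runtime, cost) per option: runtime shrinks with more
    VMs but each VM adds startup overhead and is billed for the full run."""
    opts = []
    for n in range(1, max_vms + 1):
        runtime = work_hours / n + startup_hours
        cost = n * runtime * price_per_vm_hour
        opts.append((n, runtime, cost))
    return opts

def pareto_front(points):
    """points: list of (time, cost). Keep points no other point dominates
    (i.e. no other point is at least as fast AND at least as cheap)."""
    return [p for p in points
            if not any(q[0] <= p[0] and q[1] <= p[1] and q != p
                       for q in points)]
```

Under this model more VMs always buy time and always cost money, so every option is a genuine trade-off; an auto-tuned system like the one the thesis describes would pick a point on this front according to the user's budget or deadline.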