424 research outputs found
A Timed-Automata Based Middleware for Time-Critical Multicore Applications
The goal of our work is to contribute to the unification of design methodologies for multi-core time-critical systems. Various models of computation have been proposed in the literature for this kind of system, but the lack of coherence between them makes a unified design methodology challenging. In addition, there is a significant gap between the models of computation and the real-time scheduling and analysis techniques. To overcome this difficulty, we represent both the models of computation and the scheduling policies by timed automata. While timed automata are traditionally used only for simulation and validation, we use them for programming. We believe that using the same formal language for different design styles and methods is an important step to close the gap between them. Our approach is demonstrated using a publicly available toolset, an industrial application use case, and a multi-core platform.
Dataplane Specialization for High-performance OpenFlow Software Switching
OpenFlow is an amazingly expressive dataplane programming language, but this expressiveness comes at a severe performance price, as switches must do excessive packet classification in the fast path. The prevalent OpenFlow software switch architecture is therefore built on flow caching, but this imposes intricate limitations on the workloads that can be supported efficiently and may even open the door to malicious cache overflow attacks. In this paper we argue that instead of enforcing the same universal flow cache semantics on all OpenFlow applications and optimizing for the common case, a switch should rather automatically specialize its dataplane piecemeal with respect to the configured workload. We introduce ESWITCH, a novel switch architecture that uses on-the-fly template-based code generation to compile any OpenFlow pipeline into efficient machine code, which can then be readily used as the fast path. We present a proof-of-concept prototype and demonstrate on illustrative use cases that ESWITCH yields a simpler architecture, superior packet processing speed, improved latency and CPU scalability, and predictable performance. Our prototype can easily scale beyond 100 Gbps on a single Intel blade even with complex OpenFlow pipelines.
Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines
Image processing pipelines combine the challenges of stencil computations and stream programs. They are composed of large graphs of different stencil stages, as well as complex reductions, and stages with global or data-dependent access patterns. Because of their complex structure, the performance difference between a naive implementation of a pipeline and an optimized one is often an order of magnitude. Efficient implementations require optimization of both parallelism and locality, but due to the nature of stencils, there is a fundamental tension between parallelism, locality, and introducing redundant recomputation of shared values.
We present a systematic model of the tradeoff space fundamental to stencil pipelines, a schedule representation which describes concrete points in this space for each stage in an image processing pipeline, and an optimizing compiler for the Halide image processing language that synthesizes high performance implementations from a Halide algorithm and a schedule. Combining this compiler with stochastic search over the space of schedules enables terse, composable programs to achieve state-of-the-art performance on a wide range of real image processing pipelines, and across different hardware architectures, including multicores with SIMD, and heterogeneous CPU+GPU execution. From simple Halide programs written in a few hours, we demonstrate performance up to 5x faster than hand-tuned C, intrinsics, and CUDA implementations optimized by experts over weeks or months, for image processing applications beyond the reach of past automatic compilers.
Funding: United States. Dept. of Energy (Award DE-SC0005288); National Science Foundation (U.S.) (Grant 0964004); Intel Corporation; Cognex Corporation; Adobe System
Vesyla-II: An Algorithm Library Development Tool for Synchoros VLSI Design Style
High-level synthesis (HLS) has been researched for decades and is still
limited to fast FPGA prototyping and algorithmic RTL generation. A feasible
end-to-end system-level synthesis solution has never been rigorously proven.
Modularity and composability are the keys to enabling such a system-level
synthesis framework that bridges the huge gap between system-level
specification and physical level design. It implies that 1) modules in each
abstraction level should be physically composable without any irregular glue
logic involved and 2) the cost of each module in each abstraction level is
accurately predictable. The ultimate reasons that limit how far the
conventional HLS can go are precisely that it cannot generate modular designs
that are physically composable and cannot accurately predict the cost of its
design. In this paper, we propose Vesyla, not as yet another HLS tool, but as a
synthesis tool that positions itself in a promising end-to-end synthesis
framework and preserves its ability to generate physically composable modular
designs and to accurately predict their cost metrics. We present in the paper
how Vesyla is constructed, focusing on the novel platform it targets and the
internal data structures that highlight the uniqueness of Vesyla. We also show
how Vesyla will be positioned in the end-to-end synchoros synthesis framework
called SiLago.