12,435 research outputs found
A Novel SAT-Based Approach to the Task Graph Cost-Optimal Scheduling Problem
The Task Graph Cost-Optimal Scheduling Problem consists in scheduling a certain number of interdependent tasks onto a set of heterogeneous processors (characterized by idle and running rates per time unit), minimizing the cost of the entire process. This paper provides a novel formulation for this scheduling puzzle, in which an optimal solution is computed through a sequence of Binate Covering Problems, hinged within a Bounded Model Checking paradigm. In this approach, each covering instance, providing a min-cost trace for a given schedule depth, can be solved with several strategies, resorting to Minimum-Cost Satisfiability solvers or Pseudo-Boolean Optimization tools. Unfortunately, all direct resolution methods show very low efficiency and scalability. As a consequence, we introduce a specialized method to solve the same sequence of problems, based on a traditional all-solution SAT solver. This approach follows the "circuit cofactoring" strategy, as it exploits a powerful technique to capture a large set of solutions for any new SAT counter-example. The overall method is completed with a branch-and-bound heuristic which evaluates lower and upper bounds of the schedule length, to reduce the state space that has to be visited. Our results show that the proposed strategy significantly improves the blind binate covering schema, and it outperforms general purpose state-of-the-art tool
Design of multimedia processor based on metric computation
Media-processing applications, such as signal processing, 2D and 3D graphics
rendering, and image compression, are the dominant workloads in many embedded
systems today. The real-time constraints of those media applications have
taxing demands on today's processor performances with low cost, low power and
reduced design delay. To satisfy those challenges, a fast and efficient
strategy consists in upgrading a low cost general purpose processor core. This
approach is based on the personalization of a general RISC processor core
according the target multimedia application requirements. Thus, if the extra
cost is justified, the general purpose processor GPP core can be enforced with
instruction level coprocessors, coarse grain dedicated hardware, ad hoc
memories or new GPP cores. In this way the final design solution is tailored to
the application requirements. The proposed approach is based on three main
steps: the first one is the analysis of the targeted application using
efficient metrics. The second step is the selection of the appropriate
architecture template according to the first step results and recommendations.
The third step is the architecture generation. This approach is experimented
using various image and video algorithms showing its feasibility
Coarse-grained reconfigurable array architectures
Coarse-Grained Reconfigurable Array (CGRA) architectures accelerate the same inner loops that benefit from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efficiently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on flexibility, performance, and power-efficiency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual fine-tuning of source code
AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture
CPU-FPGA heterogeneous architectures are attracting ever-increasing attention
in an attempt to advance computational capabilities and energy efficiency in
today's datacenters. These architectures provide programmers with the ability
to reprogram the FPGAs for flexible acceleration of many workloads.
Nonetheless, this advantage is often overshadowed by the poor programmability
of FPGAs whose programming is conventionally a RTL design practice. Although
recent advances in high-level synthesis (HLS) significantly improve the FPGA
programmability, it still leaves programmers facing the challenge of
identifying the optimal design configuration in a tremendous design space.
This paper aims to address this challenge and pave the path from software
programs towards high-quality FPGA accelerators. Specifically, we first propose
the composable, parallel and pipeline (CPP) microarchitecture as a template of
accelerator designs. Such a well-defined template is able to support efficient
accelerator designs for a broad class of computation kernels, and more
importantly, drastically reduce the design space. Also, we introduce an
analytical model to capture the performance and resource trade-offs among
different design configurations of the CPP microarchitecture, which lays the
foundation for fast design space exploration. On top of the CPP
microarchitecture and its analytical model, we develop the AutoAccel framework
to make the entire accelerator generation automated. AutoAccel accepts a
software program as an input and performs a series of code transformations
based on the result of the analytical-model-based design space exploration to
construct the desired CPP microarchitecture. Our experiments show that the
AutoAccel-generated accelerators outperform their corresponding software
implementations by an average of 72x for a broad class of computation kernels
Recommended from our members
CHASSIS : a combined hardware selection and scheduling technique for performance driven synthesis
This report describes a new technique that combines the Hardware Scheduling and Component Selection phases for High Level Synthesis. Our technique simultaneously selects components from a given library while it schedules the operations into different control steps. The algoríthm improves previous work in scheduling because component costs and performance are considered during the scheduling process, enlarging the design search space and resulting in better optimized desígns
Recommended from our members
A decision support environment for behavioral synthesis
We present a specification of a general environment for behavioral synthesis centered around the user/designer as the primary motivator for decisions in design development. At each stage of the design process, the user can perform transformations on the design description through graphical user interfaces. Quality measures, physical estimates, and design hints are given to the user at each stage
A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems
Recent technological advances have greatly improved the performance and
features of embedded systems. With the number of just mobile devices now
reaching nearly equal to the population of earth, embedded systems have truly
become ubiquitous. These trends, however, have also made the task of managing
their power consumption extremely challenging. In recent years, several
techniques have been proposed to address this issue. In this paper, we survey
the techniques for managing power consumption of embedded systems. We discuss
the need of power management and provide a classification of the techniques on
several important parameters to highlight their similarities and differences.
This paper is intended to help the researchers and application-developers in
gaining insights into the working of power management techniques and designing
even more efficient high-performance embedded systems of tomorrow
Formal and Informal Methods for Multi-Core Design Space Exploration
We propose a tool-supported methodology for design-space exploration for
embedded systems. It provides means to define high-level models of applications
and multi-processor architectures and evaluate the performance of different
deployment (mapping, scheduling) strategies while taking uncertainty into
account. We argue that this extension of the scope of formal verification is
important for the viability of the domain.Comment: In Proceedings QAPL 2014, arXiv:1406.156
- …