85 research outputs found

    Hardware/software partitioning of streaming applications for multi-processor system-on-chip

    Hardware/software (HW/SW) co-design has emerged as an integral part of the development of embedded applications. Moreover, the growing number of embedded multimedia and medical applications makes streaming throughput an important attribute of a Multi-Processor System-on-Chip (MPSoC). As an important development step, HW/SW partitioning affects system performance. This paper formulates HW/SW partitioning as an optimization problem that maximizes streaming throughput under a predefined area constraint, targeting multi-processor systems with hardware accelerator sharing capability. Software-oriented and hardware-oriented greedy heuristics for HW/SW partitioning are proposed, as well as a branch-and-bound algorithm with best-first search that uses the greedy results as its initial best solution. Several random graphs and two multimedia applications (JPEG encoder and MP3 decoder) are used for performance benchmarking against brute-force ground truth. Results show that the proposed greedy algorithms produce fast solutions that are 87.7% and 84.2% near-optimal, respectively, relative to the ground-truth result. With the greedy result as its initial solution, the proposed branch-and-bound algorithm produces the ground-truth solution up to 2.4741e+8 times faster in HW/SW partitioning time than the exhaustive brute-force method.
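To make the idea of a software-oriented greedy heuristic under an area constraint concrete, here is a minimal sketch. It is not the paper's algorithm: it assumes a simple linear pipeline whose throughput is the inverse of its slowest stage, ignores accelerator sharing, and uses hypothetical names (`Task`, `greedy_partition`) and made-up numbers.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Task:
    sw_time: float  # per-item latency in software (hypothetical units)
    hw_time: float  # per-item latency on a hardware accelerator
    hw_area: int    # area cost of that accelerator


def greedy_partition(tasks: List[Task], area_budget: int) -> Tuple[Set[int], float]:
    """Start all-software, then keep moving the bottleneck stage to hardware.

    Assumed model (not from the paper): throughput = 1 / slowest stage time.
    Returns (indices of tasks placed in hardware, resulting throughput).
    """
    in_hw: Set[int] = set()
    used_area = 0

    def stage_time(i: int) -> float:
        return tasks[i].hw_time if i in in_hw else tasks[i].sw_time

    while True:
        bottleneck = max(range(len(tasks)), key=stage_time)
        t = tasks[bottleneck]
        # Stop when the bottleneck cannot be improved within the area budget.
        if (bottleneck in in_hw
                or t.hw_time >= t.sw_time
                or used_area + t.hw_area > area_budget):
            break
        in_hw.add(bottleneck)
        used_area += t.hw_area

    period = max(stage_time(i) for i in range(len(tasks)))
    return in_hw, 1.0 / period


# Hypothetical three-stage pipeline with an area budget of 9 units.
tasks = [Task(10, 2, 5), Task(8, 1, 4), Task(3, 1, 6)]
hw_set, throughput = greedy_partition(tasks, area_budget=9)
print(sorted(hw_set), throughput)
```

A greedy pass like this is cheap but can stop at a suboptimal partition, which is why the abstract pairs the heuristics with a branch-and-bound search seeded by the greedy result.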

    Knapsack Model and Algorithm for Hardware/Software Partitioning Problem

    Efficient hardware/software partitioning is crucial to realizing optimal solutions for constraint-driven embedded systems. The total solution space for this problem is typically quite large. In this paper, we show that the knapsack model can be employed to rapidly identify hardware components that provide time-efficient implementations. In particular, we propose a method that splits the problem into standard 0-1 knapsack problems in order to leverage classical approaches. The proposed method relies on tight lower and upper bounds for each of these knapsack problems to rapidly eliminate sub-problems that are guaranteed not to give optimal results. Experimental results show that, for problem sizes ranging from 30 to 3000, the optimal solution of the whole problem can be obtained by solving only 1 sub-problem, except for one case that required solving 3 sub-problems.
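For readers unfamiliar with the classical building block mentioned above, the following is a minimal sketch of the textbook 0-1 knapsack dynamic program. The mapping to partitioning (value = time saved by moving a component to hardware, weight = hardware area) and the sample numbers are assumptions for illustration; the paper's sub-problem splitting and bounding scheme is not reproduced here.

```python
from typing import List, Tuple


def knapsack_01(values: List[int], weights: List[int],
                capacity: int) -> Tuple[int, List[int]]:
    """Textbook 0-1 knapsack via dynamic programming.

    Returns (best total value, sorted indices of the chosen items).
    """
    n = len(values)
    # best[i][c] = best value using only the first i items within capacity c
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, w = values[i - 1], weights[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # option 1: skip item i-1
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v  # option 2: take it

    # Trace back through the table to recover which items were taken.
    chosen: List[int] = []
    c = capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    chosen.reverse()
    return best[n][capacity], chosen


# Hypothetical data: time saved per component and its hardware area.
saved_time = [60, 100, 120]
area = [10, 20, 30]
print(knapsack_01(saved_time, area, capacity=50))
```

The table has (n + 1) x (capacity + 1) entries, so the run time grows with the area budget as well as the number of components.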

    Real-Time Systems Compilation (Compilation de systèmes temps réel)

    I introduce and advocate for the concept of Real-Time Systems Compilation. By analogy with classical compilation, real-time systems compilation consists in the fully automatic construction of running, correct-by-construction implementations from functional and non-functional specifications of embedded control systems. As in a classical compiler, the whole process must be fast (thus enabling a trial-and-error design style) and produce reasonably efficient code. This requires the use of fast heuristics and of fine-grain platform and application models. Unlike a classical compiler, a real-time systems compiler must take into account non-functional properties of a system and ensure that non-functional requirements are respected (in addition to functional correctness). I also present Lopht, a real-time systems compiler for statically-scheduled real-time systems that we built by combining techniques and concepts from real-time scheduling, compilation, and synchronous languages.

    Uncertainty Theory Based Reliability-Centric Cyber-Physical System Design

    Cyber-physical systems (CPSs) are built from, and depend upon, the seamless integration of software and hardware components. The most important challenge in CPS design and verification is to design CPSs that are reliable under a variety of uncertainties, i.e., unanticipated and rapidly evolving environments and disturbances. The cost, delay and reliability of the designed CPS are highly dependent on software-hardware partitioning in the design. The key challenge in partitioning CPSs is that it is difficult to formalize reliability characterization in the same way as the uncertain cost and time delay. In this paper, we propose a new CPS design paradigm for reliability assurance while coping with uncertainty. Specifically, we develop an uncertain programming model for partitioning, based on uncertainty theory, to support assured reliability. The uncertainty in the cost and delay time of the components to be implemented can be modeled by uncertainty variables with uncertainty distributions, and the reliability characterization is recursively derived. We convert the uncertain programming model and customize an improved heuristic to solve the converted model. Experimental results on several benchmarks and random graphs show that the uncertain method produces designs with higher reliability. In addition, to demonstrate the effectiveness of our model in coping with uncertainty at the design stage, we apply this uncertain framework and existing deterministic models to the design process of a sub-system used in real-world subway control. The system implemented based on the uncertain model works better than those produced by the deterministic models. The proposed design paradigm has the potential to be generalized to the design of CPSs for greater assurance of safety and security under a variety of uncertainties.

    Compiler techniques for scalable performance of stream programs on multicore architectures

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 211-222).
    Given the ubiquity of multicore processors, there is an acute need to enable the development of scalable parallel applications without unduly burdening programmers. Currently, programmers are asked not only to explicitly expose parallelism but also concern themselves with issues of granularity, load-balancing, synchronization, and communication. This thesis demonstrates that when algorithmic parallelism is expressed in the form of a stream program, a compiler can effectively and automatically manage the parallelism. Our compiler assumes responsibility for low-level architectural details, transforming implicit algorithmic parallelism into a mapping that achieves scalable parallel performance for a given multicore target. Stream programming is characterized by regular processing of sequences of data, and it is a natural expression of algorithms in the areas of audio, video, digital signal processing, networking, and encryption. Streaming computation is represented as a graph of independent computation nodes that communicate explicitly over data channels. Our techniques operate on contiguous regions of the stream graph where the input and output rates of the nodes are statically determinable. Within a static region, the compiler first automatically adjusts the granularity and then exploits data, task, and pipeline parallelism in a holistic fashion. We introduce techniques that data-parallelize nodes that operate on overlapping sliding windows of their input, translating serializing state into minimal and parametrized inter-core communication. Finally, for nodes that cannot be data-parallelized due to state, we are the first to automatically apply software-pipelining techniques at a coarse granularity to exploit pipeline parallelism between stateful nodes.
    Our framework is evaluated in the context of the StreamIt programming language. StreamIt is a high-level stream programming language that has been shown to improve programmer productivity in implementing streaming algorithms. We employ the StreamIt Core benchmark suite of 12 real-world applications to demonstrate the effectiveness of our techniques for varying multicore architectures. For a 16-core distributed memory multicore, we achieve a 14.9x mean speedup. For benchmarks that include sliding-window computation, our sliding-window data-parallelization techniques are required to enable scalable performance for a 16-core SMP multicore (14x mean speedup) and a 64-core distributed shared memory multicore (52x mean speedup).
    By Michael I. Gordon. Ph.D.

    Compilation and Scheduling Techniques for Embedded Systems

    Embedded applications are constantly increasing in size, which has increased the demand on designers of digital signal processors (DSPs) to meet tight memory, size and cost constraints. With this trend, memory requirement reduction through code compaction and variable coalescing techniques is gaining ground. Also, as the use of multiprocessor system-on-chip (MPSoC) in complex embedded systems grows, problems like mapping, memory management and scheduling are gaining more attention. The first part of the dissertation deals with problems related to digital signal processors. Most modern DSPs provide multiple address registers and a dedicated address generation unit (AGU) which performs address generation in parallel to instruction execution. A careful placement of variables in memory is important in decreasing the number of address arithmetic instructions, leading to compact and efficient code. Chapters 2 and 3 present effective heuristics for the simple and the general offset assignment problems with variable coalescing. A solution based on simulated annealing is also presented. Chapter 4 presents an optimal integer linear programming (ILP) solution to the offset assignment problem with variable coalescing and operand permutation. A new approach to the general offset assignment problem is introduced. Chapter 5 presents an optimal ILP formulation and a genetic algorithm solution to the address register allocation (ARA) problem with code transformation techniques. The ARA problem is used to generate compact code for array-intensive embedded applications. In the second part of the dissertation, we study problems related to MPSoCs. MPSoCs provide the flexibility to meet the performance requirements of multimedia applications while respecting the tight embedded system constraints. MPSoC-based embedded systems often employ software-managed memories called scratch-pad memories (SPM).
    Scheduling the tasks of an application on the processors and partitioning the available SPM budget among those processors are two critical issues in reducing the overall computation time. Traditionally, the step of task scheduling is applied separately from the memory partitioning step. Such a decoupled approach may miss better quality schedules. Chapters 6 and 7 present effective heuristics that integrate task allocation and SPM partitioning to further reduce the execution time of embedded applications for single and multi-application scenarios.

    A Tabu Search-Based Memetic Algorithm for Hardware/Software Partitioning

    Hardware/software (HW/SW) partitioning determines which components of a system are implemented in hardware and which in software. It is one of the most important steps in the design of embedded systems. The HW/SW partitioning problem is an NP-hard constrained binary optimization problem. In this paper, we propose a tabu search-based memetic algorithm to solve the HW/SW partitioning problem. First, we convert the constrained binary HW/SW problem into an unconstrained binary problem using an adaptive penalty function that has no parameters. A memetic algorithm is then suggested for solving this unconstrained problem. The algorithm uses a tabu search as its local search procedure. This tabu search has a special feature with respect to solution generation, and it uses a feedback mechanism for updating the tabu tenure. In addition, the algorithm integrates a path relinking procedure for exploitation of newly found solutions. Computational results are presented using a number of test instances from the literature. The algorithm proves its robustness when its results are compared with those of two other algorithms. The effectiveness of the proposed parameter-free adaptive penalty function is also shown.
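The two ingredients named above, a penalty that turns the constrained binary problem into an unconstrained one and a tabu search over single-bit flips, can be sketched as follows. This is only an illustration under stated assumptions: it uses a fixed, hand-picked penalty weight and a fixed tabu tenure (the paper's penalty is adaptive and parameter-free, and its tenure is updated by feedback), it omits the memetic population and path relinking, and all names and numbers are hypothetical.

```python
from typing import List, Tuple


def penalized_cost(x: List[int], sw_cost: List[float], hw_cost: List[float],
                   hw_area: List[int], area_limit: int, penalty: float) -> float:
    """Total execution cost plus a penalty for exceeding the area limit.

    x[i] == 1 means component i is placed in hardware.
    """
    cost = sum(h if b else s for b, s, h in zip(x, sw_cost, hw_cost))
    area = sum(a for b, a in zip(x, hw_area) if b)
    return cost + penalty * max(0, area - area_limit)


def tabu_search(sw_cost: List[float], hw_cost: List[float], hw_area: List[int],
                area_limit: int, penalty: float = 100.0, tenure: int = 2,
                iters: int = 50) -> Tuple[List[int], float]:
    """Plain one-flip tabu search on the penalized (unconstrained) problem."""
    n = len(sw_cost)

    def f(y: List[int]) -> float:
        return penalized_cost(y, sw_cost, hw_cost, hw_area, area_limit, penalty)

    x = [0] * n                      # start from the all-software solution
    best_x, best_f = x[:], f(x)
    tabu_until = [0] * n             # bit i is tabu through this iteration
    for it in range(1, iters + 1):
        candidates = []
        for i in range(n):
            y = x[:]
            y[i] ^= 1
            fy = f(y)
            # Aspiration: a tabu flip is allowed if it beats the best so far.
            if tabu_until[i] < it or fy < best_f:
                candidates.append((fy, i))
        if not candidates:
            continue
        fy, i = min(candidates)      # best admissible neighbor, even if worse
        x[i] ^= 1
        tabu_until[i] = it + tenure
        if fy < best_f:
            best_x, best_f = x[:], fy
    return best_x, best_f


# Hypothetical instance: three components, area limit of 9 units.
print(tabu_search([10, 8, 3], [2, 1, 1], [5, 4, 6], area_limit=9))
```

Accepting the best admissible neighbor even when it is worse, while forbidding recent flips, is what lets the search leave local minima; the fixed penalty weight here is exactly the kind of tuning parameter the paper's adaptive penalty is meant to avoid.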