37 research outputs found

    Application-Specific Heterogeneous Network-on-Chip Design

    As a result of increasing communication demands, application-specific and scalable Network-on-Chips (NoCs) have emerged to connect processing cores and subsystems in Multiprocessor System-on-Chips. A challenge in application-specific NoC design is to find the right balance among different tradeoffs, such as communication latency, power consumption and chip area. We propose a novel approach that generates a latency-aware heterogeneous NoC topology. Experimental results show that our approach improves total communication latency by up to 27% with modest power consumption. © The Author 2013. Published by Oxford University Press on behalf of The British Computer Society.

    Application-specific heterogeneous network-on-chip design

    As a result of increasing communication demands, application-specific and scalable Network-on-Chips (NoCs) have emerged to connect processing cores and subsystems in Multiprocessor System-on-Chips. A challenge in application-specific NoC design is to find the right balance among different tradeoffs, such as communication latency, power consumption and chip area. We propose a novel approach that generates a latency-aware heterogeneous NoC topology. Experimental results show that our approach improves total communication latency by up to 27% with modest power consumption. © The Author 2013. Published by Oxford University Press on behalf of The British Computer Society.

    Application-specific heterogeneous network-on-chip design

    Ankara: The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2011. Thesis (Master's), Bilkent University, 2011. Includes bibliographical references (leaves 68-74). With the increasing communication demands of processors and memory cores in Systems-on-Chips (SoCs), application-specific and scalable Network-on-Chips (NoCs) have emerged to interconnect processing cores and subsystems in Multiprocessor System-on-Chips (MPSoCs). The challenge of application-specific NoC design is to find the right balance among different trade-offs such as communication latency, power consumption, and chip area. This thesis introduces a novel heterogeneous NoC design approach in which a biologically inspired evolutionary algorithm and a 2-dimensional rectangle packing algorithm place processing elements with various properties into a constrained NoC area according to the tasks generated by Task Graphs for Free (TGFF). TGFF is a pseudo-random task graph generator used in scheduling and allocation research. Based on a given task graph, we minimize the maximum execution time in a Heterogeneous Chip-Multiprocessor. We specifically emphasize the communication cost, as it is a major overhead in a multi-core architecture. Experimental results show that our approach improves total communication latency by up to 27% with modest power consumption. Demirbaş, Dilek. M.S.
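
    The placement idea in this abstract can be illustrated with a deliberately simplified sketch. It is not the thesis's algorithm: it assumes equally sized cores on a small mesh (so no rectangle packing), a hypothetical cost model of traffic volume times Manhattan hop distance, and a minimal elitist evolutionary loop that mutates a placement by swapping two cores. The names `comm_cost`, `evolve_placement` and the example traffic figures are illustrative assumptions.

```python
import random


def comm_cost(placement, traffic):
    """Assumed cost model: traffic volume times Manhattan hop distance, summed."""
    total = 0
    for (src, dst), volume in traffic.items():
        (xs, ys), (xd, yd) = placement[src], placement[dst]
        total += volume * (abs(xs - xd) + abs(ys - yd))
    return total


def evolve_placement(cores, traffic, width, height, generations=500, seed=0):
    """Elitist evolutionary search: mutate by swapping two cores, keep non-worse children."""
    if len(cores) > width * height:
        raise ValueError("mesh too small for the given cores")
    rng = random.Random(seed)
    slots = [(x, y) for x in range(width) for y in range(height)]
    rng.shuffle(slots)
    best = dict(zip(cores, slots))  # random initial placement
    best_cost = initial_cost = comm_cost(best, traffic)
    for _ in range(generations):
        a, b = rng.sample(cores, 2)
        child = dict(best)
        child[a], child[b] = child[b], child[a]  # mutation: swap two cores
        cost = comm_cost(child, traffic)
        if cost <= best_cost:  # elitism: never accept a worse placement
            best, best_cost = child, cost
    return best, best_cost, initial_cost


# Hypothetical four-core example on a 2x2 mesh.
cores = ["cpu", "dsp", "mem", "io"]
traffic = {("cpu", "mem"): 10, ("dsp", "mem"): 5, ("cpu", "io"): 1}
placement, cost, initial = evolve_placement(cores, traffic, 2, 2)
print(placement, cost, initial)
```

    Because only non-worse children are accepted, the final cost can never exceed the cost of the random starting placement; a real heterogeneous design would also have to account for core sizes, power and area.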

    Parallel Evolutionary Algorithms for Energy Aware Scheduling

    Reducing energy consumption is an increasingly important issue in computing and embedded systems. In computing systems, minimizing energy consumption can significantly reduce energy bills, as the demand for computing systems steadily increases and the cost of energy continues to rise. In embedded systems, reducing energy use extends the autonomy of these systems. In addition, reducing energy use decreases greenhouse gas emissions. Much research is therefore carried out to develop new methods that consume less energy. This chapter gives an overview of the main methods used to reduce energy consumption in computing and embedded systems. As a use case, the chapter describes our new parallel bi-objective hybrid genetic algorithm, which takes into account both the completion time and the energy consumption. In terms of energy consumption, the obtained results show that our approach outperforms previous scheduling methods by a significant margin. In terms of completion time, the obtained schedules are also shorter than those of other algorithms.

    Advances in parallel programming for electronic design automation

    The continued miniaturization of the technology node increases not only the chip capacity but also the circuit design complexity. How does one efficiently design a chip with millions or billions of transistors? This has become a challenging problem in the integrated circuit (IC) design industry, especially for the developers of electronic design automation (EDA) tools. To boost the performance of EDA tools, one promising direction is parallel computing. In this dissertation, we explore different parallel computing approaches, from CPU to GPU to distributed computing, for EDA applications. Nowadays multi-core processors are prevalent, from mobile devices to laptops to desktops, and it is natural for software developers to utilize the available cores to maximize the performance of their applications. Therefore, in this dissertation we first focus on multi-threaded programming. We begin by reviewing a C++ parallel programming library called Cpp-Taskflow. Cpp-Taskflow is designed to facilitate programming parallel applications, and has been successfully applied to an EDA timing analysis tool. We demonstrate Cpp-Taskflow’s programming model and interface, software architecture and execution flow. Then, we improve Cpp-Taskflow in several aspects. First, we enhance Cpp-Taskflow’s usability by restructuring the software architecture. Second, we introduce task graph composition to support composability and modularity, which makes it easier for users to construct large and complex parallel patterns. Third, we add a new task type in Cpp-Taskflow to let users control the graph execution flow. This feature empowers the graph model with the ability to describe complex control flow. Aside from the above enhancements, we have designed a new scheduler to adaptively manage the threads based on available parallelism. 
The new scheduler uses a simple and effective strategy that not only prevents resources from being underutilized, but also mitigates resource over-subscription. We have evaluated the new scheduler on both micro-benchmarks and a very-large-scale integration (VLSI) application, and the results show that the new scheduler achieves good performance and is very energy-efficient. Next we study the applicability of heterogeneous computing, specifically the graphics processing unit (GPU), to EDA. We demonstrate how to use a GPU to accelerate VLSI placement, and we show that a GPU can bring substantial performance gains to VLSI placement. Finally, as the design size keeps increasing, a more scalable solution is distributed computing. We introduce a distributed power grid analysis framework built on top of DtCraft. This framework allows users to flexibly partition the design and automatically deploy the computations across several machines. In addition, we propose a job scheduler that efficiently utilizes cluster resources to improve the framework’s performance.

    Energy- and quality-aware scheduling of periodic tasks in embedded real-time systems

    Mobile devices are increasingly used to run real-time applications. They offer ever more computing performance while becoming smaller and lighter. High performance, however, demands a great deal of energy, which conflicts with the small battery capacities that follow from the demand for small, light devices. When real-time tasks are scheduled, energy consumption therefore gains importance alongside timely completion, because the devices should operate independently of a power outlet for as long as possible. On the other hand, these devices run computation-intensive applications for which it is desirable to obtain the maximum quality achievable with the available performance. This work presents a system model that combines the design-to-time approach with the dynamic performance scaling (computing performance and electrical power consumed) of modern processors. The design-to-time approach allows energy savings or quality improvements by dynamically selecting among alternative implementations that fulfill the same function with different execution times and different quality or energy consumption. The system model comprises, among other things, periodic tasks with hard real-time constraints, data dependencies and alternative implementations, as well as processors with discrete performance levels (clock modes). Scheduling is split into two phases. In the offline phase, a flexible plan is computed that contains, for every combination of elapsed time and set of still-unscheduled tasks that can occur at run time, the task to schedule next, the implementation to use and, where applicable, the performance level to set. In the online phase, a scheduler interprets this flexible plan with negligible time and energy overhead. To compute optimal flexible plans, an optimizer was developed that generates a sequence of flexible plans of monotonically improving merit (lower energy consumption or higher quality); it therefore belongs to the class of anytime algorithms. A variant based on dynamic programming finds globally optimal flexible plans, which can serve, for example, as a basis for benchmarks. A variant based on simulated annealing finds good flexible plans faster for larger applications.
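
    The two-phase idea described in this abstract can be pictured as a lookup table. The sketch below is an illustration under assumptions, not the thesis's data structure: the table is hand-written rather than computed by an optimizer, and the task names, implementation names, clock values and durations are all hypothetical. It shows only how an online scheduler could interpret a flexible plan keyed by elapsed time and the set of tasks still to be scheduled.

```python
from typing import NamedTuple


class Decision(NamedTuple):
    task: str            # task to run next
    implementation: str  # which alternative implementation to use
    clock_mhz: int       # performance level to set (hypothetical values)
    duration: int        # nominal time units this choice takes


# Stand-in for the offline phase: one decision per reachable combination of
# elapsed time and set of still-unscheduled tasks (hand-written here).
FLEXIBLE_PLAN = {
    (0, frozenset({"decode", "filter"})): Decision("decode", "full", 200, 4),
    (4, frozenset({"filter"})): Decision("filter", "high_quality", 100, 5),
    (6, frozenset({"filter"})): Decision("filter", "fast", 200, 2),
}


def run_online(plan, tasks, actual_durations=None):
    """Online phase: repeatedly look up the next decision; no search at run time."""
    actual_durations = actual_durations or {}
    elapsed = 0
    remaining = frozenset(tasks)
    trace = []
    while remaining:
        decision = plan[(elapsed, remaining)]  # constant-time table lookup
        trace.append(decision)
        # Use the observed duration if one is given, else the nominal one.
        elapsed += actual_durations.get(decision.task, decision.duration)
        remaining = remaining - {decision.task}
    return trace, elapsed


# Nominal run: enough time remains for the high-quality filter.
print(run_online(FLEXIBLE_PLAN, {"decode", "filter"}))
# Decoding takes 6 instead of 4 time units: the plan falls back to the fast filter.
print(run_online(FLEXIBLE_PLAN, {"decode", "filter"}, {"decode": 6}))
```

    Keying the table by elapsed time is what lets the same plan react to different run-time situations: the decision for the remaining tasks depends on how much time has already been consumed.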

    Energy-Aware Scheduling of Conditional Task Graphs on NoC-Based MPSoCs

    We investigate the problem of scheduling a set of tasks with individual deadlines and conditional precedence constraints on a heterogeneous Network on Chip (NoC)-based Multi-Processor System-on-Chip (MPSoC) such that the total expected energy consumption of all the tasks is minimized, and propose a novel approach. Our approach consists of a scheduling heuristic for constructing a single unified schedule for all the tasks and assigning a frequency to each task and each communication assuming continuous frequencies, an Integer Linear Programming (ILP)-based algorithm and a polynomial time heuristic for assigning discrete frequencies and voltages to tasks and communications. We have performed experiments on 16 synthetic and 4 real-world benchmarks. The experimental results show that compared to the state-of-the-art approach, our approach using the ILP-based algorithm and our approach using the polynomial-time heuristic achieve average improvements of 31% and 20%, respectively, in terms of energy reduction

    Task assignment in parallel processor systems

    A generic object-oriented simulation platform is developed in order to conduct experiments on the performance of assignment schemes. The simulation platform, called Genesis, is generic in the sense that it can model the key parameters that describe a parallel system: the architecture, the program, the assignment scheme and the message routing strategy. Genesis uses as its basis a sound architectural representation scheme developed in the thesis. The thesis reports results from a number of experiments assessing the performance of assignment schemes using Genesis. The comparison results indicate that the new assignment scheme proposed in this thesis is a promising alternative to the work-greedy assignment schemes. The proposed scheme has a time-complexity less than those of the work-greedy schemes and achieves an average performance better than, or comparable to, those of the work-greedy schemes. To generate an assignment, some parameters describing the program model will be required. In many cases, accurate estimation of these parameters is hard. It is thought that inaccuracies in the estimation would lead to poor assignments. The thesis investigates this speculation and presents experimental evidence that shows such inaccuracies do not greatly affect the quality of the assignments