16 research outputs found

    Coupling Routing Algorithm and Data Encoding for Low Power Networks on Chip

    Get PDF
    The routing algorithm used in a Network-on-Chip (NoC) has a strong impact on both the functional and non functional indices of the overall system. Traditionally, routing algorithms have been designed considering performance and cost as the main objectives. In this study we focus on two important non functional metrics, namely, power dissipation and energy consumption. We propose a selection policy that can be coupled with any multi-path routing function and whose primary goal is reducing power dissipation. As technology shrinks, the power dissipated by the network links represents an ever more significant fraction of the total power budget. Based on this, the proposed selection policy tries to reduce link power dissipation by selecting the output port of the router which minimises the switching activity of the output link. A set of experiments carried out on both synthetic and real traffic scenarios is presented. When the proposed selection policy is used in conjunction with a data encoding technique, on average, 31% of energy reduction and 37% of power saving is observed. An architectural implementation of the selection policy is also presented and its impact on cost (silicon area) and power dissipation of the baseline router is discussed

    A Survey on Design Methodologies for Accelerating Deep Learning on Heterogeneous Architectures

    Full text link
    In recent years, the field of Deep Learning has seen many disruptive and impactful advancements. Given the increasing complexity of deep neural networks, the need for efficient hardware accelerators has become more and more pressing to design heterogeneous HPC platforms. The design of Deep Learning accelerators requires a multidisciplinary approach, combining expertise from several areas, spanning from computer architecture to approximate computing, computational models, and machine learning algorithms. Several methodologies and tools have been proposed to design accelerators for Deep Learning, including hardware-software co-design approaches, high-level synthesis methods, specific customized compilers, and methodologies for design space exploration, modeling, and simulation. These methodologies aim to maximize the exploitable parallelism and minimize data movement to achieve high performance and energy efficiency. This survey provides a holistic review of the most influential design methodologies and EDA tools proposed in recent years to implement Deep Learning accelerators, offering the reader a wide perspective in this rapidly evolving field. In particular, this work complements the previous survey proposed by the same authors in [203], which focuses on Deep Learning hardware accelerators for heterogeneous HPC platforms

    A Multi-objective Genetic Approach to Mapping Problem on Network-on-Chip

    No full text
    Advances in technology now make it possible to integrate hundreds of cores (e.g. general or special purpose processors, embedded memories, application specific components, mixed-signal I/O cores) in a single silicon die. The large number of resources that have to communicate makes the use of interconnection systems based on shared buses inefficient. One way to solve the problem of on-chip communications is to use a Network-on-Chip (NoC)-based communication infrastructure. Such interconnection systems offer new degrees of freedom, exploration of which may reveal significant optimization possibilities: the possibility of arranging the computing and storage resources in an NoC, for example, has a great impact on various performance indexes. The paper addresses the problem of topological mapping of intellectual properties (IPs) on the tiles of a mesh-based NoC architecture. The aim is to obtain the Pareto mappings that maximize performance and minimize power dissipation. We propose a heuristic technique based on evolutionary computing to obtain an optimal approximation of the Pareto-optimal front in an efficient and accurate way. At the same time, two of the most widely-known approaches to mapping in mesh­based NoC architectures are extended in order to explore the mapping space in a multi-criteria mode. The approaches are then evaluated and compared, in terms of both accuracy and efficiency, on a platform based on an event-driven trace-based simulator which makes it possible to take account of important dynamic effects that have a great impact on mapping. The evaluation performed on both synthesized traffic and real applications (an MPEG-4 codec) confirms the efficiency, accuracy and scalability of the proposed approach

    An Instruction-Level Power Analysis Model with Data Dependency

    No full text
    Power constraints are becoming a critical design issue in the field of portable microprocessor systems. The impact of software on overall system power is becomin

    Efficient design space exploration for application specific systems-on-a-chip

    No full text
    A reduction in the time-to-market has led to widespread use of pre-designed parametric architectural solutions known as system-on-a-chip (SoC) platforms. A system designer has to configure the platform in such a way as to optimize it for the execution of a specific application. Very frequently, however, the space of possible configurations that can be mapped onto a SoC platform is huge and the computational effort needed to evaluate a single system configuration can be very costly. In this paper we propose an approach which tackles the problem of design space exploration (DSE) in both of the fronts of the reduction of the number of system configurations to be simulated and the reduction of the time required to evaluate (i.e., simulate) a system configuration. More precisely, we propose the use of Multi-objective Evolutionary Algorithms as optimization technique and Fuzzy Systems for the estimation of the performance indexes to be optimized. The proposed approach is applied on a highly parameterized SoC platform based on a parameterized VLIW processor and a parameterized memory hierarchy for the optimization of performance and power dissipation. The approach is evaluated in terms of both accuracy and efficiency and compared with several established DSE approaches. The results obtained for a set of multimedia applications show an improvement in both accuracy and exploration time

    Fuzzy decision making in embedded system design

    No full text
    The use of Application Specific Instruction-set Processors (ASIP) is a solution to the problem of increasing complexity in embedded systems design. One of the major challenges in ASIP design is Design Space Exploration (DSE), because of the heterogeneity of the objectives and parameters involved. Typically DSE is a multi- objective search problem, where performance, power, area, etc. are the different optimization criteria. The output of a DSE strategy is a set of candidate design solutions called a Pareto-optimal set. Choosing a solution for system implementation from the Pareto- optimal set can be a difficult task, generally because Pareto-optimal sets can be extremely large or even contain an infinite number of solutions. In this paper we propose a methodology to assist the decision-maker in analysis of the solutions to multi-objective problems. By means of fuzzy clustering techniques, it finds the reduced Pareto subset, which best represents all the Pareto solutions. This optimal subset will be used for further and more accurate (but slower) analysis. As a real application example we address the optimization of area, performance, and power of a VLIW-based embedded system

    Performance evaluation of efficient multi-objective evolutionary algorithms for design space exploration of embedded computer systems

    No full text
    Multi-objective evolutionary algorithms (MOEAs) have received increasing interest in industry because they have proved to be powerful optimizers. Despite the great success achieved, however, MOEAs have also encountered many challenges in real-world applications. One of the main difficulties in applying MOEAs is the large number of fitness evaluations (objective calculations) that are often needed before an acceptable solution can be found. There are, in fact, several industrial situations in which fitness evaluations are computationally expensive and the time available is very short. In these applications efficient strategies to approximate the fitness function have to be adopted, looking for a trade-off between optimization performance and efficiency. This is the case in designing a complex embedded system, where it is necessary to define an optimal architecture in relation to certain performance indexes while respecting strict time-to-market constraints. This activity, known as design space exploration (DSE), is still a great challenge for the EDA (electronic design automation) community. One of the most important bottlenecks in the overall design flow of an embedded system is due to simulation. Simulation occurs at every phase of the design flow and is used to evaluate a system which is a candidate for implementation. In this paper we focus on system level design, proposing an extensive comparison of the state-of-the-art of MOEA approaches with an approach based on fuzzy approximation to speed up the evaluation of a candidate system configuration. The comparison is performed in a real case study: optimization of the performance and power dissipation of embedded architectures based on a Very Long Instruction Word (VLIW) microprocessor in a mobile multimedia application domain. The results of the comparison demonstrate that the fuzzy approach outperforms in terms of both performance and efficiency the state of the art in MOEA strategies applied to DSE of a parameterized embedded system
    corecore