
    Analytical Cost Metrics: Days of Future Past

    As we move towards the exascale era, new architectures must be capable of running massive computational problems efficiently. Scientists and researchers are continuously investing effort in tuning the performance of extreme-scale computational problems. These problems arise in almost all areas of computing, ranging from big data analytics, artificial intelligence, search, machine learning, virtual/augmented reality, computer vision, and image/signal processing to computational science and bioinformatics. With Moore's law driving the evolution of hardware platforms towards exascale, the dominant performance metric (time efficiency) has now expanded to also incorporate power/energy efficiency. Therefore, the major challenge we face in computing systems research is: "how to solve massive-scale computational problems in the most time/power/energy-efficient manner?" Architectures are constantly evolving, making current performance-optimization strategies less applicable and requiring new strategies to be invented. The solution is for new architectures, new programming models, and applications to move forward together. Doing this is, however, extremely hard: there are too many design choices in too many dimensions. We propose the following strategy to solve the problem: (i) Models - develop accurate analytical models (e.g. execution time, energy, silicon area) to predict the cost of executing a given program, and (ii) Complete System Design - simultaneously optimize all the cost models for the programs (computational problems) to obtain the most time/area/power/energy-efficient solution. Such an optimization problem evokes the notion of codesign.
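
    A minimal sketch of the models-plus-complete-system-design strategy described above: it enumerates a toy design space (core count and clock frequency) and uses simple analytical models for execution time, power/energy, and silicon area to keep only the Pareto-optimal configurations. The design space, the Amdahl-style time model, and every coefficient are illustrative assumptions, not figures from the paper.

        from itertools import product

        # Hypothetical analytical cost models over a toy design space:
        # number of cores and clock frequency (GHz). All coefficients are
        # made up purely for illustration.
        def exec_time(cores, freq, work=1e12, parallel_fraction=0.95):
            # Amdahl-style model: serial part plus parallel part, scaled by frequency.
            serial = (1 - parallel_fraction) * work
            parallel = parallel_fraction * work / cores
            return (serial + parallel) / (freq * 1e9)        # seconds

        def power(cores, freq):
            # Dynamic power grows with core count and roughly cubically with frequency.
            return cores * 0.5 * freq ** 3 + 2.0             # watts (toy model)

        def area(cores):
            return cores * 3.0                               # mm^2 per core (toy model)

        def energy(cores, freq):
            return exec_time(cores, freq) * power(cores, freq)  # joules

        # Enumerate the design space and keep Pareto-optimal points under the
        # three simultaneous objectives: time, energy, and area.
        designs = list(product([1, 2, 4, 8, 16, 32], [1.0, 1.5, 2.0, 3.0]))
        costs = {d: (exec_time(*d), energy(*d), area(d[0])) for d in designs}

        def dominated(a, b):
            # b dominates a if b is no worse in every objective and better in at least one.
            return all(y <= x for x, y in zip(costs[a], costs[b])) and costs[b] != costs[a]

        pareto = [d for d in designs if not any(dominated(d, other) for other in designs)]
        for d in sorted(pareto):
            t, e, a = costs[d]
            print(f"cores={d[0]:2d} freq={d[1]:.1f}GHz  time={t:7.1f}s  energy={e:9.1f}J  area={a:5.1f}mm^2")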

    Coarse-grained reconfigurable array architectures

    Coarse-Grained Reconfigurable Array (CGRA) architectures accelerate the same inner loops that benefit from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efficiently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on flexibility, performance, and power-efficiency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual fine-tuning of source code.
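
    As a deliberately simplified illustration of why inner loops are the natural target for a CGRA, the sketch below computes the resource-constrained lower bound on the initiation interval (ResMII) of a modulo-scheduled loop for a few array sizes. The operation count and the array dimensions are assumptions chosen for illustration; they are not taken from the chapter or from the ADRES template.

        import math

        def res_mii(ops_per_iteration, rows, cols):
            """Resource-constrained minimum initiation interval for a modulo-scheduled
            loop on a CGRA with rows x cols functional units: at least
            ceil(ops / FUs) cycles must separate the starts of successive iterations."""
            return math.ceil(ops_per_iteration / (rows * cols))

        # Hypothetical inner loop with 48 operations per iteration, mapped onto
        # CGRA arrays of a few different sizes.
        for rows, cols in [(2, 2), (4, 4), (8, 8)]:
            ii = res_mii(48, rows, cols)
            print(f"{rows}x{cols} array: ResMII = {ii} cycle(s), "
                  f"~{48 / ii:.0f} ops issued per cycle at best")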

    Reconfigurable Hardware Accelerators: Opportunities, Trends, and Challenges

    With the emergence of big data applications such as machine learning, speech recognition, artificial intelligence, and DNA sequencing in recent years, the computer architecture research community is facing an explosive growth in data scale. To achieve high efficiency in data-intensive computing, the study of heterogeneous accelerators targeting these latest applications has become a hot topic in the computer architecture domain. At present, the implementation of heterogeneous accelerators mainly relies on heterogeneous computing units such as the Application-Specific Integrated Circuit (ASIC), the Graphics Processing Unit (GPU), and the Field Programmable Gate Array (FPGA). Among these typical heterogeneous architectures, FPGA-based reconfigurable accelerators have two merits. First, the FPGA architecture contains a large number of reconfigurable circuits, which satisfy the requirements of high performance and low power consumption when specific applications are running. Second, reconfigurable architectures employing FPGAs allow prototype systems to be built rapidly and feature excellent customizability and reconfigurability. Nowadays, a batch of acceleration works based on FPGAs or other reconfigurable architectures is emerging in top-tier computer architecture conferences. To better review the recent work on reconfigurable computing accelerators, this survey takes the latest high-level research on reconfigurable accelerator architectures and algorithm applications as its basis. In this survey, we compare hot research issues and domains of concern, and furthermore analyze and illuminate the advantages, disadvantages, and challenges of reconfigurable accelerators. In the end, we project the future development trends of accelerator architectures, hoping to provide a reference for computer architecture researchers.

    A Quick Introduction to Functional Verification of Array-Intensive Programs

    Array-intensive programs are often amenable to parallelization across many cores on a single machine as well as to scaling across multiple machines, and hence are well explored, especially in the domain of high-performance computing. These programs typically undergo loop transformations and arithmetic transformations in addition to parallelizing transformations. Although a lot of effort has been invested in improving parallelizing compilers, experienced programmers still resort to hand-optimized transformations, which are typically followed by careful tuning of the transformed program to finally obtain the optimized program. Therefore, it is critical to verify that the functional correctness of the original sequential program is not sacrificed during the process of optimization. In this paper, we cover important literature on functional verification of array-intensive programs, which we believe can be a good starting point for anyone interested in this field.
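
    To make the verification problem concrete, here is a small hedged example (not drawn from any of the surveyed works): a reference array kernel, a hand-transformed variant with the loops interchanged, and a random-testing harness standing in for the symbolic equivalence checking that real verification tools perform. The kernel and array sizes are assumptions for illustration only.

        import random

        def original(a, b, n, m):
            # Reference kernel: row-major elementwise addition.
            c = [[0] * m for _ in range(n)]
            for i in range(n):
                for j in range(m):
                    c[i][j] = a[i][j] + b[i][j]
            return c

        def transformed(a, b, n, m):
            # Hand-optimized variant: loops interchanged (legal here because the
            # iterations are independent). A verifier must prove c is identical.
            c = [[0] * m for _ in range(n)]
            for j in range(m):
                for i in range(n):
                    c[i][j] = a[i][j] + b[i][j]
            return c

        # Lightweight random testing as a stand-in for formal equivalence checking.
        n, m = 8, 5
        for _ in range(100):
            a = [[random.randint(-9, 9) for _ in range(m)] for _ in range(n)]
            b = [[random.randint(-9, 9) for _ in range(m)] for _ in range(n)]
            assert original(a, b, n, m) == transformed(a, b, n, m)
        print("original and transformed kernels agree on all sampled inputs")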

    Exploiting Errors for Efficiency: A Survey from Circuits to Algorithms

    When a computational task tolerates a relaxation of its specification, or when an algorithm tolerates the effects of noise in its execution, hardware, programming languages, and system software can trade deviations from correct behavior for lower resource usage. We present, for the first time, a synthesis of research results on computing systems that only make as many errors as their users can tolerate, from across the disciplines of computer-aided design of circuits, digital system design, computer architecture, programming languages, operating systems, and information theory. Rather than over-provisioning resources at each layer to avoid errors, it can be more efficient to exploit the masking of errors occurring at one layer, which can prevent them from propagating to a higher layer. We survey tradeoffs for individual layers of computing systems, from the circuit level to the operating system level, and illustrate the potential benefits of end-to-end approaches using two illustrative examples. To tie the survey together, we present a consistent formalization of terminology across the layers which does not significantly deviate from the terminology traditionally used by research communities in their layer of focus. Comment: 35 pages.
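
    A toy example of the accuracy-for-resources trade described above (not one of the paper's case studies): loop perforation applied to a mean computation, where skipping a fraction of the input cuts the work roughly proportionally while the deviation from the exact answer stays small. The kernel, data distribution, and skip factors are assumptions.

        import random

        def exact_mean(xs):
            return sum(xs) / len(xs)

        def perforated_mean(xs, skip=2):
            """Loop perforation: visit only every `skip`-th element, doing roughly
            1/skip of the work in exchange for a bounded deviation from the
            exact result."""
            sampled = xs[::skip]
            return sum(sampled) / len(sampled)

        random.seed(0)
        data = [random.gauss(100.0, 15.0) for _ in range(100_000)]
        exact = exact_mean(data)
        for skip in (2, 4, 8):
            approx = perforated_mean(data, skip)
            err = abs(approx - exact) / abs(exact) * 100
            print(f"skip={skip}: ~{100 // skip}% of the work, relative error {err:.3f}%")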

    Collective Tuning Initiative

    Computing systems rarely deliver the best possible performance due to ever-increasing hardware and software complexity and the limitations of current optimization technology. Additional code and architecture optimizations are often required to improve execution time, size, power consumption, reliability, and other important characteristics of computing systems. However, this is often a tedious, repetitive, isolated, and time-consuming process. In order to automate, simplify, and systematize program optimization and architecture design, we are developing an open-source, modular, plugin-based Collective Tuning Infrastructure (CTI, http://cTuning.org) that can distribute the optimization process and leverage the optimization experience of multiple users. CTI provides a novel, fully integrated, collaborative, "one button" approach to improving existing underperforming computing systems, ranging from embedded architectures to high-performance servers, based on systematic iterative compilation, statistical collective optimization, and machine learning. Our experimental results show that it is possible to reduce the execution time (and code size) of some programs from SPEC2006 and EEMBC, among others, by more than a factor of 2 automatically. It can also reduce development and testing time considerably. Together with the first production-quality machine-learning-enabled interactive research compiler (MILEPOST GCC), this infrastructure opens up many research opportunities to study and develop future realistic self-tuning and self-organizing adaptive intelligent computing systems based on systematic statistical performance evaluation and benchmarking. Finally, the use of a common optimization repository is intended to improve the quality and reproducibility of research on architecture and code optimization. Comment: GCC Developers' Summit'09, 14 June 2009, Montreal, Canada.
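
    The sketch below illustrates the iterative-compilation idea in its simplest form; it is not the CTI/MILEPOST infrastructure itself. It randomly samples combinations of common GCC optimization flags, compiles a benchmark, times a run, and keeps the fastest configuration. The benchmark file name, trial count, and flag subset are assumptions.

        import random
        import subprocess
        import time

        # A tiny subset of the real optimization-flag search space.
        BASE = ["-O2"]
        EXTRA = ["-O3", "-funroll-loops", "-ftree-vectorize", "-ffast-math",
                 "-fomit-frame-pointer", "-flto", "-march=native"]

        SOURCE = "benchmark.c"    # hypothetical benchmark source file
        TRIALS = 20               # number of random flag combinations to evaluate

        def measure(flags):
            """Compile SOURCE with the given flags, run the binary once, and
            return its wall-clock runtime in seconds."""
            subprocess.run(["gcc", *flags, SOURCE, "-o", "bench"], check=True)
            start = time.perf_counter()
            subprocess.run(["./bench"], check=True)
            return time.perf_counter() - start

        best_flags, best_time = BASE, measure(BASE)
        for _ in range(TRIALS):
            # Sample a random subset of the extra flags on top of the baseline.
            flags = BASE + [f for f in EXTRA if random.random() < 0.5]
            runtime = measure(flags)
            if runtime < best_time:
                best_flags, best_time = flags, runtime

        print("best flags found:", " ".join(best_flags), f"({best_time:.3f}s)")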

    Particle Swarm Optimization for HW/SW Partitioning

    Parallel Programming Models for Heterogeneous Many-Cores: A Survey

    Heterogeneous many-cores are now an integral part of modern computing systems, ranging from embedded systems to supercomputers. While heterogeneous many-core design offers the potential for energy-efficient high performance, such potential can only be unlocked if the application programs are suitably parallel and can be made to match the underlying heterogeneous platform. In this article, we provide a comprehensive survey of parallel programming models for heterogeneous many-core architectures and review the compilation techniques for improving programmability and portability. We examine various software optimization techniques for minimizing the communication overhead between heterogeneous computing devices. We provide a road map for a wide variety of different research areas. We conclude with a discussion of open issues in the area and potential research directions. This article provides both an accessible introduction to the fast-moving area of heterogeneous programming and a detailed bibliography of its main achievements. Comment: Accepted to be published at CCF Transactions on High Performance Computing.
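
    One communication-overhead question the survey touches on can be captured with a back-of-the-envelope model: whether offloading a kernel to an accelerator still pays off once transfer time over the interconnect is included. The sketch below is such a model; the speedup, link bandwidth, data volumes, and host runtime are illustrative assumptions.

        def offload_worthwhile(bytes_moved, host_time_s,
                               device_speedup=10.0, link_gbps=16.0):
            """Crude break-even model: offloading pays off only if device compute
            time plus host<->device transfer time beats staying on the host."""
            transfer_s = bytes_moved / (link_gbps * 1e9 / 8)   # Gbit/s -> bytes/s
            device_s = host_time_s / device_speedup
            return device_s + transfer_s < host_time_s, device_s + transfer_s

        # Hypothetical kernel that takes 50 ms on the host, with varying data volumes.
        for mb in (16, 256, 4096):
            worth, total_s = offload_worthwhile(mb * 2**20, host_time_s=0.050)
            print(f"{mb:5d} MB moved: offloaded total {total_s * 1000:7.1f} ms -> "
                  f"{'offload' if worth else 'stay on host'}")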

    Activity Report: Automatic Control 2012

    Cities of the Future: Employing Wireless Sensor Networks for Efficient Decision Making in Complex Environments

    Decision making in large-scale urban environments is critical for many applications involving the continuous distribution of resources and utilization of infrastructure, such as ambient lighting control and traffic management. Traditional decision-making methods involve extensive human participation, are expensive, and are inefficient and unreliable for hard-to-predict situations. Modern technology, including ubiquitous data collection through sensors, automated analysis and prognosis, and online optimization, offers new capabilities for developing flexible, autonomous, scalable, efficient, and predictable control methods. This paper presents a new decision-making concept in which a hierarchy of semantically more abstract models is utilized to perform online, scalable, and predictable control. The lower semantic levels perform localized decisions based on sampled data from the environment, while the higher semantic levels provide more global, time-invariant results based on aggregated data from the lower levels. There is continuous feedback between the levels of the semantic hierarchy, in which the upper levels set performance-guaranteeing constraints for the lower levels, while the lower levels indicate whether these constraints are feasible or not. Even though the semantic hierarchy is not tied to a particular set of description models, the paper illustrates a hierarchy used for traffic management applications and composed of Finite State Machines, Conditional Task Graphs, Markov Decision Processes, and functional graphs. The paper also summarizes some of the main research problems that must be addressed as part of the proposed concept.
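
    A minimal sketch of the feedback loop between semantic levels described above, using a toy traffic-signal setting: the upper level hands each intersection a cycle-length constraint, each local controller plans green times from sampled queue lengths within that constraint and reports back whether the constraint is feasible, and the upper level relaxes the constraint when it is not. The intersection model, thresholds, and all numbers are assumptions, not material from the paper.

        # Toy two-level hierarchy: a city-level planner sets a cycle-length constraint
        # for intersection-level controllers and relaxes it when they report that the
        # constraint is infeasible.

        def local_controller(queues, max_cycle_s):
            """Lower level: choose green times from sampled queue lengths, subject to
            the cycle-length constraint handed down by the upper level.
            Returns (feasible, green_times)."""
            needed = [q * 2.0 for q in queues]            # ~2 s of green per queued car
            total = sum(needed)
            if total <= max_cycle_s:
                return True, needed
            scale = max_cycle_s / total                   # best effort under constraint
            return False, [g * scale for g in needed]     # ...and report infeasibility

        def city_planner(intersections, cycle_s=60.0, step_s=15.0, cycle_cap=120.0):
            """Upper level: aggregate feedback from all intersections and relax the
            global constraint until every local plan is feasible (or a cap is hit)."""
            while True:
                results = [local_controller(q, cycle_s) for q in intersections]
                if all(ok for ok, _ in results) or cycle_s >= cycle_cap:
                    return cycle_s, [plan for _, plan in results]
                cycle_s += step_s                         # relax constraint and retry

        # Sampled queue lengths (cars waiting per approach) at three intersections.
        intersections = [[4, 7, 3, 5], [12, 9, 14, 10], [2, 3, 1, 2]]
        cycle, plans = city_planner(intersections)
        print(f"agreed cycle length: {cycle:.0f} s")
        for i, plan in enumerate(plans):
            print(f"intersection {i}: green times " + ", ".join(f"{g:.0f}s" for g in plan))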