Analytical Cost Metrics: Days of Future Past
As we move towards the exascale era, new architectures must be capable of running massive computational problems efficiently. Scientists and researchers continuously invest in tuning the performance of extreme-scale computational problems. These problems arise in almost all areas of computing, ranging from big data analytics, artificial intelligence, search, machine learning, virtual/augmented reality, computer vision, and image/signal processing to computational science and bioinformatics. With Moore's law driving the evolution of hardware platforms towards exascale, the dominant performance metric (time efficiency) has now expanded to also incorporate power/energy efficiency. Therefore, the major challenge we face in computing systems research is: "how to solve massive-scale computational problems in the most time/power/energy-efficient manner?"
Architectures are constantly evolving, making current performance-optimization strategies less applicable and requiring new strategies to be invented. The solution is for new architectures, new programming models, and applications to move forward together. Doing this is, however, extremely hard: there are too many design choices in too many dimensions. We propose the following strategy to solve the problem: (i) Models - develop accurate analytical models (e.g., execution time, energy, silicon area) to predict the cost of executing a given program, and (ii) Complete System Design - simultaneously optimize all the cost models for the programs (computational problems) to obtain the most time/area/power/energy-efficient solution. Such an optimization problem evokes the notion of codesign.
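To make the proposed strategy concrete, here is a minimal sketch in Python of pairing analytical cost models with a joint search over design choices; the model forms and the parameters (core count, clock frequency, tile size) are illustrative assumptions, not models taken from the abstract.

```python
# A sketch of the two-part strategy: (i) analytical cost models and
# (ii) a joint ("complete system design") search over design choices.
# All model forms and parameter names below are illustrative assumptions.

from itertools import product

FLOP_COUNT = 1e12      # assumed total floating-point operations of the problem
BYTES_MOVED = 4e11     # assumed off-chip data traffic in bytes

def exec_time(cores, freq_ghz, tile):
    """Toy execution-time model: a compute-bound term plus a memory-bound term."""
    compute_s = FLOP_COUNT / (cores * freq_ghz * 1e9 * 2)   # 2 flops/cycle/core
    memory_s = BYTES_MOVED / (tile * 1e9)                   # larger tiles reuse more data
    return compute_s + memory_s

def energy(cores, freq_ghz, tile):
    """Toy energy model: dynamic power grows with frequency cubed, plus DRAM energy."""
    power_w = cores * 0.5 * freq_ghz ** 3
    return power_w * exec_time(cores, freq_ghz, tile) + BYTES_MOVED * 20e-12

# Optimize all cost models at once: here, a simple weighted sum of time and energy.
candidates = product([16, 32, 64], [1.0, 1.5, 2.0], [32, 64, 128])
best = min(candidates, key=lambda c: 0.5 * exec_time(*c) + 0.5 * energy(*c))
print("selected design point (cores, GHz, tile):", best)
```

In a real codesign flow the scalar weighted sum would typically be replaced by a Pareto search over the competing cost models.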
Coarse-grained reconfigurable array architectures
Coarse-Grained Reconfigurable Array (CGRA) architectures accelerate the same inner loops that benefit from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efficiently. This chapter discusses the basic principles of CGRAs and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on flexibility, performance, and power-efficiency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support, and for the manual fine-tuning of source code.
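As a rough illustration of the mapping problem a CGRA compiler must solve, the following sketch greedily places the operations of a toy inner-loop body onto a small grid of functional units; the array size, the operation list, and the greedy rule are assumptions chosen for brevity and are not the ADRES approach.

```python
# A toy placement of an inner-loop body onto a 2x2 grid of functional units,
# illustrating the kind of mapping decision a CGRA compiler automates.
# The array size, operation list, and greedy rule are assumptions for brevity.

loop_body = ["load a[i]", "load b[i]", "mul", "store c[i]"]   # toy dataflow operations
ROWS, COLS = 2, 2                                             # assumed CGRA dimensions

free_slots = [(r, c) for r in range(ROWS) for c in range(COLS)]
placement = {}

for op in loop_body:
    if not free_slots:
        # A real mapper would schedule over multiple cycles (e.g. modulo scheduling).
        raise RuntimeError("loop body does not fit in a single cycle on this array")
    placement[op] = free_slots.pop(0)   # greedy: take the first free functional unit

for op, (r, c) in placement.items():
    print(f"FU[{r}][{c}] <- {op}")
```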
Reconfigurable Hardware Accelerators: Opportunities, Trends, and Challenges
With the emergence of big data applications such as Machine Learning, Speech Recognition, Artificial Intelligence, and DNA Sequencing in recent years, computer architecture research communities are facing an explosive growth in data. To achieve high efficiency for data-intensive computing, studies of heterogeneous accelerators that focus on the latest applications have become a hot topic in the computer architecture domain. At present, the implementation of heterogeneous accelerators mainly relies on heterogeneous computing units such as Application-Specific Integrated Circuits (ASICs), Graphics Processing Units (GPUs), and Field Programmable Gate Arrays (FPGAs). Among these typical heterogeneous architectures, FPGA-based reconfigurable accelerators have two merits: first, an FPGA contains a large number of reconfigurable circuits, which satisfy the requirements of high performance and low power consumption when specific applications are running; second, FPGA-based reconfigurable architectures enable rapid prototyping and offer excellent customizability and reconfigurability. Nowadays, a wave of acceleration work based on FPGAs and other reconfigurable architectures is emerging at top-tier computer architecture conferences. To better review recent related work on reconfigurable computing accelerators, this survey takes the latest high-quality research on reconfigurable accelerator architectures and their algorithm applications as its basis. We compare hot research issues and application domains, and analyze the advantages, disadvantages, and challenges of reconfigurable accelerators. Finally, we discuss likely future directions for accelerator architectures, hoping to provide a reference for computer architecture researchers.
A Quick Introduction to Functional Verification of Array-Intensive Programs
Array-intensive programs are often amenable to parallelization across many cores on a single machine as well as scaling across multiple machines, and hence are well explored, especially in the domain of high-performance computing. These programs typically undergo loop transformations and arithmetic transformations in addition to parallelizing transformations. Although a lot of effort has been invested in improving parallelizing compilers, experienced programmers still resort to hand-optimized transformations, which are typically followed by careful tuning of the transformed program to finally obtain the optimized program. Therefore, it is critical to verify that the functional correctness of the original sequential program is not sacrificed during the process of optimization. In this paper, we cover important literature on functional verification of array-intensive programs, which we believe can be a good starting point for anyone interested in this field.
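The following sketch illustrates the property such verification aims to establish: a hand-transformed loop must compute exactly the same array contents as the original. The check here is only a randomized test over an assumed toy kernel; the surveyed techniques prove the equivalence for all inputs.

```python
# A minimal sketch of the property these verification techniques establish:
# the transformed program must compute the same array contents as the original.
# The kernel and the tiled variant below are assumed examples; randomized
# testing only gains confidence, whereas formal verification covers all inputs.

import random

def original(a, b):
    c = [0.0] * len(a)
    for i in range(len(a)):
        c[i] = a[i] * b[i] + 1.0
    return c

def transformed(a, b, tile=4):
    # Hand-optimized variant: same computation, tiled loop structure.
    n = len(a)
    c = [0.0] * n
    for t in range(0, n, tile):
        for i in range(t, min(t + tile, n)):
            c[i] = a[i] * b[i] + 1.0
    return c

for _ in range(100):
    a = [random.random() for _ in range(37)]
    b = [random.random() for _ in range(37)]
    assert original(a, b) == transformed(a, b), "transformation changed semantics"
print("all randomized checks passed")
```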
Exploiting Errors for Efficiency: A Survey from Circuits to Algorithms
When a computational task tolerates a relaxation of its specification or when
an algorithm tolerates the effects of noise in its execution, hardware,
programming languages, and system software can trade deviations from correct
behavior for lower resource usage. We present, for the first time, a synthesis
of research results on computing systems that only make as many errors as their
users can tolerate, from across the disciplines of computer aided design of
circuits, digital system design, computer architecture, programming languages,
operating systems, and information theory.
Rather than over-provisioning resources at each layer to avoid errors, it can be more efficient to exploit the masking of errors that occur at one layer, preventing them from propagating to a higher layer. We survey tradeoffs for
individual layers of computing systems from the circuit level to the operating
system level and illustrate the potential benefits of end-to-end approaches
using two illustrative examples. To tie together the survey, we present a
consistent formalization of terminology, across the layers, which does not
significantly deviate from the terminology traditionally used by research
communities in their layer of focus. Comment: 35 pages.
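As a concrete, if simplified, illustration of this accuracy/resource tradeoff, the sketch below uses loop perforation (one well-known technique in this space) to skip most of the work in a reduction and then reports the resulting deviation; the data and sampling rate are assumptions.

```python
# A small sketch of trading accuracy for resources: skip part of the work
# (loop perforation) and accept a bounded deviation from the exact answer.
# The data set and the perforation rate are illustrative assumptions.

import random

data = [random.random() for _ in range(100_000)]

def mean_exact(xs):
    return sum(xs) / len(xs)

def mean_perforated(xs, keep_every=4):
    # Only every 4th element is visited: roughly 4x less work, some error tolerated.
    sampled = xs[::keep_every]
    return sum(sampled) / len(sampled)

exact = mean_exact(data)
approx = mean_perforated(data)
print(f"exact={exact:.5f} approx={approx:.5f} rel.err={abs(exact - approx) / exact:.2%}")
```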
Collective Tuning Initiative
Computing systems rarely deliver the best possible performance due to ever-increasing hardware and software complexity and the limitations of current
optimization technology. Additional code and architecture optimizations are
often required to improve execution time, size, power consumption, reliability
and other important characteristics of computing systems. However, this is often a tedious, repetitive, isolated, and time-consuming process. In order to
automate, simplify and systematize program optimization and architecture
design, we are developing open-source modular plugin-based Collective Tuning
Infrastructure (CTI, http://cTuning.org) that can distribute optimization
process and leverage optimization experience of multiple users. CTI provides a
novel, fully integrated, collaborative, "one button" approach to improving existing underperforming computing systems ranging from embedded architectures
to high-performance servers based on systematic iterative compilation,
statistical collective optimization and machine learning. Our experimental
results show that it is possible to reduce execution time (and code size) of
some programs from SPEC2006 and EEMBC among others by more than a factor of 2
automatically. It can also reduce development and testing time considerably.
Together with the first production quality machine learning enabled interactive
research compiler (MILEPOST GCC) this infrastructure opens up many research
opportunities to study and develop future realistic self-tuning and
self-organizing adaptive intelligent computing systems based on systematic
statistical performance evaluation and benchmarking. Finally, using a common optimization repository is intended to improve the quality and reproducibility of research on architecture and code optimization. Comment: GCC Developers' Summit'09, 14 June 2009, Montreal, Canada.
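A minimal sketch of the systematic iterative-compilation loop that such an infrastructure automates and distributes is shown below; the benchmark file name and the flag sets are assumptions for illustration, and a collective-tuning system would additionally share the measurements across users and feed them to machine-learning models that predict good optimizations.

```python
# A minimal sketch of systematic iterative compilation: try optimization
# variants, measure, keep the best. The benchmark source file and the flag
# sets are assumptions for illustration only.

import subprocess, time

FLAG_SETS = ["-O1", "-O2", "-O3", "-O3 -funroll-loops", "-Os"]
SOURCE = "benchmark.c"          # assumed benchmark source file

results = {}
for flags in FLAG_SETS:
    subprocess.run(["gcc", *flags.split(), SOURCE, "-o", "bench"], check=True)
    start = time.perf_counter()
    subprocess.run(["./bench"], check=True)
    results[flags] = time.perf_counter() - start

best = min(results, key=results.get)
print("best flags:", best, f"({results[best]:.3f}s)")
# A collective-tuning infrastructure would also upload these results to a
# shared repository so other users and ML models can reuse the experience.
```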
Parallel Programming Models for Heterogeneous Many-Cores: A Survey
Heterogeneous many-cores are now an integral part of modern computing systems, ranging from embedded systems to supercomputers. While heterogeneous many-core design offers the potential for energy-efficient high performance, such potential can only be unlocked if the application programs are suitably parallel and can be made to match the underlying heterogeneous platform. In this article, we provide a comprehensive survey of parallel programming models for heterogeneous many-core architectures and review compilation techniques for improving programmability and portability. We examine various software optimization techniques for minimizing the communication overhead between heterogeneous computing devices. We provide a road map for a wide variety of
different research areas. We conclude with a discussion on open issues in the
area and potential research directions. This article provides both an
accessible introduction to the fast-moving area of heterogeneous programming
and a detailed bibliography of its main achievements. Comment: Accepted to be published at CCF Transactions on High Performance Computing.
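One recurring optimization in this space is reducing the number of host-to-device transfers; the sketch below uses a simple analytical cost model (assumed latency and bandwidth figures, not measurements) to show why batching many small transfers into one large transfer reduces communication overhead.

```python
# A small sketch of one communication-minimization idea: batch many small
# host-to-device transfers into one large transfer. The latency and bandwidth
# figures are illustrative assumptions, not measurements of any real device.

LATENCY_S = 10e-6          # assumed fixed per-transfer cost
BANDWIDTH = 12e9           # assumed bytes/second over the interconnect

def transfer_time(num_transfers, total_bytes):
    return num_transfers * LATENCY_S + total_bytes / BANDWIDTH

total_bytes = 64 * 1024 * 1024   # 64 MiB of data to move

many_small = transfer_time(num_transfers=4096, total_bytes=total_bytes)
one_batched = transfer_time(num_transfers=1, total_bytes=total_bytes)
print(f"4096 small transfers: {many_small * 1e3:.2f} ms")
print(f"1 batched transfer:   {one_batched * 1e3:.2f} ms")
```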
Cities of the Future: Employing Wireless Sensor Networks for Efficient Decision Making in Complex Environments
Decision making in large scale urban environments is critical for many
applications involving continuous distribution of resources and utilization of
infrastructure, such as ambient lighting control and traffic management.
Traditional decision making methods involve extensive human participation and are expensive, inefficient, and unreliable in hard-to-predict situations. Modern technology, including ubiquitous data collection through sensors,
automated analysis and prognosis, and online optimization, offers new
capabilities for developing flexible, autonomous, scalable, efficient, and
predictable control methods. This paper presents a new decision making concept in which a hierarchy of semantically more abstract models is utilized to perform online, scalable, and predictable control. The lower semantic levels
perform localized decisions based on sampled data from the environment, while
the higher semantic levels provide more global, time invariant results based on
aggregated data from the lower levels. There is a continuous feedback between
the levels of the semantic hierarchy, in which the upper levels set performance
guaranteeing constraints for the lower levels, while the lower levels indicate
whether these constraints are feasible or not. Even though the semantic
hierarchy is not tied to a particular set of description models, the paper
illustrates a hierarchy used for traffic management applications and composed
of Finite State Machines, Conditional Task Graphs, Markov Decision Processes,
and functional graphs. The paper also summarizes some of the main research
problems that must be addressed as part of the proposed concept.
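The sketch below gives a toy rendering of that feedback loop for a single intersection: the upper level sets a wait-time constraint from aggregated demand, and the lower level makes a localized decision and reports whether the constraint is feasible; the numbers and the control rule are illustrative assumptions.

```python
# A toy sketch of the feedback between semantic levels: the upper level sets a
# performance-guaranteeing constraint, the lower level decides locally on
# sampled data and reports feasibility back up. All numbers are assumptions.

def lower_level(sampled_arrivals_per_min, max_avg_wait_s):
    """Localized decision: pick a green-light share and estimate the wait it yields."""
    green_share = min(0.9, sampled_arrivals_per_min / 60.0)
    est_wait_s = 30.0 * (1.0 - green_share)          # crude queueing proxy
    return green_share, est_wait_s <= max_avg_wait_s

def upper_level(aggregated_demand_per_min):
    """Global, slowly varying decision: a wait-time constraint for each intersection."""
    return 10.0 if aggregated_demand_per_min > 40 else 20.0

demand = 48                                    # aggregated data from lower levels
constraint = upper_level(demand)               # constraint pushed down the hierarchy
share, feasible = lower_level(demand, constraint)
print(f"constraint={constraint}s green_share={share:.2f} feasible={feasible}")
if not feasible:
    print("lower level reports infeasibility; upper level must relax the constraint")
```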