Agile SoC Development with Open ESP
ESP is an open-source research platform for heterogeneous SoC design. The
platform combines a modular tile-based architecture with a variety of
application-oriented flows for the design and optimization of accelerators. The
ESP architecture is highly scalable and strikes a balance between regularity
and specialization. The companion methodology raises the level of abstraction
to system-level design and enables an automated flow from software and hardware
development to full-system prototyping on FPGA. For application developers, ESP
offers domain-specific automated solutions to synthesize new accelerators for
their software and to map complex workloads onto the SoC architecture. For
hardware engineers, ESP offers automated solutions to integrate their
accelerator designs into the complete SoC. Conceived as a heterogeneous
integration platform and tested through years of teaching at Columbia
University, ESP supports the open-source hardware community by providing a
flexible platform for agile SoC development.
Comment: Invited Paper at the 2020 International Conference On Computer Aided
Design (ICCAD) - Special Session on Opensource Tools and Platforms for Agile
Development of Specialized Architecture
A Big Data Analyzer for Large Trace Logs
The current generation of Internet-based services is typically hosted in large
data centers that take the form of warehouse-size structures housing tens of
thousands of servers. Continued availability of a modern data center is the
result of a complex orchestration among many internal and external actors
including computing hardware, multiple layers of intricate software, networking
and storage devices, electrical power and cooling plants. During the course of
their operation, many of these components produce large amounts of data in the
form of event and error logs that are essential not only for identifying and
resolving problems but also for improving data center efficiency and
management. Most of these activities would benefit significantly from data
analytics techniques to exploit hidden statistical patterns and correlations
that may be present in the data. The sheer volume of data to be analyzed makes
uncovering these correlations and patterns a challenging task. This paper
presents BiDAl, a prototype Java tool for log-data analysis that incorporates
several Big Data technologies in order to simplify the task of extracting
information from data traces produced by large clusters and server farms. BiDAl
provides the user with several analysis languages (SQL, R and Hadoop MapReduce)
and storage backends (HDFS and SQLite) that can be freely mixed and matched so
that a custom tool for a specific task can be easily constructed. BiDAl has a
modular architecture so that it can be extended with other backends and
analysis languages in the future. In this paper we present the design of BiDAl
and describe our experience using it to analyze publicly-available traces from
Google data clusters, with the goal of building a realistic model of a complex
data center.
Comment: 26 pages, 10 figures
Brain-inspired computing needs a master plan
New computing technologies inspired by the brain promise fundamentally different ways to process information with extreme energy efficiency and the ability to handle the avalanche of unstructured and noisy data that we are generating at an ever-increasing rate. Realizing this promise requires a brave and coordinated plan to bring together disparate research communities and to provide them with the funding, focus and support needed. We have done this in the past with digital technologies; we are in the process of doing it with quantum technologies; can we now do it for brain-inspired computing?
Hybrid Analog-Digital Co-Processing for Scientific Computation
In the past 10 years computer architecture research has moved to more heterogeneity and less adherence to conventional abstractions. Scientists and engineers hold an unshakable belief that computing holds keys to unlocking humanity's Grand Challenges. Acting on that belief they have looked deeper into computer architecture to find specialized support for their applications. Likewise, computer architects have looked deeper into circuits and devices in search of untapped performance and efficiency. The lines between computer architecture layers---applications, algorithms, architectures, microarchitectures, circuits and devices---have blurred. Against this backdrop, a menagerie of computer architectures is on the horizon, ones that forgo basic assumptions about computer hardware and require new thinking about how such hardware supports problems and algorithms.
This thesis is about revisiting hybrid analog-digital computing in support of diverse modern workloads. Hybrid computing had extensive applications in early computing history, and has been revisited for small-scale applications in embedded systems. But architectural support for using hybrid computing in modern workloads, at scale and with high-accuracy solutions, has been lacking.
I demonstrate solving a variety of scientific computing problems, including stochastic ODEs, partial differential equations, linear algebra, and nonlinear systems of equations, as case studies in hybrid computing. I solve these problems on a system of multiple prototype analog accelerator chips built by a team at Columbia University. On that team I made contributions toward programming the chips, building the digital interface, and validating the chips' functionality. The analog accelerator chip is intended for use in conjunction with a conventional digital host computer.
The appeal and motivation for using an analog accelerator is efficiency and performance, but it comes with limitations in accuracy and problem sizes that we have to work around.
The first problem is how to pose problems in this unconventional computation model. Scientific computing phrases problems as differential equations and algebraic equations. Differential equations are a continuous view of the world, while algebraic equations are a discrete one. Prior work in analog computing mostly focused on differential equations, with algebraic equations playing only a minor role. The secret to using the analog accelerator to support modern workloads on conventional computers is that these two viewpoints are interchangeable. The algebraic equations that underlie most workloads can be solved as differential equations, and differential equations are naturally solvable in the analog accelerator chip. A hybrid analog-digital computer architecture can focus on solving linear and nonlinear algebra problems to support many workloads.
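As an illustration of how an algebraic problem can be recast as a differential equation, the following minimal Python sketch solves a small linear system A x = b by integrating dx/dt = b - A x until it settles. This is an assumed textbook formulation, not necessarily the one used in the thesis; the function name, matrix, step size, and step count are all hypothetical, and the forward-Euler loop only stands in for the continuous-time evolution an analog circuit would perform. The approach assumes the eigenvalues of A have positive real parts so that the flow converges.

```python
def solve_linear_by_ode(A, b, dt=0.01, steps=5000):
    """Integrate dx/dt = b - A x with forward Euler.

    At a steady state dx/dt = 0, so the settled x satisfies A x = b.
    Assumes the eigenvalues of A have positive real parts and that dt
    is small enough for the explicit integration to remain stable.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(steps):
        # Residual b - A x is the "velocity" of the state vector.
        residual = [b[i] - sum(A[i][j] * x[j] for j in range(n))
                    for i in range(n)]
        x = [x[i] + dt * residual[i] for i in range(n)]
    return x


# Hypothetical 2x2 symmetric positive definite example.
A = [[4.0, 1.0],
     [1.0, 3.0]]
b = [1.0, 2.0]
x = solve_linear_by_ode(A, b)
print(x)  # settles near the exact solution [1/11, 7/11]
```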
The second problem is how to get accurate solutions using hybrid analog-digital computing. The analog computation model gives less accurate solutions because it gives up representing numbers as digital binary numbers, and instead uses the full range of analog voltage and current to represent real numbers. Prior work has established that encoding data in analog signals gives an energy efficiency advantage as long as the analog data precision is limited. While the analog accelerator alone may be useful for energy-constrained applications where inputs and outputs are imprecise, we are more interested in using analog in conjunction with digital for precise solutions. This thesis gives the novel insight that the trick to doing so is to solve nonlinear problems where low-precision guesses are useful for conventional digital algorithms.
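To make the idea of a low-precision guess seeding a digital algorithm concrete, here is a minimal Python sketch using Newton's method on a scalar equation. Everything in it is an assumption for illustration: the test function, the tolerance, and the seed value 2.1, which merely simulates a two-digit analog result and does not come from real hardware or from the thesis.

```python
def newton(f, df, x0, tol=1e-12, max_iter=100):
    """Plain Newton iteration; returns (root, iterations used)."""
    x = x0
    for k in range(1, max_iter + 1):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            return x, k
    raise RuntimeError("Newton iteration did not converge")


# Hypothetical nonlinear problem: x^3 - 2x - 5 = 0.
def f(x):
    return x ** 3 - 2.0 * x - 5.0


def df(x):
    return 3.0 * x ** 2 - 2.0


# Seed that stands in for a low-precision analog estimate (assumed value).
root_seeded, iters_seeded = newton(f, df, 2.1)
# Uninformed starting point for comparison.
root_far, iters_far = newton(f, df, 10.0)

print(root_seeded, iters_seeded)
print(root_far, iters_far)
```

Both runs reach the same root to full double precision; the seeded run needs fewer iterations, which is the sense in which an imprecise analog answer can still save digital work.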
The third problem is how to solve large problems using hybrid analog-digital computing. The analog computation model cannot handle large problems because it gives up step-by-step discrete-time operation, instead allowing variables to evolve smoothly in continuous time. To make that happen, the analog accelerator works by chaining hardware for mathematical operations end-to-end. During computation, analog data flows through the hardware with no overheads in control logic and memory accesses. The downside is that the required hardware grows with problem size. While scientific computing researchers have for a long time split large problems into smaller subproblems to fit within digital computer constraints, this thesis is a first attempt to consider these divide-and-conquer algorithms as an essential tool in using the analog model of computation.
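One standard way to split a large linear system into subproblems of fixed size is block Jacobi iteration, sketched below in Python. This is an assumed example of a divide-and-conquer scheme, not the specific algorithm of the thesis. The exact 2x2 solver is a hypothetical stand-in for a small accelerator that can only hold two unknowns at a time, and the matrix is chosen to be strictly diagonally dominant so that the outer iteration converges.

```python
BLOCK = 2  # assumed capacity of the small solver, in unknowns


def solve_2x2(M, r):
    """Exact 2x2 solve via Cramer's rule (stand-in for a small accelerator)."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(r[0] * M[1][1] - M[0][1] * r[1]) / det,
            (M[0][0] * r[1] - M[1][0] * r[0]) / det]


def block_jacobi(A, b, sweeps=50):
    """Solve A x = b by repeatedly solving independent 2x2 diagonal blocks.

    Couplings between blocks are moved to the right-hand side using the
    values from the previous sweep. Assumes len(b) is a multiple of BLOCK.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        x_new = x[:]
        for start in range(0, n, BLOCK):
            idx = range(start, start + BLOCK)
            rhs = [b[i] - sum(A[i][j] * x[j] for j in range(n) if j not in idx)
                   for i in idx]
            sub = [[A[i][j] for j in idx] for i in idx]
            x_new[start:start + BLOCK] = solve_2x2(sub, rhs)
        x = x_new
    return x


# Hypothetical 4x4 strictly diagonally dominant system.
A = [[4.0, 1.0, 0.5, 0.0],
     [1.0, 3.0, 0.0, 0.5],
     [0.5, 0.0, 5.0, 1.0],
     [0.0, 0.5, 1.0, 4.0]]
b = [1.0, 2.0, 3.0, 4.0]
x = block_jacobi(A, b)
max_residual = max(abs(b[i] - sum(A[i][j] * x[j] for j in range(4)))
                   for i in range(4))
print(x, max_residual)
```

The hardware-size constraint shows up only in `BLOCK`: the full problem can grow while each subproblem handed to the small solver stays the same size.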
As we enter the post-Moore’s law era of computing, unconventional architectures will offer specialized models of computation that uniquely support specific problem types. Two prominent examples are deep neural networks and quantum computers. Recent trends in computer science research show these unconventional architectures will soon have broad adoption. In this thesis I show that another specialized, unconventional architecture is the use of analog accelerators to solve problems in scientific computing. Computer architecture researchers will discover other important models of computation in the future. This thesis is an example of the discovery process, implementation, and evaluation of how an unconventional architecture supports specialized workloads.
Investigating Single Precision Floating General Matrix Multiply in Heterogeneous Hardware
The fundamental operation of matrix multiplication is ubiquitous across a myriad of disciplines. Yet, the identification of new optimizations for matrix multiplication remains relevant for emerging hardware architectures and heterogeneous systems. Frameworks such as OpenCL enable computation orchestration on existing systems, and its availability using the Intel High Level Synthesis compiler allows users to architect new designs for reconfigurable hardware using C/C++. Using the HARPv2 as a vehicle for exploration, we investigate the utility of several of the most notable matrix multiplication optimizations to better understand the performance portability of OpenCL and the implications for such optimizations on this and future heterogeneous architectures. Our results give targeted insights into the applicability of best practices that were designed for existing architectures when used on emerging heterogeneous systems.
GMEM: Generalized Memory Management for Peripheral Devices
This paper presents GMEM, generalized memory management, for peripheral
devices. GMEM provides OS support for centralized memory management of both CPU
and devices. GMEM provides a high-level interface that decouples MMU-specific
functions. Device drivers can thus attach themselves to a process's address
space and let the OS take charge of their memory management. This eliminates
the need for device drivers to "reinvent the wheel" and allows them to benefit
from general memory optimizations integrated by GMEM. Furthermore, GMEM
internally coordinates all attached devices within each virtual address space.
This drastically improves user-level programmability, since programmers can use
a single address space within their program, even when operating across the CPU
and multiple devices. A case study on device drivers demonstrates these
benefits. A GMEM-based IOMMU driver eliminates around seven hundred lines of
code and obtains 54% higher network receive throughput utilizing 32% less CPU
compared to the state-of-the-art. In addition, the GMEM-based driver of a
simulated GPU takes less than 70 lines of code, excluding its MMU functions.
Comment: Finished before Weixi left Rice and submitted to ASPLOS'2
Developing a support for FPGAs in the Controller parallel programming model
Heterogeneous computing appears to be the solution to achieve ever faster computers capable of solving
bigger and more complex problems in different fields of knowledge. To that end, it integrates accelerators
with different architectures capable of exploiting the features of problems from different perspectives, thus
achieving higher performance.
FPGAs are reconfigurable hardware, i.e., it is possible to modify them after manufacture. This allows
great flexibility and maximum adaptability to the given problem. In addition, they have low power
consumption. These advantages come with the major drawback of more difficult programming with error-prone
HDLs (Hardware Description Languages), such as Verilog or VHDL, and the requirement of advanced
knowledge of digital electronics. In recent years, the main FPGA vendors have concentrated on developing HLS (High
Level Synthesis) tools that allow them to be programmed with C-like high-level programming languages. This
has favoured their adoption by the HPC community and their integration in new supercomputers. However,
the programmer still has to take care of aspects such as management of command queues, launch
parameters or data transfers.
The Controller model is a library to easily manage the coordination, communication and kernel launching
details on hardware accelerators. It transparently exploits their native or vendor-specific programming
models, namely OpenCL and CUDA, thus achieving high performance regardless of the compiler.
It is intended to enable the programmer to make use of the different available
hardware resources in combination in heterogeneous environments.
This work extends the Controller model through the development of a backend that allows the integration
of FPGAs, keeping the changes to the user-facing interface to a minimum. The experimental
results show a significant decrease in programming effort compared to the native OpenCL implementation.
Similarly, a high overlap of computation and communication and a negligible
overhead due to the use of the library are attained.
Grado en Ingeniería Informática