Search CORE

768 research outputs found

Agile SoC Development with Open ESP

Author: Balkind J.
Carloni L. P.
Carloni L. P.
Carloni L. P.
Caulfield A. M.
Chishiro H.
Cota E. G.
Gaisler J.
Giri D.
Giri D.
Giri D.
Giri D.
Khailany B.
Kurth A.
Liu H-Y.
Liu H-Y.
Mantovani P.
Mantovani P.
Mantovani P.
Meredith M.
Pilato C.
Rossi D.
Yoon Y.-J.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 02/09/2020
Field of study

ESP is an open-source research platform for heterogeneous SoC design. The platform combines a modular tile-based architecture with a variety of application-oriented flows for the design and optimization of accelerators. The ESP architecture is highly scalable and strikes a balance between regularity and specialization. The companion methodology raises the level of abstraction to system-level design and enables an automated flow from software and hardware development to full-system prototyping on FPGA. For application developers, ESP offers domain-specific automated solutions to synthesize new accelerators for their software and to map complex workloads onto the SoC architecture. For hardware engineers, ESP offers automated solutions to integrate their accelerator designs into the complete SoC. Conceived as a heterogeneous integration platform and tested through years of teaching at Columbia University, ESP supports the open-source hardware community by providing a flexible platform for agile SoC development.Comment: Invited Paper at the 2020 International Conference On Computer Aided Design (ICCAD) - Special Session on Opensource Tools and Platforms for Agile Development of Specialized Architecture

arXiv.org e-Print Archive

Crossref

A Big Data Analyzer for Large Trace Logs

Author: Babaoglu Ozalp
Balliu Alkida
Marzolla Moreno
Olivetti Dennis
Sîrbu Alina
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 02/09/2015
Field of study

Current generation of Internet-based services are typically hosted on large data centers that take the form of warehouse-size structures housing tens of thousands of servers. Continued availability of a modern data center is the result of a complex orchestration among many internal and external actors including computing hardware, multiple layers of intricate software, networking and storage devices, electrical power and cooling plants. During the course of their operation, many of these components produce large amounts of data in the form of event and error logs that are essential not only for identifying and resolving problems but also for improving data center efficiency and management. Most of these activities would benefit significantly from data analytics techniques to exploit hidden statistical patterns and correlations that may be present in the data. The sheer volume of data to be analyzed makes uncovering these correlations and patterns a challenging task. This paper presents BiDAl, a prototype Java tool for log-data analysis that incorporates several Big Data technologies in order to simplify the task of extracting information from data traces produced by large clusters and server farms. BiDAl provides the user with several analysis languages (SQL, R and Hadoop MapReduce) and storage backends (HDFS and SQLite) that can be freely mixed and matched so that a custom tool for a specific task can be easily constructed. BiDAl has a modular architecture so that it can be extended with other backends and analysis languages in the future. In this paper we present the design of BiDAl and describe our experience using it to analyze publicly-available traces from Google data clusters, with the goal of building a realistic model of a complex data center.Comment: 26 pages, 10 figure

arXiv.org e-Print Archive

CiteSeerX

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Brain-inspired computing needs a master plan

Author: Kenyon Anthony
Mehonic A
Publication venue: Nature Publishing Group
Publication date: 14/04/2022
Field of study

New computing technologies inspired by the brain promise fundamentally different ways to process information with extreme energy efficiency and the ability to handle the avalanche of unstructured and noisy data that we are generating at an ever-increasing rate. To realize this promise requires a brave and coordinated plan to bring together disparate research communities and to provide them with the funding, focus and support needed. We have done this in the past with digital technologies; we are in the process of doing it with quantum technologies; can we now do it for brain-inspired computing

UCL Discovery

Recommended from our members

Hybrid Analog-Digital Co-Processing for Scientific Computation

Author: Huang Yipeng
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2018
Field of study

In the past 10 years computer architecture research has moved to more heterogeneity and less adherence to conventional abstractions. Scientists and engineers hold an unshakable belief that computing holds keys to unlocking humanity's Grand Challenges. Acting on that belief they have looked deeper into computer architecture to find specialized support for their applications. Likewise, computer architects have looked deeper into circuits and devices in search of untapped performance and efficiency. The lines between computer architecture layers---applications, algorithms, architectures, microarchitectures, circuits and devices---have blurred. Against this backdrop, a menagerie of computer architectures are on the horizon, ones that forgo basic assumptions about computer hardware, and require new thinking of how such hardware supports problems and algorithms. This thesis is about revisiting hybrid analog-digital computing in support of diverse modern workloads. Hybrid computing had extensive applications in early computing history, and has been revisited for small-scale applications in embedded systems. But architectural support for using hybrid computing in modern workloads, at scale and with high accuracy solutions, has been lacking. I demonstrate solving a variety of scientific computing problems, including stochastic ODEs, partial differential equations, linear algebra, and nonlinear systems of equations, as case studies in hybrid computing. I solve these problems on a system of multiple prototype analog accelerator chips built by a team at Columbia University. On that team I made contributions toward programming the chips, building the digital interface, and validating the chips' functionality. The analog accelerator chip is intended for use in conjunction with a conventional digital host computer. The appeal and motivation for using an analog accelerator is efficiency and performance, but it comes with limitations in accuracy and problem sizes that we have to work around. The first problem is how to do problems in this unconventional computation model. Scientific computing phrases problems as differential equations and algebraic equations. Differential equations are a continuous view of the world, while algebraic equations are a discrete one. Prior work in analog computing mostly focused on differential equations; algebraic equations played a minor role in prior work in analog computing. The secret to using the analog accelerator to support modern workloads on conventional computers is that these two viewpoints are interchangeable. The algebraic equations that underlie most workloads can be solved as differential equations, and differential equations are naturally solvable in the analog accelerator chip. A hybrid analog-digital computer architecture can focus on solving linear and nonlinear algebra problems to support many workloads. The second problem is how to get accurate solutions using hybrid analog-digital computing. The reason that the analog computation model gives less accurate solutions is it gives up representing numbers as digital binary numbers, and instead uses the full range of analog voltage and current to represent real numbers. Prior work has established that encoding data in analog signals gives an energy efficiency advantage as long as the analog data precision is limited. While the analog accelerator alone may be useful for energy-constrained applications where inputs and outputs are imprecise, we are more interested in using analog in conjunction with digital for precise solutions. This thesis gives novel insight that the trick to do so is to solve nonlinear problems where low-precision guesses are useful for conventional digital algorithms. The third problem is how to solve large problems using hybrid analog-digital computing. The reason the analog computation model can't handle large problems is it gives up step-by-step discrete-time operation, instead allowing variables to evolve smoothly in continuous time. To make that happen the analog accelerator works by chaining hardware for mathematical operations end-to-end. During computation analog data flows through the hardware with no overheads in control logic and memory accesses. The downside is then the needed hardware size grows alongside problem sizes. While scientific computing researchers have for a long time split large problems into smaller subproblems to fit in digital computer constraints, this thesis is a first attempt to consider these divide-and-conquer algorithms as an essential tool in using the analog model of computation. As we enter the post-Moore’s law era of computing, unconventional architectures will offer specialized models of computation that uniquely support specific problem types. Two prominent examples are deep neural networks and quantum computers. Recent trends in computer science research show these unconventional architectures will soon have broad adoption. In this thesis I show another specialized, unconventional architecture is to use analog accelerators to solve problems in scientific computing. Computer architecture researchers will discover other important models of computation in the future. This thesis is an example of the discovery process, implementation, and evaluation of how an unconventional architecture supports specialized workloads

Columbia University Academic Commons

Investigating Single Precision Floating General Matrix Multiply in Heterogeneous Hardware

Author: Harris Steven
Publication venue: Washington University Open Scholarship
Publication date: 01/08/2020
Field of study

The fundamental operation of matrix multiplication is ubiquitous across a myriad of disciplines. Yet, the identification of new optimizations for matrix multiplication remains relevant for emerging hardware architectures and heterogeneous systems. Frameworks such as OpenCL enable computation orchestration on existing systems, and its availability using the Intel High Level Synthesis compiler allows users to architect new designs for reconfigurable hardware using C/C++. Using the HARPv2 as a vehicle for exploration, we investigate the utility of several of the most notable matrix multiplication optimizations to better understand the performance portability of OpenCL and the implications for such optimizations on this and future heterogeneous architectures. Our results give targeted insights into the applicability of best practices that were for existing architectures when used on emerging heterogeneous systems

Washington University St. Louis: Open Scholarship

GMEM: Generalized Memory Management for Peripheral Devices

Author: Cox Alan L.
Rixner Scott
Zhu Weixi
Publication venue
Publication date: 19/10/2023
Field of study

This paper presents GMEM, generalized memory management, for peripheral devices. GMEM provides OS support for centralized memory management of both CPU and devices. GMEM provides a high-level interface that decouples MMU-specific functions. Device drivers can thus attach themselves to a process's address space and let the OS take charge of their memory management. This eliminates the need for device drivers to "reinvent the wheel" and allows them to benefit from general memory optimizations integrated by GMEM. Furthermore, GMEM internally coordinates all attached devices within each virtual address space. This drastically improves user-level programmability, since programmers can use a single address space within their program, even when operating across the CPU and multiple devices. A case study on device drivers demonstrates these benefits. A GMEM-based IOMMU driver eliminates around seven hundred lines of code and obtains 54% higher network receive throughput utilizing 32% less CPU compared to the state-of-the-art. In addition, the GMEM-based driver of a simulated GPU takes less than 70 lines of code, excluding its MMU functions.Comment: Finished before Weixi left Rice and submitted to ASPLOS'2

arXiv.org e-Print Archive

Developing a support for FPGAs in the Controller parallel programming model

Author: Rodríguez Canal Gabriel
Publication venue
Publication date: 01/01/2020
Field of study

La computación heterogénea se presenta como la solución para conseguir supercomputadores cada vez más rápidos capaces de resolver problemas más grandes y complejos en diferentes áreas de conocimiento. Para ello, integra aceleradores con distintas arquitecturas capaces de explotar las características de los problemas desde distintos enfoques obteniendo, de este modo, un mayor rendimiento. Las FPGAs son hardware reconfigurable, i.e., es posible modificarlas después de su fabricación. Esto permite una gran flexibilidad y una máxima adaptación al problema en cuestión. Además, tienen un consumo energético muy bajo. Todas estas ventajas tienen el gran inconveniente de una más difícil programaci ón mediante los propensos a errores HDLs (Hardware Description Language), tales como Verilog o VHDL, y requisitos de conocimientos avanzados de electrónica digital. En los últimos años los principales fabricantes de FPGAs han enfocado sus esfuerzos en desarrollar herramientas HLS (High Level Synthesis) que permiten programarlas a través de lenguajes de programación de alto nivel estilo C. Esto ha favorecido su adopción por la comunidad HPC y su integración en los nuevos supercomputadores. Sin embargo, el programador aún tiene que ocuparse de aspectos como la gestión de colas de comandos, parámetros de lanzamiento o transferencias de datos. El modelo Controller es una librería que facilita la gestión de la coordinación, comunicación y los detalles de lanzamiento de los kernels en aceleradores hardware. Explota de forma transparente sus modelos de programación nativos, en concreto OpenCL y CUDA, y, por tanto, consigue un alto rendimiento independientemente del compilador. Permite al programador utilizar los distintos recursos hardware disponibles de forma combinada en entornos heterogéneos. Este trabajo extiende el modelo Controller mediante el desarrollo de un backend que permite la integración de FPGAs, manteniendo los cambios sobre la interfaz de usuario al mínimo. A través de los resultados experimentales se comprueba que se consigue una disminución del esfuerzo de programación significativa en comparación con la implementación nativa en OpenCL. Del mismo modo, se consigue un elevado solapamiento entre computación y comunicación y un sobrecoste por el uso de la librería despreciable.Heterogeneous computing appears to be the solution to achieve ever faster computers capable of solving bigger and more complex problems in difierent fields of knowledge. To that end, it integrates accelerators with difierent architectures capable of exploiting the features of problems from difierent perspectives thus achieving higher performance. FPGAs are reconfigurable hardware, i.e., it is possible to modify them after manufacture. This allows great flexibility and maximum adaptability to the given problem. In addition, they have low power consumption. All these advantages have the great objection of more dificult programming with the errorprone HDLs (Hardware Description Language), such as Verilog or VHDL, and the requirement of advanced knowledge of digital electronics. The main FPGA vendors have concentrated on developing HLS (High Level Synthesis) tools that allow to program them with C-like high level programming languages. This favoured their adoption by the HPC community and their integration in new supercomputers. However, the programmer still has to take care of aspects such as management of command queues, launching parameters or data transfers. The Controller model is a library to easily manage the coordination, communication and kernel launching details on hardware accelerators. It transparently exploits their native or vendor specific programming models, namely OpenCL and CUDA, thus enabling the potential performance obtained by using them in a compiler agnostic way. It is intended to enable the programmer to make use of the diferent available hardware resources in combination in heterogeneous environments. This work extends the Controller model through the development of a backend that allows the integration of FPGAs, keeping the changes over the user-facing interface to the minimum. The experimental results validate that a significant decrease in programming effort compared to the native OpenCL implementation is achieved. Similarly, high overlap of computation and communication and a negligible overhead due to the use of the library are attained.Grado en Ingeniería Informátic

Repositorio Documental de la Universidad de Valladolid