Search CORE

101 research outputs found

Hardware-software codesign in a high-level synthesis environment

Author: Visegrady Tamas L
Publication venue: University of New Hampshire Scholars\u27 Repository
Publication date: 01/01/1999
Field of study

Interfacing hardware-oriented high-level synthesis to software development is a computationally hard problem for which no general solution exists. Under special conditions, the hardware-software codesign (system-level synthesis) problem may be analyzed with traditional tools and efficient heuristics. This dissertation introduces a new alternative to the currently used heuristic methods. The new approach combines the results of top-down hardware development with existing basic hardware units (bottom-up libraries) and compiler generation tools. The optimization goal is to maximize operating frequency or minimize cost with reasonable tradeoffs in other properties. The dissertation research provides a unified approach to hardware-software codesign. The improvements over previously existing design methodologies are presented in the frame-work of an academic CAD environment (PIPE). This CAD environment implements a sufficient subset of functions of commercial microelectronics CAD packages. The results may be generalized for other general-purpose algorithms or environments. Reference benchmarks are used to validate the new approach. Most of the well-known benchmarks are based on discrete-time numerical simulations, digital filtering applications, and cryptography (an emerging field in benchmarking). As there is a need for high-performance applications, an additional requirement for this dissertation is to investigate pipelined hardware-software systems\u27 performance and design methods. The results demonstrate that the quality of existing heuristics does not change in the enhanced, hardware-software environment

UNH Scholars' Repository

Mr.Wolf: An Energy-Precision Scalable Parallel Ultra Low Power SoC for IoT Edge Processing

Author: Benini L.
Loi I.
Pullini A.
Rossi D.
Tagliavini G.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019
Field of study

This paper presents Mr. Wolf, a parallel ultra-low power (PULP) system on chip (SoC) featuring a hierarchical architecture with a small (12 kgates) microcontroller (MCU) class RISC-V core augmented with an autonomous IO subsystem for efficient data transfer from a wide set of peripherals. The small core can offload compute-intensive kernels to an eight-core floating-point capable of processing engine available on demand. The proposed SoC, implemented in a 40-nm LP CMOS technology, features a 108-mu W fully retentive memory (512 kB). The IO subsystem is capable of transferring up to 1.6 Gbit/s from external devices to the memory in less than 2.5 mW. The eight-core compute cluster achieves a peak performance of 850 million of 32-bit integer multiply and accumulate per second (MMAC/s) and 500 million of 32-bit floating-point multiply and accumulate per second (MFMAC/s) -1 GFlop/s-with an energy efficiency up to 15 MMAC/s/mW and 9 MFMAC/s/mW. These building blocks are supported by aggressive on-chip power conversion and management, enabling energy-proportional heterogeneous computing for always-on IoT end nodes improving performance by several orders of magnitude with respect to traditional single-core MCUs within a power envelope of 153 mW. We demonstrated the capabilities of the proposed SoC on a wide set of near-sensor processing kernels showing that Mr. Wolf can deliver performance up to 16.4 GOp/s with energy efficiency up to 274 MOp/s/mW on real-life applications, paving the way for always-on data analytics on high-bandwidth sensors at the edge of the Internet of Things

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

A Framework for System-level Modeling and Simulation of Embedded Systems Architectures

Author: Erbas C.
Pimentel A.D.
Polstra S.
Thompson M.
Publication venue
Publication date: 01/01/2007
Field of study

International Migration, Integration and Social Cohesion online publications

Enabling VLSI processing blocks for MIMO-OFDM Communications

Author: Barbara Cerato
Emanuele Viterbo
Guido Masera
Publication venue
Publication date: 01/01/2008
Field of study

Multi-input multi-output (MIMO) systems combined with orthogonal frequency-division multiplexing (OFDM) gained a wide popularity in wireless applications due to the potential of providing increased channel capacity and robustness against multipath fading channels. However these advantages come at the cost of a very high processing complexity and the efficient implementation of MIMO-OFDM receivers is today a major research topic. In this paper, efficient architectures are proposed for the hardware implementation of the main building blocks of a MIMO-OFDM receiver. A sphere decoder architecture flexible to different modulation without any loss in BER performance is presented while the proposed matrix factorization implementation allows to achieve the highest throughput specified in the IEEE 802.11n standard. Finally a novel sphere decoder approach is presented, which allows for the realization of new golden space time trellis coded modulation (GST-TCM) scheme. Implementation cost and offered throughput are provided for the proposed architectures synthesized on a 0.13 CMOS standard cell technology or on advanced FPGA devices

Directory of Open Access Journals

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Open Access Repository

System Level Performance Evaluation of Distributed Embedded Systems

Author: Khan Subayal
Publication venue: Tampere University of Technology
Publication date: 01/01/2012
Field of study

In order to evaluate the feasibility of the distributed embedded systems in different application domains at an early phase, the System Level Performance Evaluation (SLPE) must provide reliable estimates of the nonfunctional properties of the system such as end-to-end delays and packet losses rate. The values of these non-functional properties depend not only on the application layer of the OSI model but also on the technologies residing at the MAC, transport and Physical layers. Therefore, the system level performance evaluation methodology must provide functionally accurate models of the protocols and technologies operating at these layers. After conducting a state of the art survey, it was found that the existing approaches for SLPE are either specialized for a particular domain of systems or apply a particular model of computation (MOC) for modeling the communication and synchronization between the different components of a distributed application. Therefore, these approaches abstract the functionalities of the data-link, Transport and MAC layers by the highly abstract message passing methods employed by the different models of computation. On the other hand, network simulators such as OMNeT++, ns-2 and Opnet do not provide the models for platform components of devices such as processors and memories and totally abstract the application processing by delays obtained via traffic generators. Therefore the system designer is not able to determine the potential impact of an application in terms of utilization of the platform used by the device. Hence, for a system level performance evaluation approach to estimate both the platform utilization and the non-functional properties which are a consequence of the lower layers of OSI models (such as end-to-end delays), it must provide the tools for automatic workload extraction of application workload models at various levels of refinement and functionally correct models of lower layers of OSI model (Transport MAC and Physical layers). Since ABSOLUT is not restricted to a particular domain and also does not depend on any MOC, therefore it was selected for the extension to a system level performance evaluation approach for distributed embedded systems. The models of data-link and Transport layer protocols and automatic workload generation of system calls was not available in ABSOLUT performance evaluation methodology. The, thesis describes the design and modelling of these OSI model layers and automatic workload generation tool for system calls. The tools and models integrated to ABSOLUT methodology were used in a number of case studies. The accuracy of the protocols was compared to network simulators and real systems. The results were 88% accurate for user space code of the application layer and provide an improvement of over 50% as compared to manual models for external libraries and system calls. The ABSOLUT physical layer models were found to be 99.8% accurate when compared to analytical models. The MAC and transport layer models were found to be 70-80% accurate when compared with the same scenarios simulated by ns-2 and OMNeT++ simulators. The bit error rates, frame error probability and packet loss rates show close correlation with the analytical methods .i.e., over 99%, 92% and 80% respectively. Therefore the results of ABSOLUT framework for application layer outperform the results of performance evaluation approaches which employ virtual systems and at the same time provide as accurate estimates of the end-to-end delays and packet loss rate as network simulators. The results of the network simulators also vary in absolute values but they follow the same trend. Therefore, the extensions made to ABSOLUT allow the system designer to identify the potential bottlenecks in the system at different OSI model layers and evaluate the non-functional properties with a high level of accuracy. Also, if the system designer wants to focus entirely on the application layer, different models of computations can be easily instantiated on top of extended ABSOLUT framework to achieve higher simulation speeds as described in the thesis

Trepo - Institutional Repository of Tampere University

Dataflow-Based Mapping of Computer Vision Algorithms onto FPGAs

Author
Publication venue: Springer
Publication date
Field of study

Springer - Publisher Connector

Dataflow-based mapping of computer vision algorithms onto FPGAs

Author: Bhattacharyya Shuvra S
Corretjer I
Haim Fiorella
Schlessman Jason
Sen Mainak
Silveyra Andrés
Thiehan Lv
Wolf Wayne
Publication venue: Hindawi Publishing Corporation
Publication date
Field of study

We develop a design methodology for mapping computer vision algorithms onto an FPGA through the use of coarse-grain reconfigurable dataflow graphs as a representation to guide the designer. We first describe a new dataflow modeling technique called homogeneous parameterized dataflow (HPDF), which effectively captures the structure of an important class of computer vision applications. This form of dynamic dataflow takes advantage of the property that in a large number of image processing applications, data production and consumption rates can vary, but are equal across dataflow graph edges for any particular application iteration. After motivating and defining the HPDF model of computation, we develop an HPDF-based design methodology that offers useful properties in terms of verifying correctness and exposing performance-enhancing transformations, we discuss and address various challenges in efficiently mapping an HPDF-based application representation into target-specific HDL code, and we present experimental results pertaining to the mapping of a gesture recognition application onto the Xilinx Virtex II FPGA

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

An Exploration of FPGAs as Accelerators for Graph Analysis via High-Level Synthesis

Author: Pedro Filipe Vilhena de Campos Oliveira e Silva
Publication venue
Publication date: 14/01/2021
Field of study

Repositório Aberto da Universidade do Porto

Dynamically reconfigurable bio-inspired hardware

Author: Upegui Posada Andres Emilio
Publication venue: Lausanne, EPFL
Publication date: 22/08/2006
Field of study

During the last several years, reconfigurable computing devices have experienced an impressive development in their resource availability, speed, and configurability. Currently, commercial FPGAs offer the possibility of self-reconfiguring by partially modifying their configuration bitstream, providing high architectural flexibility, while guaranteeing high performance. These configurability features have received special interest from computer architects: one can find several reconfigurable coprocessor architectures for cryptographic algorithms, image processing, automotive applications, and different general purpose functions. On the other hand we have bio-inspired hardware, a large research field taking inspiration from living beings in order to design hardware systems, which includes diverse topics: evolvable hardware, neural hardware, cellular automata, and fuzzy hardware, among others. Living beings are well known for their high adaptability to environmental changes, featuring very flexible adaptations at several levels. Bio-inspired hardware systems require such flexibility to be provided by the hardware platform on which the system is implemented. In general, bio-inspired hardware has been implemented on both custom and commercial hardware platforms. These custom platforms are specifically designed for supporting bio-inspired hardware systems, typically featuring special cellular architectures and enhanced reconfigurability capabilities; an example is their partial and dynamic reconfigurability. These aspects are very well appreciated for providing the performance and the high architectural flexibility required by bio-inspired systems. However, the availability and the very high costs of such custom devices make them only accessible to a very few research groups. Even though some commercial FPGAs provide enhanced reconfigurability features such as partial and dynamic reconfiguration, their utilization is still in its early stages and they are not well supported by FPGA vendors, thus making their use difficult to include in existing bio-inspired systems. In this thesis, I present a set of architectures, techniques, and methodologies for benefiting from the configurability advantages of current commercial FPGAs in the design of bio-inspired hardware systems. Among the presented architectures there are neural networks, spiking neuron models, fuzzy systems, cellular automata and random boolean networks. For these architectures, I propose several adaptation techniques for parametric and topological adaptation, such as hebbian learning, evolutionary and co-evolutionary algorithms, and particle swarm optimization. Finally, as case study I consider the implementation of bio-inspired hardware systems in two platforms: YaMoR (Yet another Modular Robot) and ROPES (Reconfigurable Object for Pervasive Systems); the development of both platforms having been co-supervised in the framework of this thesis

Infoscience - École polytechnique fédérale de Lausanne

WTEC Panel Report on International Assessment of Research and Development in Simulation-Based Engineering and Science

Author
Publication venue: 'Office of Scientific and Technical Information (OSTI)'
Publication date
Field of study

Crossref