Search CORE

732 research outputs found

Design of multimedia processor based on metric computation

Author: Balasa
Berekovic
Jean Luc Philippe
Jean Philippe Diguet
Mohamed Abid
Nader Ben Amor
Suzuki
Wuytack
Yannick Le Moullec
Publication venue: 'Elsevier BV'
Publication date: 01/01/2005
Field of study

Media-processing applications, such as signal processing, 2D and 3D graphics rendering, and image compression, are the dominant workloads in many embedded systems today. The real-time constraints of those media applications have taxing demands on today's processor performances with low cost, low power and reduced design delay. To satisfy those challenges, a fast and efficient strategy consists in upgrading a low cost general purpose processor core. This approach is based on the personalization of a general RISC processor core according the target multimedia application requirements. Thus, if the extra cost is justified, the general purpose processor GPP core can be enforced with instruction level coprocessors, coarse grain dedicated hardware, ad hoc memories or new GPP cores. In this way the final design solution is tailored to the application requirements. The proposed approach is based on three main steps: the first one is the analysis of the targeted application using efficient metrics. The second step is the selection of the appropriate architecture template according to the first step results and recommendations. The third step is the architecture generation. This approach is experimented using various image and video algorithms showing its feasibility

arXiv.org e-Print Archive

Crossref

HAL-Université de Bretagne Occidentale

VBN

A unified hardware/software runtime environment for FPGA-based reconfigurable computers using BORPH

Author: Brodersen R
So HKH
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2008
Field of study

Fulltext linkThis paper explores the design and implementation of BORPH, an operating system designed for FPGA-based reconfigurable computers. Hardware designs execute as normal UNIX processes under BORPH, having access to standard OS services, such as file system support. Hardware and software components of user designs may, therefore, run as communicating processes within BORPH's runtime environment. The familiar language independent UNIX kernel interface facilitates easy design reuse and rapid application development. To develop hardware designs, a Simulink-based design flow that integrates with BORPH is employed. Performances of BORPH on two on-chip systems implemented on a BEE2 platform are compared. © 2008 ACM.link_to_subscribed_fulltex

HKU Scholars Hub

Impact and Challenges of Software in 2025: Collected Papers

Author: Furrer Frank J.
Reimann Jan
Publication venue: Technische Universität Dresden
Publication date: 22/09/2014
Field of study

Today (2014), software is the key ingredient of most products and services. Software generates innovation and progress in many modern industries. Software is an indispensable element of evolution, of quality of life, and of our future. Software development is (slowly) evolving from a craft to an industrial discipline. Software – and the ability to efficiently produce and evolve high-quality software – is the single most important success factor for many highly competitive industries. Software technology, development methods and tools, and applications in more and more areas are rapidly evolving. The impact of software in 2025 in nearly all areas of life, work, relationships, culture, and society is expected to be massive. The question of the future of software is therefore important. However – like all predictions – quite difficult. Some market forces, industrial developments, social needs, and technology trends are visible today. How will they develop and influence the software we will have in 2025?:Impact of Heterogeneous Processor Architectures and Adaptation Technologies on the Software of 2025 (Kay Bierzynski) 9 Facing Future Software Engineering Challenges by Means of Software Product Lines (David Gollasch) 19 Capabilities of Digital Search and Impact on Work and Life in 2025 (Christina Korger) 27 Transparent Components for Software Systems (Paul Peschel) 37 Functionality, Threats and Influence of Ubiquitous Personal Assistants with Regard to the Society (Jonas Rausch) 47 Evolution-driven Changes of Non-Functional Requirements and Their Architecture (Hendrik Schön) 5

Technische Universität Dresden: Qucosa

Comprehensive Evaluation of OpenCL-based Convolutional Neural Network Accelerators in Xilinx and Altera FPGAs

Author: Kadetotad Deepak
Kim Minkyu
Linares Barranco Alejandro
Ríos Navarro José Antonio
Seo Jae-Sun
Tapiador Morales Ricardo
Publication venue: 'SAGE Publications'
Publication date: 01/01/2016
Field of study

Deep learning has significantly advanced the state of the art in artificial intelligence, gaining wide popularity from both industry and academia. Special interest is around Convolutional Neural Networks (CNN), which take inspiration from the hierarchical structure of the visual cortex, to form deep layers of convolutional operations, along with fully connected classifiers. Hardware implementations of these deep CNN architectures are challenged with memory bottlenecks that require many convolution and fully-connected layers demanding large amount of communication for parallel computation. Multi-core CPU based solutions have demonstrated their inadequacy for this problem due to the memory wall and low parallelism. Many-core GPU architectures show superior performance but they consume high power and also have memory constraints due to inconsistencies between cache and main memory. FPGA design solutions are also actively being explored, which allow implementing the memory hierarchy using embedded BlockRAM. This boosts the parallel use of shared memory elements between multiple processing units, avoiding data replicability and inconsistencies. This makes FPGAs potentially powerful solutions for real-time classification of CNNs. Both Altera and Xilinx have adopted OpenCL co-design framework from GPU for FPGA designs as a pseudo-automatic development solution. In this paper, a comprehensive evaluation and comparison of Altera and Xilinx OpenCL frameworks for a 5-layer deep CNN is presented. Hardware resources, temporal performance and the OpenCL architecture for CNNs are discussed. Xilinx demonstrates faster synthesis, better FPGA resource utilization and more compact boards. Altera provides multi-platforms tools, mature design community and better execution times

idUS. Depósito de Investigación Universidad de Sevilla

Can my chip behave like my brain?

Author: George Suma
Publication venue: Georgia Institute of Technology
Publication date: 27/05/2016
Field of study

Many decades ago, Carver Mead established the foundations of neuromorphic systems. Neuromorphic systems are analog circuits that emulate biology. These circuits utilize subthreshold dynamics of CMOS transistors to mimic the behavior of neurons. The objective is to not only simulate the human brain, but also to build useful applications using these bio-inspired circuits for ultra low power speech processing, image processing, and robotics. This can be achieved using reconfigurable hardware, like field programmable analog arrays (FPAAs), which enable configuring different applications on a cross platform system. As digital systems saturate in terms of power efficiency, this alternate approach has the potential to improve computational efficiency by approximately eight orders of magnitude. These systems, which include analog, digital, and neuromorphic elements combine to result in a very powerful reconfigurable processing machine.Ph.D

Scholarly Materials And Research @ Georgia Tech

TinyML: Tools, Applications, Challenges, and Future Research Directions

Author: Iyer Sridhar
Kallimani Rakhee
López Onel L. A.
Pai Krishna
Raghuwanshi Prasoon
Publication venue
Publication date: 23/03/2023
Field of study

In recent years, Artificial Intelligence (AI) and Machine learning (ML) have gained significant interest from both, industry and academia. Notably, conventional ML techniques require enormous amounts of power to meet the desired accuracy, which has limited their use mainly to high-capability devices such as network nodes. However, with many advancements in technologies such as the Internet of Things (IoT) and edge computing, it is desirable to incorporate ML techniques into resource-constrained embedded devices for distributed and ubiquitous intelligence. This has motivated the emergence of the TinyML paradigm which is an embedded ML technique that enables ML applications on multiple cheap, resource- and power-constrained devices. However, during this transition towards appropriate implementation of the TinyML technology, multiple challenges such as processing capacity optimization, improved reliability, and maintenance of learning models' accuracy require timely solutions. In this article, various avenues available for TinyML implementation are reviewed. Firstly, a background of TinyML is provided, followed by detailed discussions on various tools supporting TinyML. Then, state-of-art applications of TinyML using advanced technologies are detailed. Lastly, various research challenges and future directions are identified.Comment: 12 pags, 3 tables, 4 figure

arXiv.org e-Print Archive

Dataflow-Based Mapping of Computer Vision Algorithms onto FPGAs

Author
Publication venue: Springer
Publication date
Field of study

Springer - Publisher Connector