Search CORE

1,518 research outputs found

Design of multimedia processor based on metric computation

Author: Balasa
Berekovic
Jean Luc Philippe
Jean Philippe Diguet
Mohamed Abid
Nader Ben Amor
Suzuki
Wuytack
Yannick Le Moullec
Publication venue: 'Elsevier BV'
Publication date: 01/01/2005
Field of study

Media-processing applications, such as signal processing, 2D and 3D graphics rendering, and image compression, are the dominant workloads in many embedded systems today. The real-time constraints of those media applications have taxing demands on today's processor performances with low cost, low power and reduced design delay. To satisfy those challenges, a fast and efficient strategy consists in upgrading a low cost general purpose processor core. This approach is based on the personalization of a general RISC processor core according the target multimedia application requirements. Thus, if the extra cost is justified, the general purpose processor GPP core can be enforced with instruction level coprocessors, coarse grain dedicated hardware, ad hoc memories or new GPP cores. In this way the final design solution is tailored to the application requirements. The proposed approach is based on three main steps: the first one is the analysis of the targeted application using efficient metrics. The second step is the selection of the appropriate architecture template according to the first step results and recommendations. The third step is the architecture generation. This approach is experimented using various image and video algorithms showing its feasibility

arXiv.org e-Print Archive

Crossref

HAL-Université de Bretagne Occidentale

VBN

Dependable reconfigurable multi-sensor poles for security

Author: Kerkhoff Hans G.
Publication venue: IEEE Computer Society
Publication date: 01/01/2009
Field of study

Wireless sensor network poles for security monitoring under harsh environments require a very high dependability as they are safety-critical [1]. An example of a multi-sensor pole is shown. Crucial attribute in these systems for security, especially in harsh environment, is a high robustness and guaranteed availability during lifetime. This environment could include molest. In this paper, two approaches are used which are applied simultaneously but are developed in different projects. \u

University of Twente Research Information

Digital implementation of the cellular sensor-computers

Author: Benthien
Bolotski
Catrysse
Diamantaras
Domínguez-Castro
Dudek
Dudek
Dudek
El Gamal
El Gamal
Espejo
Foldesy
Gielen
Grossberg
Hamamoto
Herbordt
Johansson
Kleinfelder
Linan
Liu
Liñán
Liñán
Morris
Murray
Nayar
Ohta
Roska
Roska
Roska
Rudack
Schneider
Serra
Sinno
Tian
Wagner
Wanhammar
Zarándy
Publication venue: 'Wiley'
Publication date: 01/01/2006
Field of study

Two different kinds of cellular sensor-processor architectures are used nowadays in various applications. The first is the traditional sensor-processor architecture, where the sensor and the processor arrays are mapped into each other. The second is the foveal architecture, in which a small active fovea is navigating in a large sensor array. This second architecture is introduced and compared here. Both of these architectures can be implemented with analog and digital processor arrays. The efficiency of the different implementation types, depending on the used CMOS technology, is analyzed. It turned out, that the finer the technology is, the better to use digital implementation rather than analog

Crossref

SZTAKI Publication Repository

Repository of the Academy's Library

AI/ML Algorithms and Applications in VLSI Design and Technology

Author: Abbas Zia
Ahmad Amir
Amuru Deepthi
Cherupally Pavan K.
Gurram Sushanth R.
Vudumula Harsha V.
Zahra Andleeb
Publication venue
Publication date: 15/02/2023
Field of study

An evident challenge ahead for the integrated circuit (IC) industry in the nanometer regime is the investigation and development of methods that can reduce the design complexity ensuing from growing process variations and curtail the turnaround time of chip manufacturing. Conventional methodologies employed for such tasks are largely manual; thus, time-consuming and resource-intensive. In contrast, the unique learning strategies of artificial intelligence (AI) provide numerous exciting automated approaches for handling complex and data-intensive tasks in very-large-scale integration (VLSI) design and testing. Employing AI and machine learning (ML) algorithms in VLSI design and manufacturing reduces the time and effort for understanding and processing the data within and across different abstraction levels via automated learning algorithms. It, in turn, improves the IC yield and reduces the manufacturing turnaround time. This paper thoroughly reviews the AI/ML automated approaches introduced in the past towards VLSI design and manufacturing. Moreover, we discuss the scope of AI/ML applications in the future at various abstraction levels to revolutionize the field of VLSI design, aiming for high-speed, highly intelligent, and efficient implementations

arXiv.org e-Print Archive

Hierarchical Composition of Memristive Networks for Real-Time Computing

Author: Bürger Jens
Goudarzi Alireza
Stefanovic Darko
Teuscher Christof
Publication venue
Publication date: 25/04/2015
Field of study

Advances in materials science have led to physical instantiations of self-assembled networks of memristive devices and demonstrations of their computational capability through reservoir computing. Reservoir computing is an approach that takes advantage of collective system dynamics for real-time computing. A dynamical system, called a reservoir, is excited with a time-varying signal and observations of its states are used to reconstruct a desired output signal. However, such a monolithic assembly limits the computational power due to signal interdependency and the resulting correlated readouts. Here, we introduce an approach that hierarchically composes a set of interconnected memristive networks into a larger reservoir. We use signal amplification and restoration to reduce reservoir state correlation, which improves the feature extraction from the input signals. Using the same number of output signals, such a hierarchical composition of heterogeneous small networks outperforms monolithic memristive networks by at least 20% on waveform generation tasks. On the NARMA-10 task, we reduce the error by up to a factor of 2 compared to homogeneous reservoirs with sigmoidal neurons, whereas single memristive networks are unable to produce the correct result. Hierarchical composition is key for solving more complex tasks with such novel nano-scale hardware

arXiv.org e-Print Archive

Crossref

PDXScholar (Portland State University)

CUTIE: Beyond PetaOp/s/W Ternary DNN Inference Acceleration with Better-than-Binary Energy Efficiency

Author: Benini Luca
Cavigelli Lukas
Rutishauser Georg
Scherer Moritz
Publication venue
Publication date: 04/02/2021
Field of study

We present a 3.1 POp/s/W fully digital hardware accelerator for ternary neural networks. CUTIE, the Completely Unrolled Ternary Inference Engine, focuses on minimizing non-computational energy and switching activity so that dynamic power spent on storing (locally or globally) intermediate results is minimized. This is achieved by 1) a data path architecture completely unrolled in the feature map and filter dimensions to reduce switching activity by favoring silencing over iterative computation and maximizing data re-use, 2) targeting ternary neural networks which, in contrast to binary NNs, allow for sparse weights which reduce switching activity, and 3) introducing an optimized training method for higher sparsity of the filter weights, resulting in a further reduction of the switching activity. Compared with state-of-the-art accelerators, CUTIE achieves greater or equal accuracy while decreasing the overall core inference energy cost by a factor of 4.8x-21x

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

On recursive least-squares filtering algorithms and implementations

Author: Hsieh Shih-Fu
Publication venue
Publication date
Field of study

In many real-time signal processing applications, fast and numerically stable algorithms for solving least-squares problems are necessary and important. In particular, under non-stationary conditions, these algorithms must be able to adapt themselves to reflect the changes in the system and take appropriate adjustments to achieve optimum performances. Among existing algorithms, the QR-decomposition (QRD)-based recursive least-squares (RLS) methods have been shown to be useful and effective for adaptive signal processing. In order to increase the speed of processing and achieve high throughput rate, many algorithms are being vectorized and/or pipelined to facilitate high degrees of parallelism. A time-recursive formulation of RLS filtering employing block QRD will be considered first. Several methods, including a new non-continuous windowing scheme based on selectively rejecting contaminated data, were investigated for adaptive processing. Based on systolic triarrays, many other forms of systolic arrays are shown to be capable of implementing different algorithms. Various updating and downdating systolic algorithms and architectures for RLS filtering are examined and compared in details, which include Householder reflector, Gram-Schmidt procedure, and Givens rotation. A unified approach encompassing existing square-root-free algorithms is also proposed. For the sinusoidal spectrum estimation problem, a judicious method of separating the noise from the signal is of great interest. Various truncated QR methods are proposed for this purpose and compared to the truncated SVD method. Computer simulations provided for detailed comparisons show the effectiveness of these methods. This thesis deals with fundamental issues of numerical stability, computational efficiency, adaptivity, and VLSI implementation for the RLS filtering problems. In all, various new and modified algorithms and architectures are proposed and analyzed; the significance of any of the new method depends crucially on specific application

NASA Technical Reports Server

A direct-sequence spread-spectrum communication system for integrated sensor microsystems

Author: Arslan T.
Aydin N.
Cumming D.R.S.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/03/2005
Field of study

Some of the most important challenges in health-care technologies have been identified to be development of noninvasive systems and miniaturization. In developing the core technologies, progress is required in pushing the limits of miniaturization, minimizing the costs and power consumption of microsystems components, developing mobile/wireless communication infrastructures and computing technologies that are reliable. The implementation of such miniaturized systems has become feasible by the advent of system-on-chip technology, which enables us to integrate most of the components of a system on to a single chip. One of the most important tasks in such a system is to convey information reliably on a multiple-access-based environment. When considering the design of telecommunication system for such a network, the receiver is the key performance critical block. The paper describes the application environment, the choice of the communication protocol, the implementation of the transmitter and receiver circuitry, and research work carried out on studying the impact of input data characteristics and internal data path complexity on area and power performance of the receiver. We provide results using a test data recorded from a pH sensor. The results demonstrate satisfying functionality, area, and power constraints even when a degree of programmability is incorporated in the system

Enlighten