Search CORE

5,368 research outputs found

Two-level pipelined systolic array graphics engine

Author: El hadidy F.
El Hadidy F. Moelaert
Herrmann O.E.
Jayasinghe J.A.K.S.
Karagiannis Georgios
Smit Jaap
Publication venue: IEEE Press
Publication date: 01/01/1990
Field of study

The authors report a VLSI design of an advanced systolic array graphics (SAG) engine, built from pipelined functional units, that can generate realistic images interactively for high-resolution displays. They introduce a structured frame store system as an environment for the advanced SAG engine and present its principles and architecture. They introduce pipelined functional units into this SAG engine to meet the performance requirements. This is done by a formal approach in which the original systolic array is represented at bit level by a finite, vertex-weighted, edge-weighted, directed graph. Two architectures built from pipelined functional units are described. A prototype containing nine processing elements was fabricated in a 1.6-μm CMOS technology.
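The abstract does not define the vertex and edge weights. A common reading, assumed here, is the retiming model of Leiserson and Saxe: a vertex weight is the propagation delay of a functional unit and an edge weight is the number of registers on that connection. Under that assumption, the clock period is the largest total delay along any register-free path, and pipelining a functional unit (splitting it and inserting a register) shortens those paths. The sketch below computes that quantity; the two-PE array, the vertex names and the delays are hypothetical and are not taken from the paper.

```python
from collections import defaultdict, deque

def clock_period(delay, edges):
    """Largest total vertex delay along a path whose edges carry no registers.

    delay: dict mapping vertex -> propagation delay (vertex weight)
    edges: list of (u, v, w), w = number of registers on edge u -> v (edge weight)
    """
    succ = defaultdict(list)
    indeg = {v: 0 for v in delay}
    for u, v, w in edges:
        if w == 0:  # only register-free edges form combinational paths
            succ[u].append(v)
            indeg[v] += 1
    arrival = dict(delay)  # arrival[v]: worst delay of a register-free path ending at v
    queue = deque(v for v in delay if indeg[v] == 0)
    visited = 0
    while queue:  # Kahn's topological order over the zero-weight subgraph
        u = queue.popleft()
        visited += 1
        for v in succ[u]:
            arrival[v] = max(arrival[v], arrival[u] + delay[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if visited != len(delay):
        raise ValueError("register-free cycle: the graph is not a valid synchronous circuit")
    return max(arrival.values())

# Hypothetical two-PE linear array: each PE is a multiplier feeding an adder,
# and neighbouring PEs are linked through one register.
plain = {"mul0": 4, "add0": 2, "mul1": 4, "add1": 2}
plain_edges = [("mul0", "add0", 0), ("mul1", "add1", 0), ("add0", "add1", 1)]

# Same array with each multiplier split into two pipelined stages separated by a register.
piped = {"mulA0": 2, "mulB0": 2, "add0": 2, "mulA1": 2, "mulB1": 2, "add1": 2}
piped_edges = [("mulA0", "mulB0", 1), ("mulB0", "add0", 0),
               ("mulA1", "mulB1", 1), ("mulB1", "add1", 0), ("add0", "add1", 1)]

print(clock_period(plain, plain_edges))  # 6
print(clock_period(piped, piped_edges))  # 4
```

Pipelining the multipliers lowers the clock period from 6 to 4 delay units in this toy graph, at the cost of one extra cycle of latency per PE.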

University of Twente Research Information

Recommended from our members

Monitoring of the central blood pressure waveform via a conformal ultrasonic device.

Author: Bhaskaran Shubha
Chen Yimu
Chen Zeyu
Gong Hua
Gu Yue
Guo Yuxuan
Hu Hongjie
Huang Brady
Huang Zhenlong
Lei Yusheng
Li Xiaoshi
Li Yang
Lin Muyang
Makihata Mitsutoshi
Pisano Albert P
Wang Chonghe
Wang Chunfeng
Xu Sheng
Yin Zhenan
Zhang Liangfang
Zhang Lin
Zhang Tianjiao
Zhang Zhuorui
Zhou Qifa
Publication venue: eScholarship, University of California
Publication date: 01/09/2018
Field of study

Continuous monitoring of the central-blood-pressure waveform from deeply embedded vessels, such as the carotid artery and jugular vein, has clinical value for the prediction of all-cause cardiovascular mortality. However, existing non-invasive approaches, including photoplethysmography and tonometry, only enable access to the superficial peripheral vasculature. Although current ultrasonic technologies allow non-invasive deep-tissue observation, unstable coupling with the tissue surface resulting from the bulkiness and rigidity of conventional ultrasound probes introduces usability constraints. Here, we describe the design and operation of an ultrasonic device that is conformal to the skin and capable of capturing blood-pressure waveforms at deeply embedded arterial and venous sites. The wearable device is ultrathin (240 μm) and stretchable (with strains up to 60%), and enables the non-invasive, continuous and accurate monitoring of cardiovascular events from multiple body locations, which should facilitate its use in a variety of clinical environments.

eScholarship - University of California

A VLSI pipeline design of a fast prime factor DFT on a finite field

Author: Hsu I. S.
Reed I. S.
Shao H. M.
Shyu H. C.
Truong T. K.
Publication venue
Publication date
Field of study

A conventional prime factor discrete Fourier transform (DFT) algorithm is used to realize a discrete Fourier-like transform on the finite field, GF(q sub n). A pipeline structure is used to implement this prime factor DFT over GF(q sub n). This algorithm is developed to compute cyclic convolutions of complex numbers and to decode Reed-Solomon codes. Such a pipeline fast prime factor DFT algorithm over GF(q sub n) is regular, simple, expandable, and naturally suitable for VLSI implementation. An example illustrating the pipeline aspect of a 30-point transform over GF(q sub n) is presented.
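The core of a prime factor (Good-Thomas) transform is an index mapping that turns a length N = N1·N2 transform with gcd(N1, N2) = 1 into short length-N1 and length-N2 transforms with no twiddle factors. The sketch below shows only that mapping for a 30-point transform. To keep it self-contained it works in the prime field GF(31), where 3 is a primitive 30th root of unity, rather than over the paper's GF(q sub n), and it does not model the authors' pipeline. The choice of field, the factors 5 × 6 and all names are assumptions made for illustration.

```python
P = 31          # prime modulus: 30 divides P - 1, so GF(31) contains a 30th root of unity
N1, N2 = 5, 6   # coprime factors of the transform length N = 30
N = N1 * N2
W = 3           # 3 is a primitive root mod 31, i.e. a primitive 30th root of unity

def has_order(w, n, p):
    """True if w has multiplicative order exactly n modulo the prime p."""
    return pow(w, n, p) == 1 and all(pow(w, d, p) != 1 for d in range(1, n))

def dft(x, w, p):
    """Direct O(n^2) transform over GF(p): X[k] = sum_j x[j] * w^(j*k) mod p."""
    n = len(x)
    return [sum(x[j] * pow(w, j * k, p) for j in range(n)) % p for k in range(n)]

def pfa_dft(x, n1, n2, w, p):
    """Good-Thomas prime factor transform for len(x) == n1 * n2 with gcd(n1, n2) == 1."""
    n = n1 * n2
    w1 = pow(w, n2, p)  # primitive n1-th root of unity
    w2 = pow(w, n1, p)  # primitive n2-th root of unity
    # Input map: grid[i1][i2] = x[(i1*n2 + i2*n1) mod n]
    grid = [[x[(i1 * n2 + i2 * n1) % n] for i2 in range(n2)] for i1 in range(n1)]
    # Short DFTs along rows, then along columns; no twiddle factors are needed
    rows = [dft(r, w2, p) for r in grid]
    cols = [dft([rows[i1][k2] for i1 in range(n1)], w1, p) for k2 in range(n2)]
    # Output map (Chinese remainder theorem)
    a, b = pow(n2, -1, n1), pow(n1, -1, n2)
    X = [0] * n
    for k1 in range(n1):
        for k2 in range(n2):
            X[(k1 * n2 * a + k2 * n1 * b) % n] = cols[k2][k1]
    return X

assert has_order(W, N, P)
x = [(7 * j + 3) % P for j in range(N)]
assert pfa_dft(x, N1, N2, W, P) == dft(x, W, P)
```

Because every multiplication is by a power of a root of unity in a finite field, the result is exact, which is what makes such transforms usable for Reed-Solomon decoding. Note that `pow(a, -1, m)` needs Python 3.8 or later.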

NASA Technical Reports Server

Temporal unpredictability detection of real-time video sequence

Author: Liu Yang
Publication venue
Publication date: 01/01/2008
Field of study

Imperial Users only

Spiral - Imperial College Digital Repository

An efficient hardware architecture for a neural network activation function generator

Author: B. Widrow
D. Zhang
D.J. Myers
G. Cauwenberghs
H. Amin
H. Faiedh
I. Koren
J. Holt
J. Pineiro
K. Basterretxea
M. Tommiska
S. Draghici
S. Haykins
S. Vassiliadis
U. Ruckert
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2006
Field of study

This paper proposes an efficient hardware architecture for a function generator suitable for an artificial neural network (ANN). A spline-based approximation function is designed that provides a good trade-off between accuracy and silicon area, whilst also being inherently scalable and adaptable for numerous activation functions. This has been achieved by using a minimax polynomial and through optimal placement of the approximating polynomials based on the results of a genetic algorithm. The approximation error of the proposed method compares favourably to all related research in this field. Efficient hardware multiplication circuitry is used in the implementation, which reduces the area overhead and increases the throughput.
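A piecewise polynomial activation generator reduces, in hardware, to a small coefficient table indexed by the input segment plus a few multiply-adds. The sketch below illustrates that structure for the logistic sigmoid. It is not the paper's method: it uses uniform segments instead of genetic-algorithm placement, and quadratics interpolated at Chebyshev nodes (a near-minimax fit) instead of true minimax polynomials. The range [0, 8), the eight segments and all names are assumptions.

```python
import math

X_MAX = 8.0    # hypothetical: approximate on [0, 8) and saturate beyond
SEGMENTS = 8   # hypothetical: uniform segments (the paper places segments via a genetic algorithm)
WIDTH = X_MAX / SEGMENTS

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def quad_coeffs(f, lo, hi):
    """Quadratic interpolating f at the 3 Chebyshev nodes of [lo, hi] (near-minimax).

    Returns (c0, c1, c2) for p(t) = c0 + c1*t + c2*t**2, where t = x - lo.
    """
    mid, half = (lo + hi) / 2, (hi - lo) / 2
    ts = [mid + half * math.cos((2 * i + 1) * math.pi / 6) - lo for i in range(3)]
    fs = [f(lo + t) for t in ts]
    # Newton divided differences, then expand to monomial coefficients
    d1 = (fs[1] - fs[0]) / (ts[1] - ts[0])
    d2 = ((fs[2] - fs[1]) / (ts[2] - ts[1]) - d1) / (ts[2] - ts[0])
    return fs[0] - d1 * ts[0] + d2 * ts[0] * ts[1], d1 - d2 * (ts[0] + ts[1]), d2

# Coefficient table: in hardware this would be a small ROM indexed by the segment number
TABLE = [quad_coeffs(sigmoid, i * WIDTH, (i + 1) * WIDTH) for i in range(SEGMENTS)]

def sigmoid_approx(x):
    """Table lookup plus two multiply-adds (Horner), using sigmoid(-x) = 1 - sigmoid(x)."""
    a = abs(x)
    if a >= X_MAX:
        y = 1.0  # saturate
    else:
        i = int(a / WIDTH)
        t = a - i * WIDTH
        c0, c1, c2 = TABLE[i]
        y = (c2 * t + c1) * t + c0
    return y if x >= 0 else 1.0 - y

err = max(abs(sigmoid_approx(k / 1000) - sigmoid(k / 1000)) for k in range(-10000, 10001))
print(f"max abs error on [-10, 10]: {err:.2e}")
```

Exploiting the symmetry halves the table, and Horner evaluation needs only two multipliers per output, which is where efficient multiplication circuitry pays off.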

Crossref

Irish Universities

DCU Online Research Access Service

Polypill for prevention of cardiovascular disease in an Urban Iranian population with special focus on nonalcoholic steatohepatitis: A pragmatic randomized controlled trial within a cohort (PolyIran - Liver) – Study protocol

Author: Hemming K.
Jafari E.
Khoshnia M.
Khuzani A.S.
Malekzadeh R.
Marshall T.
Merat S.
Nateghi A.
Poustchi H.
Radmard A.-R.
Publication venue: Academy of Medical Sciences of I.R. Iran
Publication date: 01/01/2015
Field of study

Background: Cardiovascular disease (CVD) is among the most common causes of mortality in all populations. Nonalcoholic steatohepatitis is a common finding in patients with CVD. Prevention of CVD in individual patients typically requires periodic clinical evaluation, as well as diagnosis and management of risk factors such as hypertension and hyperlipidemia. However, this is resource-consuming and hard to implement, especially in developing countries. We designed a study to investigate the effects of a simpler strategy: a fixed-dose combination pill consisting of aspirin, valsartan, atorvastatin and hydrochlorothiazide (PolyPill) in an unselected group of persons aged over 50 years. Design: The PolyIran-Liver study was performed in Gonbad city as an open-label pragmatic randomized controlled trial nested within the Golestan Cohort Study. We randomly selected 2,400 cohort study participants aged above 50 years, randomly assigned them to intervention or usual care, and invited them to participate in an additional measurement study (if they met the eligibility criteria) to measure liver-related outcomes. Those agreeing and randomized to the intervention arm were offered a daily single dose of PolyPill. We will follow participants for 5 years. The primary outcome is major cardiovascular events; secondary outcomes include all-cause mortality and liver-related outcomes: liver stiffness and liver enzyme levels. Cardiovascular outcomes and mortality will be determined from the cohort study, and liver-related outcomes in those consenting to follow-up. Analysis will be by allocated group. Trial Status: Between October and December 2011, 1,320 intervention and 1,080 control participants were invited to participate in the additional measurement study. For all these participants, major cardiovascular events will be determined through blinded assessment of outcomes in the cohort study.
In the intervention and control arms, 875 (66%) and 721 (67%), respectively, met the eligibility criteria and agreed to participate in the additional measurement study. Liver-related outcomes will be measured in these participants. Of the 1,320 participants randomized to the intervention, 787 (60%) accepted the PolyPill. Conclusion: The PolyIran-Liver urban study will provide important information on the effectiveness of the PolyPill on major cardiovascular events, all-cause mortality and liver-related outcomes. (ClinicalTrials.gov ID: NCT01245608). © 2015, Academy of Medical Sciences of I.R. Iran. All rights reserved.

University of Birmingham Research Portal

Golestan University of Medical Sciences Repository

An Efficient Cell List Implementation for Monte Carlo Simulation on GPUs

Author: Hailat Eyad
Mick Jason
Potoff Jeffrey
Rushaidat Kamel
Schwiebert Loren
Publication venue
Publication date: 16/08/2014
Field of study

Maximizing the performance potential of the modern-day GPU architecture requires judicious utilization of available parallel resources. Although dramatic reductions in run time can often be obtained through straightforward mappings, further performance improvements often require algorithmic redesigns that more closely exploit the target architecture. In this paper, we focus on efficient molecular simulations for the GPU and propose a novel cell list algorithm that better utilizes its parallel resources. Our goal is an efficient GPU implementation of large-scale Monte Carlo simulations for the grand canonical ensemble. This is a particularly challenging application because there is inherently less computation and parallelism than in similar molecular dynamics applications. Consistent with the results of prior researchers, our simulation results show that traditional cell list implementations for Monte Carlo simulations of molecular systems offer effectively no performance improvement for small systems [5, 14], even when ported to the GPU. However, for larger systems, the cell list implementation offers significant gains in performance. Furthermore, our novel cell list approach results in better performance for all problem sizes when compared with other GPU implementations with or without cell lists. Comment: 30 pages
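For readers unfamiliar with the data structure, the sketch below shows a conventional (traditional) cell list in serial Python, not the authors' novel GPU algorithm. It assumes a periodic cubic box with the interaction cutoff no larger than half the box length. Particles are binned into cells whose side is at least the cutoff, so the interaction partners of a displaced particle in a Monte Carlo move can only lie in the 27 surrounding cells. All function names are hypothetical.

```python
import itertools
import random

def cell_of(p, side, n_cells):
    return tuple(int(c / side) % n_cells for c in p)

def build_cell_list(positions, box, cutoff):
    """Bin particles of a periodic cubic box into cells whose side is at least `cutoff`."""
    n_cells = max(1, int(box // cutoff))  # cells per dimension
    side = box / n_cells
    cells = {}
    for idx, p in enumerate(positions):
        cells.setdefault(cell_of(p, side, n_cells), []).append(idx)
    return cells, n_cells, side

def dist2(a, b, box):
    """Squared minimum-image distance in a periodic cubic box."""
    total = 0.0
    for u, v in zip(a, b):
        d = u - v
        d -= box * round(d / box)
        total += d * d
    return total

def neighbors(i, positions, cells, n_cells, side, box, cutoff):
    """Particles within `cutoff` of particle i, scanning only the 27 surrounding cells."""
    cx, cy, cz = cell_of(positions[i], side, n_cells)
    # A set avoids visiting the same cell twice when n_cells < 3 and offsets wrap around
    keys = {((cx + dx) % n_cells, (cy + dy) % n_cells, (cz + dz) % n_cells)
            for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3)}
    rc2 = cutoff * cutoff
    return sorted(j for key in keys for j in cells.get(key, ())
                  if j != i and dist2(positions[i], positions[j], box) < rc2)

random.seed(0)
box, cutoff = 10.0, 2.5
pos = [tuple(random.uniform(0, box) for _ in range(3)) for _ in range(200)]
cells, n_cells, side = build_cell_list(pos, box, cutoff)
brute = sorted(j for j in range(len(pos)) if j != 0 and dist2(pos[0], pos[j], box) < cutoff ** 2)
assert neighbors(0, pos, cells, n_cells, side, box, cutoff) == brute
```

When the box holds fewer than three cells per dimension, the 27-cell scan already covers every cell, so the list gives no savings over a brute-force search. This is consistent with the small-system behaviour reported above.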

arXiv.org e-Print Archive

CiteSeerX

Fast and Scalable Architectures and Algorithms for the Computation of the Forward and Inverse Discrete Periodic Radon Transform with Applications to 2D Convolutions and Cross-Correlations

Author: Carranza Cesar
Publication venue: UNM Digital Repository
Publication date: 09/06/2016
Field of study

The Discrete Radon Transform (DRT) is an essential component of a wide range of applications in image processing, e.g. image denoising, image restoration, texture analysis, line detection, encryption, compressive sensing and reconstructing objects from projections in computed tomography and magnetic resonance imaging. A popular method to obtain the DRT, or its inverse, involves the use of the Fast Fourier Transform, with the inherent approximation/rounding errors and increased hardware complexity due to the need for floating-point arithmetic implementations. An alternative implementation of the DRT is through the use of the Discrete Periodic Radon Transform (DPRT). The DPRT also exhibits discrete properties of the continuous-space Radon Transform, including the Fourier Slice Theorem and the convolution property. Unfortunately, the use of the DPRT has been limited by the need to compute a large number of additions, O(N^3), and the need for a large number of memory accesses. This PhD dissertation introduces a fast and scalable approach for computing the forward and inverse DPRT that is based on the use of: (i) a parallel array of fixed-point adder trees, (ii) circular shift registers to remove the need for accessing external memory components when selecting the input data for the adder trees, and (iii) an image block-based approach to DPRT computation that can fit the proposed architecture to available resources. As a result, for an NxN image (N prime), the proposed approach can compute up to N^2 additions per clock cycle. Compared to previous approaches, the scalable approach provides the fastest known implementations for different amounts of computational resources. For the fastest case, I introduce optimized architectures that can compute the DPRT and its inverse in just 2N + ceil(log2 N) + 1 and 2N + 3ceil(log2 N) + B + 2 clock cycles, respectively, where B is the number of bits used to represent each input pixel.
In comparison, the prior state-of-the-art method required N^2 + N + 1 clock cycles for computing the forward DPRT. For systems with limited resources, the resource usage can be reduced to O(N) with a running time of ceil(N/2)(N + 9) + N + 2 clock cycles for the forward DPRT and ceil(N/2)(N + 2) + 3ceil(log2 N) + B + 4 for the inverse. The results also have important applications in the computation of fast convolutions and cross-correlations for large and non-separable kernels. For this purpose, I introduce fast algorithms and scalable architectures to compute 2-D linear convolutions/cross-correlations using the convolution property of the DPRT and fixed-point arithmetic to reduce the 2-D problem to 1-D problems. Also, an alternative system is proposed for non-separable kernels with low rank using the LU decomposition. As a result, for implementations with enough resources, for an image and convolution kernel of size PxP, linear convolutions/cross-correlations can be computed in just 6N + 4 log2 N + 17 clock cycles for N = 2P-1. Finally, I also propose parallel algorithms to compute the forward and inverse DPRT using Graphics Processing Units (GPUs) and CPUs with multiple cores. The proposed algorithms are implemented on an Nvidia Maxwell GM204 GPU with 2048 cores at 1367 MHz, 384KB L1 cache (24KB per multiprocessor), 2048KB L2 cache (512KB per memory controller) and 4GB device memory, and compared against a serial implementation on an Intel Xeon E5-2630 CPU with 8 physical cores (16 logical processors via hyper-threading) at 3.2 GHz, L1 cache 512KB (32KB instruction cache and 32KB data cache per core), L2 cache 2MB (256KB per core), L3 cache 20MB (shared among all cores) and 32GB of system memory. For the CPU, there is a tenfold speedup using 16 logical cores versus a single-core serial implementation. For the GPU, there is a 715-fold speedup compared to the serial implementation. For real-time applications, for a 1021x1021 image, the forward DPRT takes 11.5 ms and the inverse 11.4 ms.
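As a reference point for the architectures described above, here is a minimal serial model of the DPRT in Python. It assumes the standard definition for prime N: R(m, d) = sum over i of f(i, (d + m·i) mod N) for 0 ≤ m < N, and R(N, d) = sum over j of f(d, j) (row sums). The inverse is f(i, j) = [sum over m < N of R(m, (j − m·i) mod N) − S + R(N, i)] / N, where S is the total image sum. Rotating row i left by m·i mirrors the role of the circular shift registers, but this is an illustrative sketch, not the dissertation's parallel adder-tree design; the function names are hypothetical.

```python
import random

def rotate_left(row, s):
    """Circular left shift: rotate_left(row, s)[d] == row[(d + s) % len(row)]."""
    s %= len(row)
    return row[s:] + row[:s]

def dprt(f):
    """Forward DPRT of an N x N integer image, N prime; returns N + 1 projections of length N."""
    n = len(f)
    R = []
    for m in range(n):
        # Shift row i left by m*i so that column d holds f[i][(d + m*i) % n], then add down columns
        shifted = [rotate_left(f[i], m * i) for i in range(n)]
        R.append([sum(shifted[i][d] for i in range(n)) for d in range(n)])
    R.append([sum(f[d]) for d in range(n)])  # projection m = N: row sums
    return R

def idprt(R):
    """Inverse DPRT using only integer additions and one exact division by N."""
    n = len(R) - 1
    total = sum(R[0])  # every projection sums to the total image sum
    f = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = sum(R[m][(j - m * i) % n] for m in range(n)) - total + R[n][i]
            assert acc % n == 0, "exact for integer images"
            f[i][j] = acc // n
    return f

random.seed(0)
N = 7  # must be prime
img = [[random.randrange(256) for _ in range(N)] for _ in range(N)]
assert idprt(dprt(img)) == img
```

The round trip is exact in integer arithmetic, which is the property that lets fixed-point adder trees replace floating-point FFT hardware. The direct model performs (N + 1)·N·(N − 1) additions, the O(N^3) count cited above.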

Periodic Application of Concurrent Error Detection in Processor Array Architectures

Author: Chen Paul Peichuan
Publication venue
Publication date: 01/04/1993
Field of study

Processor arrays can provide an attractive architecture for some applications. Featuring modularity, regular interconnection and high parallelism, such arrays are well suited to VLSI/WSI implementation and to applications with high computational requirements, such as real-time signal processing. Preserving the integrity of results can be of paramount importance for certain applications. In these cases, fault tolerance should be used to ensure reliable delivery of a system's service. One aspect of fault tolerance is the detection of errors caused by faults. Concurrent error detection (CED) techniques offer the advantage that transient and intermittent faults may be detected with greater probability than with off-line diagnostic tests. Applying time-redundant CED techniques can reduce hardware redundancy costs. However, most time-redundant CED techniques degrade a system's performance.
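The trade-off in the last two sentences can be made concrete with a small simulation. The abstract does not name a specific CED technique, so the sketch below assumes recomputing with shifted operands (RESO), a well-known time-redundant scheme, applied to a single adder. Rather than checking every operation, it checks only every `period`-th one. This bounds the extra computation at 1/period of the workload, at the cost of a longer detection latency. The adder model, the stuck-at-0 fault and the parameter values are hypothetical.

```python
import random

WIDTH = 16
MASK = (1 << (WIDTH + 2)) - 1  # datapath 2 bits wider than operands so shifted sums fit

def make_adder(stuck_bit=None):
    """Adder model; if stuck_bit is set, that output bit is stuck at 0 (hypothetical fault)."""
    def add(a, b):
        s = (a + b) & MASK
        if stuck_bit is not None:
            s &= ~(1 << stuck_bit)
        return s
    return add

def run(stream, add, period):
    """Check every `period`-th operation by recomputing with shifted operands (RESO).

    Returns (number of redundant recomputations, index of first detected error or None).
    """
    extra, first = 0, None
    for n, (a, b) in enumerate(stream):
        result = add(a, b)                        # normal computation
        if n % period == 0:                       # periodic application of the check
            extra += 1
            recomputed = add(a << 1, b << 1) >> 1  # same sum, computed on shifted bit slices
            if recomputed != result and first is None:
                first = n
    return extra, first

random.seed(1)
stream = [(random.randrange(1 << WIDTH), random.randrange(1 << WIDTH)) for _ in range(1000)]
print(run(stream, make_adder(), period=4))              # (250, None)
print(run(stream, make_adder(stuck_bit=5), period=4))   # detected at some checked index
```

With the output bit stuck at 0, the first computation loses bit 5 of the sum while the shifted recomputation loses bit 4. A checked operation therefore exposes the fault unless both bits happen to be 0.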

Illinois Digital Environment for Access to Learning and Scholarship Repository

NASA Technical Reports Server