6,136 research outputs found
Performance Investigation of Digital Lowpass IIR Filter Based on Different Platforms
The work presented in this paper illuminates the design and simulation of a recursive or Infinite Impulse Response (IIR) filter. The proposed design algorithm employs the Genetic Algorithm to determine the filter coefficients to satisfy the required performance. The effectiveness of different platforms on filter design and performance has been studied in this paper. Three different platforms are considered to implement and verify the designed filter’s work through simulation. The first platform is the MATLAB/SIMULINK software package used to implement the Biquad form filter. This technique is the basis for the software implementation of the designed IIR filter. The HDL – Cosimulation technique is considered the second one; it inspired to take advantage of the existing tools in SIMULINK to convert the designed filter algorithm to the Very high-speed integrated circuit Hardware Description Language (VHDL) format. The System Generator is employed as the third technique, in which the designed filter is implemented as a hardware structure based on basic unit blocks provided by Xilinx System Generator. This technique facilitates the implementation of the designed filter in the FPGA target device. Simulation results show that the performance of the designed filter is remarkably reliable even with severe noise levels
Accelerating Deterministic and Stochastic Binarized Neural Networks on FPGAs Using OpenCL
Recent technological advances have proliferated the available computing
power, memory, and speed of modern Central Processing Units (CPUs), Graphics
Processing Units (GPUs), and Field Programmable Gate Arrays (FPGAs).
Consequently, the performance and complexity of Artificial Neural Networks
(ANNs) is burgeoning. While GPU accelerated Deep Neural Networks (DNNs)
currently offer state-of-the-art performance, they consume large amounts of
power. Training such networks on CPUs is inefficient, as data throughput and
parallel computation is limited. FPGAs are considered a suitable candidate for
performance critical, low power systems, e.g. the Internet of Things (IOT) edge
devices. Using the Xilinx SDAccel or Intel FPGA SDK for OpenCL development
environment, networks described using the high-level OpenCL framework can be
accelerated on heterogeneous platforms. Moreover, the resource utilization and
power consumption of DNNs can be further enhanced by utilizing regularization
techniques that binarize network weights. In this paper, we introduce, to the
best of our knowledge, the first FPGA-accelerated stochastically binarized DNN
implementations, and compare them to implementations accelerated using both
GPUs and FPGAs. Our developed networks are trained and benchmarked using the
popular MNIST and CIFAR-10 datasets, and achieve near state-of-the-art
performance, while offering a >16-fold improvement in power consumption,
compared to conventional GPU-accelerated networks. Both our FPGA-accelerated
determinsitic and stochastic BNNs reduce inference times on MNIST and CIFAR-10
by >9.89x and >9.91x, respectively.Comment: 4 pages, 3 figures, 1 tabl
Text Coherence Analysis Based on Deep Neural Network
In this paper, we propose a novel deep coherence model (DCM) using a
convolutional neural network architecture to capture the text coherence. The
text coherence problem is investigated with a new perspective of learning
sentence distributional representation and text coherence modeling
simultaneously. In particular, the model captures the interactions between
sentences by computing the similarities of their distributional
representations. Further, it can be easily trained in an end-to-end fashion.
The proposed model is evaluated on a standard Sentence Ordering task. The
experimental results demonstrate its effectiveness and promise in coherence
assessment showing a significant improvement over the state-of-the-art by a
wide margin.Comment: 4 pages, 2 figures, CIKM 201
Neuro-memristive Circuits for Edge Computing: A review
The volume, veracity, variability, and velocity of data produced from the
ever-increasing network of sensors connected to Internet pose challenges for
power management, scalability, and sustainability of cloud computing
infrastructure. Increasing the data processing capability of edge computing
devices at lower power requirements can reduce several overheads for cloud
computing solutions. This paper provides the review of neuromorphic
CMOS-memristive architectures that can be integrated into edge computing
devices. We discuss why the neuromorphic architectures are useful for edge
devices and show the advantages, drawbacks and open problems in the field of
neuro-memristive circuits for edge computing
VLSI Implementation of Deep Neural Network Using Integral Stochastic Computing
The hardware implementation of deep neural networks (DNNs) has recently
received tremendous attention: many applications in fact require high-speed
operations that suit a hardware implementation. However, numerous elements and
complex interconnections are usually required, leading to a large area
occupation and copious power consumption. Stochastic computing has shown
promising results for low-power area-efficient hardware implementations, even
though existing stochastic algorithms require long streams that cause long
latencies. In this paper, we propose an integer form of stochastic computation
and introduce some elementary circuits. We then propose an efficient
implementation of a DNN based on integral stochastic computing. The proposed
architecture has been implemented on a Virtex7 FPGA, resulting in 45% and 62%
average reductions in area and latency compared to the best reported
architecture in literature. We also synthesize the circuits in a 65 nm CMOS
technology and we show that the proposed integral stochastic architecture
results in up to 21% reduction in energy consumption compared to the binary
radix implementation at the same misclassification rate. Due to fault-tolerant
nature of stochastic architectures, we also consider a quasi-synchronous
implementation which yields 33% reduction in energy consumption w.r.t. the
binary radix implementation without any compromise on performance.Comment: 11 pages, 12 figure
Deep Bilateral Learning for Real-Time Image Enhancement
Performance is a critical challenge in mobile image processing. Given a
reference imaging pipeline, or even human-adjusted pairs of images, we seek to
reproduce the enhancements and enable real-time evaluation. For this, we
introduce a new neural network architecture inspired by bilateral grid
processing and local affine color transforms. Using pairs of input/output
images, we train a convolutional neural network to predict the coefficients of
a locally-affine model in bilateral space. Our architecture learns to make
local, global, and content-dependent decisions to approximate the desired image
transformation. At runtime, the neural network consumes a low-resolution
version of the input image, produces a set of affine transformations in
bilateral space, upsamples those transformations in an edge-preserving fashion
using a new slicing node, and then applies those upsampled transformations to
the full-resolution image. Our algorithm processes high-resolution images on a
smartphone in milliseconds, provides a real-time viewfinder at 1080p
resolution, and matches the quality of state-of-the-art approximation
techniques on a large class of image operators. Unlike previous work, our model
is trained off-line from data and therefore does not require access to the
original operator at runtime. This allows our model to learn complex,
scene-dependent transformations for which no reference implementation is
available, such as the photographic edits of a human retoucher.Comment: 12 pages, 14 figures, Siggraph 201
- …