Delta Networks for Optimized Recurrent Network Computation
Many neural networks exhibit stability in their activation patterns over time
in response to inputs from sensors operating under real-world conditions. By
capitalizing on this property of natural signals, we propose a Recurrent Neural
Network (RNN) architecture called a delta network in which each neuron
transmits its value only when the change in its activation exceeds a threshold.
The execution of RNNs as delta networks is attractive because their states must
be stored and fetched at every timestep, unlike in convolutional neural
networks (CNNs). We show that a naive run-time delta network implementation
offers modest improvements in the number of memory accesses and computations,
but optimized training techniques confer higher accuracy at higher speedup.
With these optimizations, we demonstrate a 9x reduction in cost with negligible
loss of accuracy on the TIDIGITS audio digit recognition benchmark. Similarly,
on the large Wall Street Journal speech recognition benchmark, even existing
networks can be greatly accelerated as delta networks, and a 5.7x improvement
with negligible loss of accuracy can be obtained through training. Finally, on
an end-to-end CNN trained for steering angle prediction in a driving dataset,
the RNN cost can be reduced by a factor of 100.
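The core delta-network idea can be illustrated on a single matrix-vector product: only inputs whose change since their last transmitted value exceeds a threshold cause the corresponding weight column to be fetched and multiplied. The sketch below is a minimal Python/NumPy illustration under assumptions: the function name `delta_matvec_sequence`, the zero initial state, and the operation count (one multiply-accumulate per fetched weight) are hypothetical choices, not the authors' implementation, which applies the principle inside full RNN layers.

```python
import numpy as np

def delta_matvec_sequence(W, xs, threshold):
    """Compute y_t ~= W @ x_t for each x_t in xs, updating only from inputs
    whose change since their last transmitted value exceeds `threshold`.

    Assumption: a generic sketch of the delta principle, not the paper's
    exact formulation.
    """
    n_out, n_in = W.shape
    x_ref = np.zeros(n_in)    # last value each input actually transmitted
    y = np.zeros(n_out)       # running accumulator for W @ x
    outputs, ops = [], 0
    for x in xs:
        delta = x - x_ref
        active = np.abs(delta) > threshold   # inputs that "fire" this step
        # Only the weight columns of changed inputs are fetched and used.
        y = y + W[:, active] @ delta[active]
        x_ref[active] = x[active]            # update memory only where sent
        ops += int(active.sum()) * n_out     # multiply-accumulates performed
        outputs.append(y.copy())
    return np.array(outputs), ops
```

With `threshold = 0` the result matches the dense product exactly. With a positive threshold, each input's last transmitted value differs from its current value by at most `threshold`; that bounded lag is the approximation error traded for the skipped columns.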
Parallel Computation of Nonrigid Image Registration
Automatic intensity-based nonrigid image registration has a significant impact on medical applications such as multimodality image fusion, serial comparison for monitoring disease progression or regression, and minimally invasive image-guided interventions. However, due to the memory- and compute-intensive nature of the operations, intensity-based image registration has remained too slow for practical clinical adoption, and its use has been limited primarily to pre-operative tools. Efficient registration methods can open new possibilities for developing improved, interactive intraoperative tools and capabilities.
In this thesis, we propose an efficient parallel implementation of intensity-based three-dimensional nonrigid image registration on a commodity graphics processing unit (GPU). Optimization techniques are developed to accelerate the compute-intensive mutual information computation. The study focuses on the hierarchical volume subdivision-based algorithm, which is inherently faster than other nonrigid registration algorithms and structurally well suited to data-parallel computing platforms. The proposed implementation achieves a more than 50-fold runtime improvement over a standard CPU implementation, reducing the execution time of nonrigid image registration from hours to minutes while retaining the same level of registration accuracy.
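Mutual information (MI), the similarity measure whose computation the thesis accelerates, is commonly estimated from the joint intensity histogram of the fixed and moving images as MI = sum over (x, y) of p(x, y) log(p(x, y) / (p(x) p(y))). The following is a minimal CPU sketch with NumPy, assuming a plain 2-D histogram with a hypothetical default of 32 bins and a hypothetical function name `mutual_information`. It does not reproduce the thesis's GPU optimizations; the histogram accumulation it performs is the kind of data-parallel step a GPU version would typically target.

```python
import numpy as np

def mutual_information(fixed, moving, bins=32):
    """Estimate MI (in nats) between two equally shaped intensity images
    from their joint histogram. Bin count is an illustrative assumption."""
    joint, _, _ = np.histogram2d(fixed.ravel(), moving.ravel(), bins=bins)
    p_xy = joint / joint.sum()               # joint probability table
    p_x = p_xy.sum(axis=1, keepdims=True)    # marginal of the fixed image
    p_y = p_xy.sum(axis=0, keepdims=True)    # marginal of the moving image
    nz = p_xy > 0                            # empty bins contribute 0*log0 = 0
    # p_x @ p_y is the outer product p(x) p(y) with shape (bins, bins).
    return float(np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])))
```

In a registration loop this value would be recomputed for every candidate transformation of the moving image, which is why its cost dominates the runtime.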
Advanced Applications of Rapid Prototyping Technology in Modern Engineering
Rapid prototyping (RP) technology is widely known and appreciated for its flexible and customized manufacturing capabilities. Widely studied RP techniques include stereolithography apparatus (SLA), selective laser sintering (SLS), three-dimensional printing (3DP), fused deposition modeling (FDM), 3D plotting, solid ground curing (SGC), multiphase jet solidification (MJS), and laminated object manufacturing (LOM). Different techniques are associated with different materials and/or processing principles and are therefore suited to specific applications. RP technology is no longer used only for building prototypes; it has been extended to real industrial manufacturing solutions. Today, RP technology has contributed to almost all engineering areas, including mechanical, materials, industrial, aerospace, electrical and, most recently, biomedical engineering. This book aims to present advanced developments of RP technologies in various engineering areas as solutions to real-world engineering problems.
ReS2tAC -- UAV-Borne Real-Time SGM Stereo Optimized for Embedded ARM and CUDA Devices
With the emergence of low-cost robotic systems, such as unmanned aerial
vehicles, the importance of embedded high-performance image processing has
increased. For a long time, FPGAs were the only processing hardware capable of
high-performance computing while at the same time preserving the low power
consumption essential for embedded systems. However, the increasing
availability of embedded GPU-based systems, such as the NVIDIA Jetson series,
which combine an ARM CPU with an NVIDIA Tegra GPU, allows for massively
parallel embedded computing on graphics hardware. With this in mind, we
propose an approach for real-time embedded stereo processing on ARM and
CUDA-enabled devices, based on the popular and widely used Semi-Global
Matching algorithm. We optimize the algorithm for embedded CUDA GPUs through
massively parallel computing, and use NEON intrinsics to optimize it for
vectorized SIMD processing on embedded ARM CPUs. We have evaluated our
approach with different configurations on two public stereo benchmark
datasets, demonstrating that it can reach an error rate as low as 3.3%.
Furthermore, our experiments show that the fastest configuration of our
approach reaches up to 46 FPS at VGA image resolution. Finally, in a use-case
specific qualitative evaluation, we have evaluated the power consumption of
our approach and deployed it on the DJI Manifold 2-G attached to a DJI Matrice
210 V2 RTK unmanned aerial vehicle (UAV), demonstrating its suitability for
real-time stereo processing onboard a UAV.
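Semi-Global Matching aggregates a per-pixel matching cost C(p, d) along several 1-D paths using the recurrence L_r(p, d) = C(p, d) + min(L_r(p-r, d), L_r(p-r, d-1) + P1, L_r(p-r, d+1) + P1, min_i L_r(p-r, i) + P2) - min_k L_r(p-r, k). The NumPy sketch below implements this recurrence for a single left-to-right path only. The penalties P1 = 10 and P2 = 120 and the function name `aggregate_left_to_right` are illustrative assumptions, and this is not the paper's CUDA or NEON code. A full SGM pipeline would sum L_r over several directions (commonly 4, 8 or 16) and pick, per pixel, the disparity with the minimum summed cost.

```python
import numpy as np

def aggregate_left_to_right(cost, p1=10.0, p2=120.0):
    """Aggregate a matching-cost volume along one SGM path direction.

    cost: array of shape (H, W, D) with per-pixel, per-disparity costs.
    Returns L with the same shape for the left-to-right direction.
    Penalty values are illustrative assumptions.
    """
    h, w, d = cost.shape
    L = np.empty_like(cost, dtype=np.float64)
    L[:, 0, :] = cost[:, 0, :]                 # paths start at the left border
    inf = np.full((h, 1), np.inf)              # padding for d-1 / d+1 at ends
    for x in range(1, w):
        prev = L[:, x - 1, :]                  # (H, D) aggregated costs at p-r
        prev_min = prev.min(axis=1, keepdims=True)
        minus = np.hstack([inf, prev[:, :-1]])   # L(p-r, d-1)
        plus = np.hstack([prev[:, 1:], inf])     # L(p-r, d+1)
        best = np.minimum.reduce([
            prev,                                        # same disparity
            minus + p1,                                  # small change
            plus + p1,
            np.broadcast_to(prev_min + p2, prev.shape),  # large jump
        ])
        # Subtracting prev_min keeps values bounded: cost <= L <= cost + p2.
        L[:, x, :] = cost[:, x, :] + best - prev_min
    return L
```

The disparity axis is vectorized here with NumPy; optimized implementations often map this same axis onto SIMD lanes or GPU threads.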
Image Processing and Analysis for Preclinical and Clinical Applications
Radiomics is one of the most successful branches of research in the field of image processing and analysis, as it provides valuable quantitative information for personalized medicine. It has the potential to reveal features of a disease that cannot be appreciated with the naked eye, in both preclinical and clinical studies. In general, all quantitative approaches based on biomedical images, such as positron emission tomography (PET), computed tomography (CT) and magnetic resonance imaging (MRI), have a positive clinical impact on the detection of biological processes and diseases, as well as on predicting response to treatment. This Special Issue, “Image Processing and Analysis for Preclinical and Clinical Applications”, addresses some gaps in this field to improve the quality of research in the clinical and preclinical environment. It consists of fourteen peer-reviewed papers covering a range of topics and applications related to biomedical image processing and analysis.
Locust-inspired vision system on chip architecture for collision detection in automotive applications
This paper describes a programmable digital computing architecture dedicated to processing information in accordance with the organization and operating principles of the four-layer neuron structure found in the visual system of locusts. This architecture takes advantage of the natural collision detection skills of locusts and is capable of processing images and ascertaining collision threats in real-time automotive scenarios. In addition to the locust-inspired features, the architecture embeds a Topological Feature Estimator module to identify and classify objects on a collision course.
European Commission IST2001-38097; Ministerio de Ciencia y Tecnología TIC2003-09817-C02-0
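The abstract does not spell out the four layers. The sketch below follows a simplified, commonly used LGMD (lobula giant movement detector) formulation with photoreceptor (P), excitation (E), inhibition (I) and summation (S) layers. The 4-neighbour inhibition kernel, the weight `w_inhib=0.7` and the function name `lgmd_response` are illustrative assumptions, not the paper's hardware design, and the Topological Feature Estimator is not modeled.

```python
import numpy as np

def lgmd_response(prev_frame, frame, prev_p, w_inhib=0.7):
    """One time step of a simplified four-layer LGMD-style detector.

    P layer: absolute luminance change between consecutive frames.
    E layer: excitation, passed straight through from P.
    I layer: previous step's P spread laterally to the 4-neighbours.
    S layer: E minus weighted I, rectified; the LGMD cell sums S.
    Returns (membrane_potential, p); feed p back in as prev_p next step.
    Kernel and weight are illustrative assumptions.
    """
    p = np.abs(frame.astype(np.float64) - prev_frame.astype(np.float64))
    e = p
    # Delayed lateral inhibition: average of the 4 neighbours of prev_p,
    # with zero padding at the image border.
    padded = np.pad(prev_p, 1)
    i = 0.25 * (padded[:-2, 1:-1] + padded[2:, 1:-1]
                + padded[1:-1, :-2] + padded[1:-1, 2:])
    s = np.maximum(e - w_inhib * i, 0.0)
    return float(s.sum()), p
```

An approaching object produces a rapidly expanding edge that outruns the delayed inhibition, so the summed potential grows over successive frames. A collision warning could then be raised when the potential stays above a chosen threshold for a few consecutive frames, with the threshold again being an assumption to tune.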