eCNN: A Block-Based and Highly-Parallel CNN Accelerator for Edge Inference
Convolutional neural networks (CNNs) have recently demonstrated superior
quality for computational imaging applications. Therefore, they have great
potential to revolutionize the image pipelines on cameras and displays.
However, it is difficult for conventional CNN accelerators to support
ultra-high-resolution videos at the edge due to their considerable DRAM
bandwidth and power consumption. Therefore, finding a more memory- and
computation-efficient microarchitecture is crucial to speed up this coming
revolution.
In this paper, we approach this goal by considering the inference flow,
network model, instruction set, and processor design jointly to optimize
hardware performance and image quality. We apply a block-based inference flow
which can eliminate all the DRAM bandwidth for feature maps and accordingly
propose a hardware-oriented network model, ERNet, to optimize image quality
based on hardware constraints. Then we devise a coarse-grained instruction set
architecture, FBISA, to support power-hungry convolution by massive
parallelism. Finally, we implement an embedded processor, eCNN, which
accommodates ERNet and FBISA with a flexible processing architecture. Layout
results show that it can support high-quality ERNets for super-resolution and
denoising at up to 4K Ultra-HD 30 fps while using only DDR-400 and consuming
6.94W on average. By comparison, the state-of-the-art Diffy uses dual-channel
DDR3-2133 and consumes 54.3W to support lower-quality VDSR at Full HD 30 fps.
Lastly, we will also present application examples of high-performance style
transfer and object recognition to demonstrate the flexibility of eCNN.
Comment: 14 pages; appearing in IEEE/ACM International Symposium on
Microarchitecture (MICRO), 201
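The block-based inference flow mentioned in the abstract can be illustrated with a small sketch. The code below is not ERNet, FBISA, or the eCNN hardware; it is a toy stack of 3x3 convolutions in plain Python, and the function names (`conv3x3_relu`, `run_layers`, `run_blockwise`) and the halo-overlap tiling scheme are assumptions made for illustration. It shows that processing an image tile by tile, with each input tile enlarged by a border of one pixel per layer, reproduces the full-frame result while only one tile's intermediate feature maps exist at a time.

```python
import random

def conv3x3_relu(x, k):
    """One 3x3 'valid' convolution followed by ReLU on a 2-D list of floats."""
    h, w = len(x), len(x[0])
    return [[max(0.0, sum(k[dy][dx] * x[i + dy][j + dx]
                          for dy in range(3) for dx in range(3)))
             for j in range(w - 2)]
            for i in range(h - 2)]

def run_layers(x, kernels):
    """Apply every layer in turn; each layer trims one pixel per side."""
    for k in kernels:
        x = conv3x3_relu(x, k)
    return x

def run_blockwise(x, kernels, block=16):
    """Compute the same output tile by tile (hypothetical tiling scheme)."""
    halo = len(kernels)  # input border needed on each side of a tile
    out_h = len(x) - 2 * halo
    out_w = len(x[0]) - 2 * halo
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(0, out_h, block):
        for x0 in range(0, out_w, block):
            bh = min(block, out_h - y)
            bw = min(block, out_w - x0)
            # Input tile = output tile plus the halo on every side.
            tile = [row[x0:x0 + bw + 2 * halo]
                    for row in x[y:y + bh + 2 * halo]]
            res = run_layers(tile, kernels)  # intermediates live only here
            for i in range(bh):
                out[y + i][x0:x0 + bw] = res[i]
    return out

# Small demonstration with random data and three random layers.
random.seed(0)
image = [[random.random() for _ in range(52)] for _ in range(40)]
kernels = [[[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
           for _ in range(3)]
full = run_layers(image, kernels)
tiled = run_blockwise(image, kernels)
```

In this sketch the price of tiling is that halo pixels are recomputed by neighbouring tiles; how the paper itself handles block boundaries is not stated in the abstract.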
BLADE: Filter Learning for General Purpose Computational Photography
The Rapid and Accurate Image Super Resolution (RAISR) method of Romano,
Isidoro, and Milanfar is a computationally efficient image upscaling method
using a trained set of filters. We describe a generalization of RAISR, which we
name Best Linear Adaptive Enhancement (BLADE). This approach is a trainable
edge-adaptive filtering framework that is general, simple, computationally
efficient, and useful for a wide range of problems in computational
photography. We show applications to operations which may appear in a camera
pipeline including denoising, demosaicing, and stylization.
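As a rough illustration of edge-adaptive filtering with a bank of filters, the sketch below picks one 3x3 linear filter per pixel according to the local gradient orientation. The abstract does not detail the selection rule or how the filters are trained, so everything here is an assumption for illustration: the orientation-only selection, the function names `orientation_bin` and `adaptive_filter`, and the hand-made filter bank stand in for whatever the published method actually uses.

```python
import math

def orientation_bin(img, i, j, n_bins):
    """Quantize the local gradient orientation at (i, j) into n_bins buckets."""
    gx = (img[i][j + 1] - img[i][j - 1]) / 2.0  # central differences
    gy = (img[i + 1][j] - img[i - 1][j]) / 2.0
    angle = math.atan2(gy, gx) % math.pi        # orientation in [0, pi)
    return int(angle / math.pi * n_bins) % n_bins

def adaptive_filter(img, bank):
    """Apply bank[bucket] at each interior pixel; borders are copied as-is."""
    n_bins = len(bank)
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            f = bank[orientation_bin(img, i, j, n_bins)]
            out[i][j] = sum(f[dy][dx] * img[i + dy - 1][j + dx - 1]
                            for dy in range(3) for dx in range(3))
    return out

# Hypothetical bank: four identity filters, so the output equals the input.
IDENTITY = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
bank = [IDENTITY for _ in range(4)]
ramp = [[float(j) for j in range(6)] for _ in range(5)]
result = adaptive_filter(ramp, bank)
```

In a trained system each bucket would hold a filter learned from example image pairs for that bucket, which is what would make the selection step useful; the identity bank here only exercises the selection and filtering path.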
Deep Video Precoding
Several groups worldwide are currently investigating how deep learning may advance the state-of-the-art in image and video coding. An open question is how to make deep neural networks work in conjunction with existing (and upcoming) video codecs, such as MPEG H.264/AVC, H.265/HEVC, VVC, Google VP9 and AOMedia AV1, AV2, as well as existing container and transport formats, without imposing any changes at the client side. Such compatibility is a crucial aspect when it comes to practical deployment, especially when considering the fact that the video content industry and hardware manufacturers are expected to remain committed to supporting these standards for the foreseeable future. We propose to use deep neural networks as precoders for current and future video codecs and adaptive video streaming systems. In our current design, the core precoding component comprises a cascaded structure of downscaling neural networks that operates during video encoding, prior to transmission. This is coupled with a precoding mode selection algorithm for each independently-decodable stream segment, which adjusts the downscaling factor according to scene characteristics, the utilized encoder, and the desired bitrate and encoding configuration. Our framework is compatible with all current and future codec and transport standards, as our deep precoding network structure is trained in conjunction with linear upscaling filters (e.g., the bilinear filter), which are supported by all web video players. Extensive evaluation on FHD (1080p) and UHD (2160p) content and with widely-used H.264/AVC, H.265/HEVC and VP9 encoders, as well as a preliminary evaluation with the current test model of VVC (v.6.2rc1), shows that coupling such standards with the proposed deep video precoding allows for 8% to 52% rate reduction under encoding configurations and bitrates suitable for video-on-demand adaptive streaming systems. 
The use of precoding can also lead to encoding complexity reduction, which is essential for cost-effective cloud deployment of complex encoders like H.265/HEVC, VP9 and VVC, especially when considering the prominence of high-resolution adaptive video streaming.
Improved 3D MR Image Acquisition and Processing in Congenital Heart Disease
Congenital heart disease (CHD) is the most common type of birth defect, affecting about 1% of the population. MRI is an essential tool in the assessment of CHD, including diagnosis, intervention planning and follow-up. Three-dimensional MRI can provide particularly rich visualization and information. However, it is often complicated by long scan times, cardiorespiratory motion, injection of contrast agents, and complex and time-consuming postprocessing. This thesis comprises four pieces of work that attempt to respond to some of these challenges.
The first piece of work aims to enable fast acquisition of 3D time-resolved cardiac imaging during free breathing. Rapid imaging was achieved using an efficient spiral sequence and a sparse parallel imaging reconstruction. The feasibility of this approach was demonstrated on a population of 10 patients with CHD, and areas of improvement were identified.
The second piece of work is an integrated software tool designed to simplify and accelerate the development of machine learning (ML) applications in MRI research. It also exploits the strengths of recently developed ML libraries for efficient MR image reconstruction and processing.
The third piece of work aims to reduce contrast dose in contrast-enhanced MR angiography (MRA). This would reduce risks and costs associated with contrast agents. A deep learning-based contrast enhancement technique was developed and shown to improve image quality in real low-dose MRA in a population of 40 children and adults with CHD.
The fourth and final piece of work aims to simplify the creation of computational models for hemodynamic assessment of the great arteries. A deep learning technique for 3D segmentation of the aorta and the pulmonary arteries was developed and shown to enable accurate calculation of clinically relevant biomarkers in a population of 10 patients with CHD.