Sparsity Invariant CNNs

Author: Brox Thomas
Franke Uwe
Geiger Andreas
Schneider Lukas
Schneider Nick
Uhrig Jonas
Publication date: 30/08/2017

In this paper, we consider convolutional neural networks operating on sparse inputs with an application to depth upsampling from sparse laser scan data. First, we show that traditional convolutional networks perform poorly when applied to sparse data even when the location of missing data is provided to the network. To overcome this problem, we propose a simple yet effective sparse convolution layer which explicitly considers the location of missing data during the convolution operation. We demonstrate the benefits of the proposed network architecture in synthetic and real experiments with respect to various baseline approaches. Compared to dense baselines, the proposed sparse convolution network generalizes well to novel datasets and is invariant to the level of sparsity in the data. For our evaluation, we derive a novel dataset from the KITTI benchmark, comprising 93k depth-annotated RGB images. Our dataset allows for training and evaluating depth upsampling and depth prediction techniques in challenging real-world settings and will be made available upon publication.
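
The abstract does not spell out the layer's formula. As a rough illustration only, the sketch below assumes one common way to make a convolution aware of missing data: zero out unobserved inputs, normalize each response by the number of observed pixels in the window, and propagate a validity mask by max pooling. The function name `sparse_conv2d`, its parameters, and the single-channel NumPy setting are assumptions made for this example, not the authors' implementation.

```python
import numpy as np

def sparse_conv2d(x, mask, weight, bias=0.0, eps=1e-8):
    """Mask-normalized 2D convolution (illustrative, single channel).

    x      : (H, W) input values; entries where mask == 0 are ignored
    mask   : (H, W) array, 1 where a measurement exists, 0 where it is missing
    weight : (k, k) kernel with odd k
    Returns the filtered image and the propagated validity mask.
    """
    k = weight.shape[0]
    pad = k // 2
    xp = np.pad(x * mask, pad)                # missing inputs contribute zero
    mp = np.pad(mask.astype(float), pad)
    H, W = x.shape
    out = np.zeros((H, W))
    new_mask = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            xw = xp[i:i + k, j:j + k]
            mw = mp[i:i + k, j:j + k]
            # Normalize by the number of observed pixels in the window.
            out[i, j] = (weight * xw).sum() / (mw.sum() + eps) + bias
            # A location is valid if any input in its window was observed.
            new_mask[i, j] = mw.max()
    return out, new_mask
```

With a constant input and an all-ones kernel, the output stays at that constant wherever at least one observation falls in the window, regardless of how many pixels are dropped. That is the kind of sparsity invariance the abstract refers to, shown here on a toy case.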

arXiv.org e-Print Archive

Crossref

GPU-based Iterative Cone Beam CT Reconstruction Using Tight Frame Regularization

Author: Bin Dong
Steve B Jiang
Xun Jia
Yifei Lou
Publication venue: 'IOP Publishing'
Publication date: 05/05/2011

X-ray imaging dose from serial cone-beam CT (CBCT) scans raises a clinical concern in most image guided radiation therapy procedures. It is the goal of this paper to develop a fast GPU-based algorithm to reconstruct high quality CBCT images from undersampled and noisy projection data so as to lower the imaging dose. For this purpose, we have developed an iterative tight frame (TF) based CBCT reconstruction algorithm. A condition that a real CBCT image has a sparse representation under a TF basis is imposed in the iteration process as regularization to the solution. To speed up the computation, a multi-grid method is employed. Our GPU implementation has achieved high computational efficiency and a CBCT image of resolution 512 × 512 × 70 can be reconstructed in about 5 min. We have tested our algorithm on a digital NCAT phantom and a physical Catphan phantom. It is found that our TF-based algorithm is able to reconstruct CBCT images in the context of undersampling and low mAs levels. We have also quantitatively analyzed the reconstructed CBCT image quality in terms of modulation-transfer-function and contrast-to-noise ratio under various scanning conditions. The results confirm the high CBCT image quality obtained from our TF algorithm. Moreover, our algorithm has also been validated in a real clinical context using a head-and-neck patient case. Comparisons of the developed TF algorithm and the current state-of-the-art TV algorithm have also been made in various cases studied in terms of reconstructed image quality and computation efficiency.

Comment: 24 pages, 8 figures, accepted by Phys. Med. Bio
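
To make the idea of sparsity-regularized iterative reconstruction concrete, here is a minimal CPU sketch under strong simplifying assumptions: a 1D signal, a generic matrix `A` standing in for the projection operator, and an orthonormal Haar transform standing in for the tight frame (an orthonormal basis is a special case of one). Each iteration takes a gradient step on the data-fidelity term and then soft-thresholds the transform coefficients. The function names, the step-size rule, and the parameter values are all assumptions for illustration; the paper's GPU implementation and multi-grid scheme are not reproduced.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar transform matrix; n must be a power of two."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    top = np.kron(h, [1.0, 1.0])                  # coarse (averaging) rows
    bottom = np.kron(np.eye(n // 2), [1.0, -1.0])  # detail (difference) rows
    return np.vstack([top, bottom]) / np.sqrt(2.0)

def soft_threshold(c, t):
    """Shrink coefficients toward zero by t (promotes sparsity)."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def reconstruct(A, b, W, lam=0.05, n_iter=200):
    """Proximal-gradient iteration for 0.5*||Ax - b||^2 + lam*||Wx||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / squared spectral norm of A
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = x - step * A.T @ (A @ x - b)              # data-fidelity step
        x = W.T @ soft_threshold(W @ x, step * lam)   # sparsity step
    return x
```

A typical call would pass an undersampled system, for example a 32 × 64 matrix `A`, measurements `b`, and `W = haar_matrix(64)`.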

arXiv.org e-Print Archive

Crossref

AMC: Attention guided Multi-modal Correlation Learning for Image Search

Author: Bui Trung
Chen Fang
Chen Kan
Nevatia Ram
Wang Zhaowen
Publication date: 03/04/2017

Given a user's query, traditional image search systems rank images according to their relevance to a single modality (e.g., image content or surrounding text). Nowadays, an increasing number of images on the Internet are available with associated metadata in rich modalities (e.g., titles, keywords, tags, etc.), which can be exploited for a better similarity measure with queries. In this paper, we leverage visual and textual modalities for image search by learning their correlation with the input query. According to the intent of the query, an attention mechanism can be introduced to adaptively balance the importance of different modalities. We propose a novel Attention guided Multi-modal Correlation (AMC) learning method which consists of a jointly learned hierarchy of intra and inter-attention networks. Conditioned on the query's intent, intra-attention networks (i.e., visual intra-attention network and language intra-attention network) attend on informative parts within each modality; a multi-modal inter-attention network promotes the importance of the most query-relevant modalities. In experiments, we evaluate AMC models on the search logs from two real world image search engines and show a significant boost on the ranking of user-clicked images in search results. Additionally, we extend AMC models to the caption ranking task on the COCO dataset and achieve competitive results compared with the recent state of the art.

Comment: CVPR 201
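
The idea of query-conditioned balancing between modalities can be illustrated with a very small sketch. It assumes each modality (for example, image content and keywords) is already embedded in the same space as the query, and it uses plain dot-product scores with a softmax as a stand-in for the learned inter-attention network described in the abstract. The names `softmax` and `fuse_modalities` and the scoring rule are assumptions for this example, not the AMC architecture.

```python
import numpy as np

def softmax(s):
    """Numerically stable softmax over a 1D score vector."""
    e = np.exp(s - np.max(s))
    return e / e.sum()

def fuse_modalities(query, modality_embs):
    """Weight modality embeddings by their relevance to the query.

    query         : (d,) query embedding
    modality_embs : (M, d) one embedding per modality
    Returns the fused (d,) embedding and the (M,) attention weights.
    """
    scores = modality_embs @ query   # relevance of each modality to the query
    weights = softmax(scores)        # attention weights, summing to 1
    fused = weights @ modality_embs  # weighted combination of the modalities
    return fused, weights
```

For a query aligned with the first modality, that modality receives the larger weight, so the fused representation leans toward it.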

arXiv.org e-Print Archive

Crossref

Explicit Reasoning over End-to-End Neural Architectures for Visual Question Answering

Author: Aditya Somak
Baral Chitta
Yang Yezhou
Publication date: 23/03/2018

Many vision and language tasks require commonsense reasoning beyond data-driven image and natural language processing. Here we adopt Visual Question Answering (VQA) as an example task, where a system is expected to answer a question in natural language about an image. Current state-of-the-art systems attempted to solve the task using deep neural architectures and achieved promising performance. However, the resulting systems are generally opaque and they struggle in understanding questions for which extra knowledge is required. In this paper, we present an explicit reasoning layer on top of a set of penultimate neural network based systems. The reasoning layer enables reasoning and answering questions where additional knowledge is required, and at the same time provides an interpretable interface to the end users. Specifically, the reasoning layer adopts a Probabilistic Soft Logic (PSL) based engine to reason over a basket of inputs: visual relations, the semantic parse of the question, and background ontological knowledge from word2vec and ConceptNet. Experimental analysis of the answers and the key evidential predicates generated on the VQA dataset validates our approach.

Comment: 9 pages, 3 figures, AAAI 201
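
For readers unfamiliar with PSL, the sketch below shows the soft-logic operators it is built on: truth values lie in [0, 1], conjunction and disjunction use the Lukasiewicz relaxation, and a rule `body -> head` is penalized by its distance to satisfaction. The helper names and the example truth values are hypothetical and are not taken from the paper's rule base.

```python
def l_and(a, b):
    """Lukasiewicz conjunction of two soft truth values in [0, 1]."""
    return max(0.0, a + b - 1.0)

def l_or(a, b):
    """Lukasiewicz disjunction of two soft truth values in [0, 1]."""
    return min(1.0, a + b)

def l_not(a):
    """Soft negation."""
    return 1.0 - a

def distance_to_satisfaction(body, head):
    """How far the rule body -> head is from being satisfied (0 = satisfied)."""
    return max(0.0, body - head)

# Hypothetical rule: a visual relation detected with confidence 0.9 and a
# matching question predicate with confidence 0.8 support a candidate answer.
body = l_and(0.9, 0.8)
penalty_if_answer_weak = distance_to_satisfaction(body, 0.2)
penalty_if_answer_strong = distance_to_satisfaction(body, 0.9)
```

Inference in PSL then amounts to choosing truth values for the unknown predicates that minimize the weighted sum of such penalties, so a candidate answer that leaves well-supported rules unsatisfied is disfavored.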

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Optimally Stabilized PET Image Denoising Using Trilateral Filtering

Author: A. Chatziioannou
F. Hofheinz
F.E. Turkheimer
F.J. Anscombe
U. Bagci
Publication date: 01/01/2014

Low resolution and a signal-dependent noise distribution in positron emission tomography (PET) images make denoising an inevitable step prior to qualitative and quantitative image analysis tasks. Conventional PET denoising methods either over-smooth small-sized structures due to resolution limitation or make incorrect assumptions about the noise characteristics. Therefore, clinically important quantitative information may be corrupted. To address these challenges, we introduced a novel approach to remove signal-dependent noise in PET images where the noise distribution was considered as Poisson-Gaussian mixed. Meanwhile, the generalized Anscombe's transformation (GAT) was used to stabilize the varying nature of the PET noise. Other than noise stabilization, it is also desirable for the noise removal filter to preserve the boundaries of the structures while smoothing the noisy regions. Indeed, it is important to avoid significant loss of quantitative information such as standard uptake value (SUV)-based metrics as well as metabolic lesion volume. To satisfy all these properties, we extended the bilateral filtering method into trilateral filtering through multiscaling and an optimal Gaussianization process. The proposed method was tested on more than 50 PET-CT images from various patients having different cancers and achieved superior performance compared to the widely used denoising techniques in the literature.

Comment: 8 pages, 3 figures; to appear in the Lecture Notes in Computer Science (MICCAI 2014)
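
As a brief illustration of the variance-stabilization step, the sketch below implements the standard form of the generalized Anscombe transformation for the noise model z = alpha * p + n, with p Poisson distributed and n Gaussian with mean mu and standard deviation sigma. The parameter names and the NumPy setting are choices made for this example; how the paper estimates these parameters and the trilateral filter itself are not shown.

```python
import numpy as np

def generalized_anscombe(z, alpha=1.0, sigma=0.0, mu=0.0):
    """Map Poisson-Gaussian data to approximately unit-variance Gaussian data.

    z     : observed values following z = alpha * Poisson + N(mu, sigma^2)
    alpha : gain of the Poisson component
    sigma : standard deviation of the Gaussian component
    mu    : mean of the Gaussian component
    """
    arg = alpha * np.asarray(z, dtype=float) + 0.375 * alpha**2 + sigma**2 - alpha * mu
    # Clip negative arguments to zero before taking the square root.
    return (2.0 / alpha) * np.sqrt(np.maximum(arg, 0.0))
```

With alpha = 1 and sigma = mu = 0 this reduces to the classical Anscombe transform 2 * sqrt(z + 3/8). After the transform, a denoiser designed for Gaussian noise of roughly unit variance can be applied, followed by an inverse transform.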

arXiv.org e-Print Archive

Crossref