Structure fusion based on graph convolutional networks for semi-supervised classification
Owing to the diversity and complexity of multi-view data in semi-supervised
classification, most existing graph convolutional networks focus on network
architecture construction or on preserving the salient graph structure, and
ignore the contribution of the complete graph structure to semi-supervised
classification. To mine a more complete distribution structure from multi-view
data while accounting for both specificity and commonality, we propose
structure fusion based on graph convolutional networks (SF-GCN) to improve the
performance of semi-supervised classification. SF-GCN not only retains the
specific characteristics of each view's data by spectral embedding, but also
captures the common structure of the multi-view data by a distance metric
between the multiple graph structures. Assuming a linear relationship among
the multiple graph structures, we construct the optimization function of the
structure fusion model by balancing the specificity loss and the commonality
loss. Solving this function simultaneously yields the fused spectral embedding
of the multi-view data and the fused structure, which serves as the adjacency
matrix fed into graph convolutional networks for semi-supervised
classification. Experiments demonstrate that SF-GCN outperforms the state of
the art on three challenging citation-network datasets: Cora, Citeseer, and
Pubmed.
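As a rough illustration of the fusion idea (Python/NumPy), the sketch below
combines several view-specific adjacency matrices into a single fused
adjacency and feeds it to a two-layer GCN. The convex-combination weighting
and the gcn_forward helper are assumptions for illustration, not the paper's
specificity/commonality optimization:

    import numpy as np

    def normalize_adjacency(A):
        # Symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2.
        A_hat = A + np.eye(A.shape[0])
        d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
        return d_inv_sqrt @ A_hat @ d_inv_sqrt

    def fuse_structures(adjacencies, weights):
        # Hypothetical stand-in for structure fusion: a convex combination
        # of the view-specific graph structures.
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()
        return sum(wi * A for wi, A in zip(w, adjacencies))

    def gcn_forward(A, X, W1, W2):
        # Two-layer GCN: softmax(A_hat ReLU(A_hat X W1) W2).
        A_hat = normalize_adjacency(A)
        H = np.maximum(A_hat @ X @ W1, 0.0)
        logits = A_hat @ H @ W2
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    # Toy usage: two views of a 4-node graph, 3 features, 2 classes.
    rng = np.random.default_rng(0)
    A1 = np.array([[0,1,0,0],[1,0,1,0],[0,1,0,1],[0,0,1,0]], dtype=float)
    A2 = np.array([[0,1,1,0],[1,0,0,0],[1,0,0,1],[0,0,1,0]], dtype=float)
    X = rng.normal(size=(4, 3))
    W1, W2 = rng.normal(size=(3, 8)), rng.normal(size=(8, 2))
    A_fused = fuse_structures([A1, A2], weights=[0.6, 0.4])
    print(gcn_forward(A_fused, X, W1, W2))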
Automatic Brain Tumor Segmentation using Cascaded Anisotropic Convolutional Neural Networks
A cascade of fully convolutional neural networks is proposed to segment
multi-modal Magnetic Resonance (MR) images with brain tumor into background and
three hierarchical regions: whole tumor, tumor core and enhancing tumor core.
The cascade is designed to decompose the multi-class segmentation problem into
a sequence of three binary segmentation problems according to the subregion
hierarchy. The whole tumor is segmented in the first step and the bounding box
of the result is used for the tumor core segmentation in the second step. The
enhancing tumor core is then segmented based on the bounding box of the tumor
core segmentation result. Our networks consist of multiple layers of
anisotropic and dilated convolution filters, and they are combined with
multi-view fusion to reduce false positives. Residual connections and
multi-scale predictions are employed in these networks to boost the
segmentation performance. Experiments with the BraTS 2017 validation set show
that the proposed method achieved average Dice scores of 0.7859, 0.9050, and
0.8378 for enhancing tumor core, whole tumor, and tumor core, respectively.
The corresponding values for the BraTS 2017 testing set were 0.7831, 0.8739,
and 0.7748, respectively.
Comment: 12 pages, 5 figures. MICCAI BraTS Challenge 2017
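The cascade itself is easy to picture in code. In the sketch below, wnet,
tnet, and enet are hypothetical placeholders for the three trained binary
segmentation networks, and the volume is treated as single-channel for
brevity:

    import numpy as np

    def bounding_box(mask, margin=4):
        # Axis-aligned bounding box of a non-empty binary mask, padded by
        # a small margin and clipped to the volume bounds.
        coords = np.argwhere(mask)
        lo = np.maximum(coords.min(axis=0) - margin, 0)
        hi = np.minimum(coords.max(axis=0) + margin + 1, mask.shape)
        return tuple(slice(l, h) for l, h in zip(lo, hi))

    def cascaded_segmentation(volume, wnet, tnet, enet):
        # Three binary segmentations following the subregion hierarchy:
        # whole tumor -> tumor core -> enhancing tumor core. Each *net is
        # a callable mapping a (cropped) volume to a binary mask.
        whole = wnet(volume)                        # step 1: whole tumor
        box1 = bounding_box(whole)
        core = np.zeros_like(whole)
        core[box1] = tnet(volume[box1])             # step 2: inside box 1
        box2 = bounding_box(core)
        enhancing = np.zeros_like(whole)
        enhancing[box2] = enet(volume[box2])        # step 3: inside box 2
        return whole, core, enhancing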
Pix2Vox: Context-aware 3D Reconstruction from Single and Multi-view Images
Recovering the 3D representation of an object from single-view or multi-view
RGB images by deep neural networks has attracted increasing attention in the
past few years. Several mainstream works (e.g., 3D-R2N2) use recurrent neural
networks (RNNs) to fuse multiple feature maps extracted from input images
sequentially. However, when given the same set of input images with different
orders, RNN-based approaches are unable to produce consistent reconstruction
results. Moreover, due to long-term memory loss, RNNs cannot fully exploit
input images to refine reconstruction results. To solve these problems, we
propose a novel framework for single-view and multi-view 3D reconstruction,
named Pix2Vox. By using a well-designed encoder-decoder, it generates a coarse
3D volume from each input image. Then, a context-aware fusion module is
introduced to adaptively select high-quality reconstructions for each part
(e.g., table legs) from different coarse 3D volumes to obtain a fused 3D
volume. Finally, a refiner further refines the fused 3D volume to generate the
final output. Experimental results on the ShapeNet and Pix3D benchmarks
indicate that the proposed Pix2Vox outperforms state-of-the-art methods by a
large margin. Furthermore, the proposed method is 24 times faster than 3D-R2N2
in terms of backward inference time. Experiments on unseen ShapeNet 3D
categories show the superior generalization ability of our method.
Comment: ICCV 2019
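The context-aware fusion step can be sketched as per-view quality scores that
are softmax-normalized across views at every spatial location; score_fn below
is a hypothetical stand-in for the learned scoring branch:

    import numpy as np

    def context_aware_fusion(coarse_volumes, score_fn):
        # Weight each location of each coarse volume by its quality score,
        # normalized across views, then sum into a single fused volume.
        volumes = np.stack(coarse_volumes)              # (views, D, H, W)
        scores = np.stack([score_fn(v) for v in coarse_volumes])
        e = np.exp(scores - scores.max(axis=0, keepdims=True))
        weights = e / e.sum(axis=0, keepdims=True)      # softmax over views
        return (weights * volumes).sum(axis=0)

    # Toy usage: three 16^3 coarse volumes scored by occupancy confidence.
    rng = np.random.default_rng(1)
    vols = [rng.random((16, 16, 16)) for _ in range(3)]
    fused = context_aware_fusion(vols, score_fn=lambda v: np.abs(v - 0.5))
    print(fused.shape)                                  # (16, 16, 16)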
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
Existing deep convolutional neural networks (CNNs) require a fixed-size
(e.g., 224x224) input image. This requirement is "artificial" and may reduce
the recognition accuracy for the images or sub-images of an arbitrary
size/scale. In this work, we equip the networks with another pooling strategy,
"spatial pyramid pooling", to eliminate the above requirement. The new network
structure, called SPP-net, can generate a fixed-length representation
regardless of image size/scale. Pyramid pooling is also robust to object
deformations. With these advantages, SPP-net should in general improve all
CNN-based image classification methods. On the ImageNet 2012 dataset, we
demonstrate that SPP-net boosts the accuracy of a variety of CNN architectures
despite their different designs. On the Pascal VOC 2007 and Caltech101
datasets, SPP-net achieves state-of-the-art classification results using a
single full-image representation and no fine-tuning.
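The pooling itself is straightforward: the feature map is max-pooled over an
l x l grid at each pyramid level and the results are concatenated, so the
output length depends only on the channel count and the pyramid, never on the
input size. A minimal sketch (the three-level pyramid is an illustrative
choice):

    import numpy as np

    def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
        # Max-pool a (C, H, W) feature map over an l x l grid per level and
        # concatenate: a fixed-length vector of C * sum(l*l) values.
        C, H, W = feature_map.shape
        pooled = []
        for l in levels:
            hs = np.linspace(0, H, l + 1).astype(int)   # grid boundaries
            ws = np.linspace(0, W, l + 1).astype(int)
            for i in range(l):
                for j in range(l):
                    cell = feature_map[:, hs[i]:hs[i+1], ws[j]:ws[j+1]]
                    pooled.append(cell.max(axis=(1, 2)))
        return np.concatenate(pooled)

    # Two differently sized inputs yield the same output length.
    x1, x2 = np.random.rand(256, 13, 13), np.random.rand(256, 20, 27)
    print(spatial_pyramid_pool(x1).shape, spatial_pyramid_pool(x2).shape)
    # both (5376,) = (256 * (1 + 4 + 16),)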
The power of SPP-net is also significant in object detection. Using SPP-net,
we compute the feature maps from the entire image only once, and then pool
features in arbitrary regions (sub-images) to generate fixed-length
representations for training the detectors. This method avoids repeatedly
computing the convolutional features. In processing test images, our method is
24-102x faster than the R-CNN method, while achieving better or comparable
accuracy on Pascal VOC 2007.
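The detection speedup follows from computing the feature map once and pooling
every candidate region from it. A sketch reusing spatial_pyramid_pool from
above, assuming the region rectangles are already projected into feature-map
coordinates:

    def pool_regions(feature_map, regions, levels=(1, 2, 4)):
        # regions: (x0, y0, x1, y1) rectangles in feature-map coordinates.
        # One forward pass over the image serves every candidate region.
        return [spatial_pyramid_pool(feature_map[:, y0:y1, x0:x1], levels)
                for (x0, y0, x1, y1) in regions]

    feats = np.random.rand(256, 32, 48)        # computed once per image
    descs = pool_regions(feats, [(0, 0, 16, 16), (10, 5, 40, 30)])
    print([d.shape for d in descs])            # equal-length descriptors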
In the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014, our
methods ranked #2 in object detection and #3 in image classification among all
38 teams. This manuscript also introduces the improvements made for this
competition.
Comment: This manuscript is the accepted version for IEEE Transactions on
Pattern Analysis and Machine Intelligence (TPAMI) 2015. See Changelog
Multi-GCN: Graph Convolutional Networks for Multi-View Networks, with Applications to Global Poverty
With the rapid expansion of mobile phone networks in developing countries,
large-scale graph machine learning has gained sudden relevance in the study of
global poverty. Recent applications range from humanitarian response and
poverty estimation to urban planning and epidemic containment. Yet the vast
majority of computational tools and algorithms used in these applications do
not account for the multi-view nature of social networks: people are related in
myriad ways, but most graph learning models treat relations as binary. In this
paper, we develop a graph-based convolutional network for learning on
multi-view networks. We show that this method outperforms state-of-the-art
semi-supervised learning algorithms on three different prediction tasks using
mobile phone datasets from three different developing countries. We also show
that, while designed specifically for use in poverty research, the algorithm
also outperforms existing benchmarks on a broader set of learning tasks on
multi-view networks, including node labelling in citation networks.
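One simple way to picture a multi-view graph layer is to propagate features
over each relation's graph separately and concatenate the per-view
embeddings; this propagate-and-concatenate scheme is an illustrative
assumption, not necessarily the paper's merging strategy:

    import numpy as np

    def normalize(A):
        # Symmetrically normalized adjacency with self-loops.
        A_hat = A + np.eye(A.shape[0])
        d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
        return d_inv_sqrt @ A_hat @ d_inv_sqrt

    def multi_view_gcn_layer(adjacencies, X, view_weights):
        # Per-view GCN propagation with ReLU, concatenated across views.
        return np.concatenate(
            [np.maximum(normalize(A) @ X @ W, 0.0)
             for A, W in zip(adjacencies, view_weights)], axis=1)

    # Toy usage: two relation types (e.g., calls and texts) over 5 people.
    rng = np.random.default_rng(2)
    A_call = (rng.random((5, 5)) > 0.6).astype(float)
    A_call = np.maximum(A_call, A_call.T)       # symmetrize
    A_text = (rng.random((5, 5)) > 0.6).astype(float)
    A_text = np.maximum(A_text, A_text.T)
    X = rng.normal(size=(5, 4))
    H = multi_view_gcn_layer([A_call, A_text], X,
                             [rng.normal(size=(4, 8)) for _ in range(2)])
    print(H.shape)                              # (5, 16)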
End-to-End Multi-View Networks for Text Classification
We propose a multi-view network for text classification. Our method
automatically creates various views of its input text, each taking the form of
soft attention weights that distribute the classifier's focus among a set of
base features. For a bag-of-words representation, each view focuses on a
different subset of the text's words. Aggregating many such views results in a
more discriminative and robust representation. Through a novel architecture
that both stacks and concatenates views, we produce a network that emphasizes
both depth and width, allowing training to converge quickly. Using our
multi-view architecture, we establish new state-of-the-art accuracies on two
benchmark tasks.
Comment: 6 pages
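The view mechanism can be sketched as per-view soft attention over the base
features; the query vectors below stand in for the learned view parameters,
and only the concatenation (not the stacking) of views is shown:

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def attention_view(features, query):
        # One view: soft attention weights distribute the classifier's
        # focus over the base features (rows), returning a weighted sum.
        weights = softmax(features @ query)
        return weights @ features

    def multi_view_representation(features, queries):
        # Concatenate several views, each free to focus on a different
        # subset of the text's words.
        return np.concatenate([attention_view(features, q) for q in queries])

    # Toy usage: 10 word features of dimension 6, four views.
    rng = np.random.default_rng(3)
    words = rng.normal(size=(10, 6))
    views = [rng.normal(size=6) for _ in range(4)]
    print(multi_view_representation(words, views).shape)   # (24,)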