Convolutional Neural Fabrics
Despite the success of CNNs, selecting the optimal architecture for a given
task remains an open problem. Instead of aiming to select a single optimal
architecture, we propose a "fabric" that embeds an exponentially large number
of architectures. The fabric consists of a 3D trellis that connects response
maps at different layers, scales, and channels with a sparse homogeneous local
connectivity pattern. The only hyper-parameters of a fabric are the number of
channels and layers. While individual architectures can be recovered as paths,
the fabric can in addition ensemble all embedded architectures together,
sharing their weights where their paths overlap. Parameters can be learned
using standard methods based on back-propagation, at a cost that scales
linearly in the fabric size. We present benchmark results competitive with the
state of the art for image classification on MNIST and CIFAR10, and for
semantic segmentation on the Part Labels dataset.
Comment: Corrected typos. In proceedings of NIPS 2016.
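The trellis structure described above can be illustrated with a small sketch. This is a hypothetical toy model, not the authors' implementation: nodes are indexed by (layer, scale) only (channels omitted), responses are scalars rather than feature maps, and the weights dictionary is an invented stand-in for learned filters. It shows the two properties the abstract claims: forward cost scales linearly with the number of trellis edges, while the number of embedded single-architecture paths grows exponentially with depth.

```python
# Toy sketch of a 2D slice of a convolutional fabric: a trellis of nodes
# indexed by (layer, scale), each aggregating the responses of the same and
# adjacent scales at the previous layer. All paths crossing an edge share
# that edge's weight. (Hypothetical illustration, not the authors' code.)

L, S = 4, 3  # number of layers and scales: the fabric's only size hyper-parameters

def fabric_forward(inputs, weights):
    """inputs: one scalar per scale at layer 0 (toy stand-in for response
    maps). weights[(l, s_from, s_to)]: weight of the edge into (l, s_to)."""
    resp = {(0, s): inputs[s] for s in range(S)}
    for l in range(1, L):
        for s in range(S):
            # sparse homogeneous local connectivity: same scale plus the
            # two neighbouring scales at the previous layer
            total = 0.0
            for sp in (s - 1, s, s + 1):
                if 0 <= sp < S:
                    total += weights[(l, sp, s)] * resp[(l - 1, sp)]
            resp[(l, s)] = max(total, 0.0)  # ReLU-like nonlinearity
    return resp

def count_paths():
    """Number of single-architecture paths embedded in the trellis."""
    paths = {s: 1 for s in range(S)}
    for _ in range(1, L):
        paths = {s: sum(paths[sp] for sp in (s - 1, s, s + 1) if 0 <= sp < S)
                 for s in range(S)}
    return sum(paths.values())

weights = {(l, sp, s): 0.1 for l in range(1, L)
           for s in range(S) for sp in range(S)}
out = fabric_forward([1.0, 2.0, 3.0], weights)
print(count_paths())  # already 41 paths for a tiny 4-layer, 3-scale trellis
```

One forward pass touches each edge once, which is why learning by back-propagation costs only linear time in the fabric size even though exponentially many architectures share the weights.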
Joint segmentation and classification of retinal arteries/veins from fundus images
Objective Automatic artery/vein (A/V) segmentation from fundus images is
required to track blood vessel changes occurring with many pathologies
including retinopathy and cardiovascular pathologies. One of the clinical
measures that quantifies vessel changes is the arterio-venous ratio (AVR) which
represents the ratio between artery and vein diameters. This measure
significantly depends on the accuracy of vessel segmentation and classification
into arteries and veins. This paper proposes a fast, novel method for semantic
A/V segmentation combining deep learning and graph propagation.
Methods A convolutional neural network (CNN) is proposed to jointly segment
and classify vessels into arteries and veins. The initial CNN labeling is
propagated through a graph representation of the retinal vasculature, whose
nodes are defined as the vessel branches and edges are weighted by the cost of
linking pairs of branches. To efficiently propagate the labels, the graph is
simplified into its minimum spanning tree.
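The propagation step can be sketched as follows. This is a hypothetical illustration of the general technique, not the paper's implementation: the toy probabilities, edge costs, and the confidence threshold `conf` are all invented, and the propagation rule (uncertain branches inherit the label of their tree neighbour) is a simplified stand-in for the paper's cost-based propagation.

```python
# Sketch of MST-based label propagation: vessel branches are graph nodes
# carrying a CNN artery probability; edges carry a linking cost; the graph
# is simplified to its minimum spanning tree and labels are smoothed along
# it. (Hypothetical illustration, not the paper's code.)
from collections import defaultdict

def kruskal_mst(n, edges):
    """edges: list of (cost, u, v). Returns the MST as adjacency lists."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x
    tree = defaultdict(list)
    for cost, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:                       # edge joins two components
            parent[ru] = rv
            tree[u].append(v)
            tree[v].append(u)
    return tree

def propagate(probs, tree, conf=0.2):
    """Walk the tree from the most confident branch; branches whose CNN
    probability is near 0.5 inherit the label of their tree neighbour."""
    labels = {u: p > 0.5 for u, p in enumerate(probs)}  # True = artery
    root = max(range(len(probs)), key=lambda u: abs(probs[u] - 0.5))
    seen, stack = {root}, [root]
    while stack:
        u = stack.pop()
        for v in tree[u]:
            if v not in seen:
                seen.add(v)
                if abs(probs[v] - 0.5) < conf:  # uncertain: inherit label
                    labels[v] = labels[u]
                stack.append(v)
    return labels

# toy vasculature: 5 branches with CNN artery probabilities and linking costs
probs = [0.9, 0.55, 0.45, 0.1, 0.6]
edges = [(1.0, 0, 1), (0.5, 1, 2), (2.0, 2, 3), (0.3, 1, 4), (5.0, 0, 3)]
tree = kruskal_mst(5, edges)
labels = propagate(probs, tree)
```

Restricting propagation to the spanning tree keeps the pass linear in the number of branches, since each tree edge is visited exactly once.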
Results The method achieves an accuracy of 94.8% for vessel segmentation.
The A/V classification achieves a specificity of 92.9% with a sensitivity of
93.7% on the CT-DRIVE database, compared with the state-of-the-art
specificity and sensitivity of 91.7% each.
Conclusion The results show that our method outperforms the leading previous
works on a public dataset for A/V classification and is by far the fastest.
Significance The proposed global AVR calculated on the whole fundus image
using our automatic A/V segmentation method can better track vessel changes
associated with diabetic retinopathy than the standard local AVR calculated
only around the optic disc.
Comment: Preprint accepted in Artificial Intelligence in Medicine.
Adaptive Temporal Encoding Network for Video Instance-level Human Parsing
Beyond the existing single-person and multiple-person human parsing tasks in
static images, this paper makes the first attempt to investigate a more
realistic task, video instance-level human parsing, which simultaneously segments out
each person instance and parses each instance into more fine-grained parts
(e.g., head, leg, dress). We introduce a novel Adaptive Temporal Encoding
Network (ATEN) that alternately performs temporal encoding among key frames
and flow-guided feature propagation for the consecutive frames between two
key frames. Specifically, ATEN first incorporates a Parsing-RCNN to produce the
instance-level parsing result for each key frame, which integrates both the
global human parsing and instance-level human segmentation into a unified
model. To balance accuracy and efficiency, flow-guided feature
propagation is used to parse consecutive frames directly, exploiting their
temporal consistency with key frames. On the other hand, ATEN
leverages the convolution gated recurrent units (convGRU) to exploit temporal
changes over a series of key frames, which are further used to facilitate the
frame-level instance-level parsing. By alternately performing direct feature
propagation between consistent frames and temporal encoding among key
frames, our ATEN achieves a good balance between frame-level accuracy and time
efficiency, which is a common crucial problem in video object segmentation
research. To demonstrate the superiority of our ATEN, extensive experiments are
conducted on the most popular video segmentation benchmark (DAVIS) and a newly
collected Video Instance-level Parsing (VIP) dataset, the first video
instance-level human parsing dataset, comprising 404 sequences and over 20k
frames with instance-level and pixel-wise annotations.
Comment: To appear in ACM MM 2018. Code link:
https://github.com/HCPLab-SYSU/ATEN. Dataset link: http://sysu-hcp.net/li
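ATEN's alternation between the two per-frame modes can be sketched as a simple schedule. This is a hypothetical illustration, not the authors' code: `key_interval`, the frame counts, and the nearest-key assignment rule are invented for the example; the real system additionally runs a convGRU over key-frame features, which is omitted here.

```python
# Sketch of ATEN-style frame scheduling: key frames receive the full
# Parsing-RCNN pass (and would feed the convGRU temporal encoder), while
# the frames in between reuse key-frame features via flow-guided
# propagation. (Hypothetical illustration, not the authors' code.)

def schedule(num_frames, key_interval):
    """Return (frame_index, mode, source_key_frame) triples."""
    keys = list(range(0, num_frames, key_interval))
    plan = []
    for t in range(num_frames):
        if t in keys:
            # expensive, accurate path: full instance-level parsing
            plan.append((t, "parsing_rcnn", t))
        else:
            # cheap path: propagate features from the nearest key frame,
            # trading a little accuracy for speed
            nearest = min(keys, key=lambda k: abs(k - t))
            plan.append((t, "flow_propagation", nearest))
    return plan

plan = schedule(num_frames=10, key_interval=4)
# key frames 0, 4, 8 run Parsing-RCNN; the rest propagate from the nearest key
```

The `key_interval` knob makes the accuracy/efficiency trade-off explicit: a smaller interval means more full parsing passes, a larger one means more cheap propagated frames.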