Recurrent Saliency Transformation Network: Incorporating Multi-Stage Visual Cues for Small Organ Segmentation
We aim at segmenting small organs (e.g., the pancreas) from abdominal CT
scans. As the target often occupies a relatively small region in the input
image, deep neural networks can be easily confused by the complex and variable
background. To alleviate this, researchers proposed a coarse-to-fine approach,
which used prediction from the first (coarse) stage to indicate a smaller input
region for the second (fine) stage. Despite its effectiveness, this algorithm
treated the two stages separately, lacking a global energy function to
optimize, which limited its ability to incorporate multi-stage visual cues.
The missing contextual information led to unsatisfying convergence across
iterations, and the fine stage sometimes produced even lower segmentation
accuracy than the coarse stage.
This paper presents a Recurrent Saliency Transformation Network. The key
innovation is a saliency transformation module, which repeatedly converts the
segmentation probability map from the previous iteration into spatial weights
and applies these weights to the current iteration. This brings two-fold
benefits. In training, it allows joint optimization over the deep networks
dealing with different input scales. In testing, it propagates multi-stage
visual information throughout iterations to improve segmentation accuracy.
Experiments on the NIH pancreas segmentation dataset demonstrate
state-of-the-art accuracy, outperforming the previous best by an average of
over 2%. Much higher accuracies are also reported on several small organs in a
larger dataset that we collected ourselves. In addition, our approach enjoys
better convergence properties, making it more efficient and reliable in
practice.

Comment: Accepted to CVPR 2018 (10 pages, 6 figures)
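The core idea of the saliency transformation module, turning the previous iteration's probability map into spatial weights applied to the next iteration's input, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation; the `saliency_transform` name and the particular weighting floor are made up for the sketch:

```python
import numpy as np

def saliency_transform(image, prev_prob):
    """Turn the previous iteration's segmentation probability map into
    spatial weights and apply them to the next iteration's input.
    The 0.1 floor (an assumption of this sketch) keeps background
    signal alive so regions missed in the last pass can be recovered."""
    weights = 0.1 + 0.9 * prev_prob
    return image * weights

# Toy example: a 4x4 "CT slice" with a previous prediction covering the
# top-left corner; the re-weighted input emphasizes that region.
image = np.ones((4, 4))
prev_prob = np.zeros((4, 4))
prev_prob[:2, :2] = 1.0
weighted = saliency_transform(image, prev_prob)
```

In the actual network this weighting is applied inside a recurrent loop, which is what makes the coarse and fine scales jointly trainable.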
Collaboration in sensor network research: an in-depth longitudinal analysis of assortative mixing patterns
Many investigations of scientific collaboration are based on statistical
analyses of large networks constructed from bibliographic repositories. These
investigations often rely on a wealth of bibliographic data, but very little or
no other information about the individuals in the network, and thus, fail to
illustrate the broader social and academic landscape in which collaboration
takes place. In this article, we perform an in-depth longitudinal analysis of a
relatively small network of scientific collaboration (N = 291) constructed from
the bibliographic record of a research center involved in the development and
application of sensor network and wireless technologies. We perform a
preliminary analysis of selected structural properties of the network,
computing its range, configuration and topology. We then support our
preliminary statistical analysis with an in-depth temporal investigation of the
assortative mixing of selected node characteristics, unveiling the researchers'
propensity to collaborate preferentially with others with a similar academic
profile. Our qualitative analysis of mixing patterns offers clues as to the
nature of the scientific community being modeled in relation to its
organizational, disciplinary, institutional, and international arrangements of
collaboration.

Comment: Scientometrics (in press)
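Assortative mixing of the kind analyzed here is commonly quantified with Newman's assortativity coefficient, computed from the edge-level mixing matrix of a categorical node attribute. A minimal pure-Python sketch (the function name and toy data are illustrative, not from the article):

```python
def attribute_assortativity(edges, attr):
    """Newman's assortativity coefficient for a categorical node
    attribute: +1 when edges only join same-category nodes, negative
    when they preferentially join different categories.
    edges: list of (u, v) pairs; attr: dict mapping node -> category."""
    cats = sorted({attr[n] for e in edges for n in e})
    idx = {c: i for i, c in enumerate(cats)}
    k = len(cats)
    e = [[0.0] * k for _ in range(k)]
    for u, v in edges:  # count each undirected edge in both directions
        e[idx[attr[u]]][idx[attr[v]]] += 1
        e[idx[attr[v]]][idx[attr[u]]] += 1
    total = sum(sum(row) for row in e)
    e = [[x / total for x in row] for row in e]
    a = [sum(row) for row in e]                               # row marginals
    b = [sum(e[i][j] for i in range(k)) for j in range(k)]    # column marginals
    trace = sum(e[i][i] for i in range(k))
    ab = sum(a[i] * b[i] for i in range(k))
    return (trace - ab) / (1 - ab)

# A fully assortative toy network: two "A" authors collaborate with
# each other, as do two "B" authors.
attr = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
r = attribute_assortativity([("a1", "a2"), ("b1", "b2")], attr)
```

A longitudinal analysis like the one described would recompute this coefficient over successive time windows of the collaboration network.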
Context-aware Synthesis for Video Frame Interpolation
Video frame interpolation algorithms typically estimate optical flow or its
variations and then use it to guide the synthesis of an intermediate frame
between two consecutive original frames. To handle challenges like occlusion,
bidirectional flow between the two input frames is often estimated and used to
warp and blend the input frames. However, how to effectively blend the two
warped frames still remains a challenging problem. This paper presents a
context-aware synthesis approach that warps not only the input frames but also
their pixel-wise contextual information and uses them to interpolate a
high-quality intermediate frame. Specifically, we first use a pre-trained
neural network to extract per-pixel contextual information for input frames. We
then employ a state-of-the-art optical flow algorithm to estimate bidirectional
flow between them and pre-warp both input frames and their context maps.
Finally, unlike common approaches that blend the pre-warped frames, our method
feeds them and their context maps to a video frame synthesis neural network to
produce the interpolated frame in a context-aware fashion. Our neural network
is fully convolutional and is trained end to end. Our experiments show that our
method can handle challenging scenarios such as occlusion and large motion and
outperforms representative state-of-the-art approaches.

Comment: CVPR 2018, http://graphics.cs.pdx.edu/project/ctxsy
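The pre-warping step, sampling both an input frame and its per-pixel context map through the estimated flow, can be sketched with a toy nearest-neighbour backward warp. The real pipeline uses learned optical flow and sub-pixel sampling; the `backward_warp` name and the toy data below are assumptions of this sketch:

```python
import numpy as np

def backward_warp(channels, flow):
    """Warp a stack of per-pixel channels (image plus context maps)
    with a dense flow field, using nearest-neighbour sampling for
    brevity; production systems use bilinear or adaptive sampling.
    channels: H x W x C array; flow: H x W x 2 array of (dx, dy)."""
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return channels[src_y, src_x]

# Toy example: warp a ramp image and a fake one-channel "context map"
# together by sampling one pixel to the right.
img = np.arange(16, dtype=float).reshape(4, 4)
ctx = (img > 7).astype(float)            # stand-in for deep features
stack = np.stack([img, ctx], axis=-1)    # H x W x C
flow = np.zeros((4, 4, 2))
flow[..., 0] = 1.0
warped = backward_warp(stack, flow)
```

Warping the context channels alongside the pixels is what lets the downstream synthesis network blend the two pre-warped frames in a context-aware way.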
Benchmark Analysis of Representative Deep Neural Network Architectures
This work presents an in-depth analysis of the majority of the deep neural
networks (DNNs) proposed in the state of the art for image recognition. For
each DNN multiple performance indices are observed, such as recognition
accuracy, model complexity, computational complexity, memory usage, and
inference time. The behavior of these performance indices, and of some
combinations of them, is analyzed and discussed. To measure the indices, we
run the DNNs on two different computer architectures: a workstation equipped
with an NVIDIA Titan X Pascal and an embedded system based on an NVIDIA Jetson
TX1 board. This setup allows a direct comparison between DNNs running on
machines with very different computational capacity. This study is useful for
researchers to gain a complete view of which solutions have been explored so
far and which research directions are worth exploring in the future, and for
practitioners to select the DNN architecture(s) that best fit the resource
constraints of practical deployments and applications. To complete this work,
all the DNNs, as well as the software used for the analysis, are available
online.

Comment: Will appear in IEEE Access
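One of the indices, inference time, is typically measured with warm-up runs followed by a median over timed runs. The sketch below shows that generic timing protocol, not the paper's exact procedure; the `benchmark` helper and the trivial stand-in model are assumptions:

```python
import time

def benchmark(fn, x, warmup=3, runs=10):
    """Median wall-clock latency of calling fn(x).
    Warm-up runs absorb one-time costs (caches, JIT, allocator);
    the median is robust to occasional scheduler hiccups."""
    for _ in range(warmup):
        fn(x)
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn(x)
        times.append(time.perf_counter() - t0)
    times.sort()
    return times[len(times) // 2]

# Usage with a trivial stand-in for a model's forward pass:
latency = benchmark(lambda v: sum(i * i for i in v), list(range(1000)))
```

On GPUs an extra synchronization step is needed before reading the clock, since kernel launches are asynchronous; that detail is omitted here.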
Guided Stereo Matching
Stereo is a prominent technique to infer dense depth maps from images, and
deep learning further pushed forward the state-of-the-art, making end-to-end
architectures unrivaled when enough data is available for training. However,
deep networks suffer from significant drops in accuracy when dealing with new
environments. Therefore, in this paper we introduce Guided Stereo Matching, a
novel paradigm that leverages a small amount of sparse yet reliable depth
measurements retrieved from an external source to ameliorate this
weakness. The additional sparse cues required by our method can be obtained
with any strategy (e.g., a LiDAR) and are used to enhance features linked to
the corresponding disparity hypotheses. Our formulation is general and fully
differentiable, making it possible to exploit the additional sparse inputs in
pre-trained deep stereo networks as well as when training a new instance from
scratch. Extensive experiments on three standard datasets and two
state-of-the-art deep architectures show that even with a small set of sparse
input cues: i) the proposed paradigm enables significant improvements to
pre-trained networks; ii) training from scratch notably increases accuracy
and robustness to domain shifts; and iii) it is suitable and effective even
with traditional stereo algorithms such as SGM.

Comment: CVPR 201
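The feature-enhancement step, amplifying the features aligned with a pixel's hinted disparity, can be sketched as a Gaussian modulation of a cost volume. The gain and width values below are illustrative defaults, not the paper's tuned parameters, and `guide_cost_volume` is a name invented for this sketch:

```python
import numpy as np

def guide_cost_volume(cost, hints, valid, k=10.0, c=1.0):
    """Modulate a disparity cost/feature volume with sparse depth hints:
    entries near the hinted disparity are amplified by a Gaussian peak,
    pixels without a hint are left untouched.
    cost: H x W x D volume; hints: H x W hinted disparities;
    valid: H x W boolean mask marking pixels that have a hint."""
    dmax = cost.shape[2]
    d = np.arange(dmax, dtype=float)                      # disparity axis
    gauss = 1.0 + k * np.exp(
        -((d[None, None, :] - hints[..., None]) ** 2) / (2.0 * c ** 2))
    return np.where(valid[..., None], cost * gauss, cost)

# Toy example: a uniform 2x2x8 volume with a single hinted pixel.
cost = np.ones((2, 2, 8))
hints = np.full((2, 2), 3.0)
valid = np.zeros((2, 2), dtype=bool)
valid[0, 0] = True
guided = guide_cost_volume(cost, hints, valid)
```

Because the modulation is an elementwise, differentiable operation, it can be dropped into a pre-trained network at test time or kept in place during training from scratch, matching the two usage modes described above.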