1,008 research outputs found
ACNN: a Full Resolution DCNN for Medical Image Segmentation
Deep Convolutional Neural Networks (DCNNs) are used extensively in medical
image segmentation and hence 3D navigation for robot-assisted Minimally
Invasive Surgeries (MISs). However, current DCNNs usually use downsampling
layers to increase the receptive field and gain abstract semantic
information. These downsampling layers decrease the spatial dimension of
feature maps, which can be detrimental to image segmentation. Atrous
convolution is an alternative to the downsampling layer: it increases the
receptive field whilst maintaining the spatial dimension of feature maps. In this
paper, a method for effective atrous rate setting is proposed to achieve the
largest and fully-covered receptive field with a minimum number of atrous
convolutional layers. Furthermore, a new and full resolution DCNN - Atrous
Convolutional Neural Network (ACNN), which incorporates cascaded atrous
II-blocks, residual learning and Instance Normalization (IN) is proposed.
Application results of the proposed ACNN to Magnetic Resonance Imaging (MRI)
and Computed Tomography (CT) image segmentation demonstrate that the proposed
ACNN can achieve higher segmentation Intersection over Unions (IoUs) than U-Net
and Deeplabv3+, but with fewer trainable parameters.
Comment: 7 pages, 2 tables, 6 figures
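The abstract does not reproduce the paper's rate-setting rule, but the receptive-field arithmetic it rests on is standard: a stride-1 convolution with kernel size k and dilation r extends the receptive field by (k-1)·r per layer. A minimal sketch (hypothetical helper names, not the paper's code) that also checks a stack of atrous layers for "gridding" holes:

```python
def dilated_rf(kernel_sizes, dilations):
    """Receptive field of a stack of dilated (atrous) conv layers, stride 1."""
    rf = 1
    for k, r in zip(kernel_sizes, dilations):
        rf += (k - 1) * r  # each layer extends the field by (k-1)*r
    return rf

def fully_covered(kernel_sizes, dilations):
    """Check that the stack leaves no 'gridding' holes: every offset inside
    the receptive field must be reachable as a sum of per-layer tap offsets.
    Illustrative criterion, not necessarily the paper's exact rule."""
    reachable = {0}
    for k, r in zip(kernel_sizes, dilations):
        taps = [r * (i - (k - 1) // 2) for i in range(k)]
        reachable = {a + t for a in reachable for t in taps}
    half = (dilated_rf(kernel_sizes, dilations) - 1) // 2
    return all(o in reachable for o in range(-half, half + 1))
```

For 3x3 kernels, the rate sequence [1, 2] yields a fully covered field of 7, whereas [2, 2] or a lone dilated layer leaves holes, which is the gridding problem the rate-setting method avoids.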
Topical Behavior Prediction from Massive Logs
In this paper, we study topical behavior at a large scale. We use
network logs where each entry contains an entity ID, a timestamp, and
meta data about the activity. Both the temporal and the spatial relationships
of the behavior are explored with deep learning architectures combining the
recurrent neural network (RNN) and the convolutional neural network (CNN). To
make the behavioral data appropriate for spatial learning in the CNN, we
propose several reduction steps to form topical metrics and to place them
homogeneously, like pixels in an image. The experimental results show both
temporal and spatial gains when compared against a multilayer perceptron (MLP)
network. A new learning framework called the spatially connected convolutional
networks (SCCN) is introduced to predict the topical metrics more efficiently.
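The abstract leaves the reduction steps unspecified; one plausible reading of "place them homogeneously like pixels" is binning per-topic event counts into a 2D time-by-topic grid that a CNN can consume. A hypothetical sketch, not the paper's actual metric construction:

```python
import numpy as np

def behavior_image(events, topics, t_bins):
    """Aggregate (time_bin, topic) events into a 2D count 'image' so a CNN
    can exploit spatial locality across adjacent time bins and topics.
    events: iterable of (time_bin, topic) pairs. Hypothetical reduction."""
    index = {t: i for i, t in enumerate(topics)}
    img = np.zeros((t_bins, len(topics)))
    for t_bin, topic in events:
        img[t_bin, index[topic]] += 1  # count activity per cell
    return img
```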
Deep Regression for Face Alignment
In this paper, we present a deep regression approach for face alignment. The
deep architecture consists of a global layer and multi-stage local layers. We
apply the back-propagation algorithm with the dropout strategy to jointly
optimize the regression parameters. We show that the resulting deep regressor
gradually and evenly approaches the true facial landmarks stage by stage,
avoiding the tendency to yield over-strong early-stage regressors and
over-weak later-stage ones. Experimental results show that our approach
achieves state-of-the-art performance.
Generic Object Detection With Dense Neural Patterns and Regionlets
This paper addresses the challenge of establishing a bridge between deep
convolutional neural networks and conventional object detection frameworks for
accurate and efficient generic object detection. We introduce Dense Neural
Patterns, short for DNPs, which are dense local features derived from
discriminatively trained deep convolutional neural networks. DNPs can be easily
plugged into conventional detection frameworks in the same way as other dense
local features (like HOG or LBP). The effectiveness of the proposed approach is
demonstrated with the Regionlets object detection framework. It achieved 46.1%
mean average precision on the PASCAL VOC 2007 dataset and 44.1% on the PASCAL
VOC 2010 dataset, which dramatically improves on the original Regionlets
approach without DNPs.
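The "plugged in like HOG or LBP" idea amounts to reading dense, L2-normalized descriptors off a CNN feature map on a regular grid. A minimal sketch under that assumption (hypothetical function name, not the paper's DNP extraction code):

```python
import numpy as np

def dense_patterns(feature_map, stride=2):
    """Extract dense local descriptors from a CNN feature map on a regular
    grid, so a detector can consume them like HOG/LBP cells.
    feature_map: array of shape (C, H, W).
    Returns an (N, C) descriptor matrix and the (y, x) grid positions."""
    C, H, W = feature_map.shape
    descs, locs = [], []
    for y in range(0, H, stride):
        for x in range(0, W, stride):
            v = feature_map[:, y, x]
            descs.append(v / (np.linalg.norm(v) + 1e-8))  # L2-normalize
            locs.append((y, x))
    return np.stack(descs), locs
```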
Face Detection through Scale-Friendly Deep Convolutional Networks
In this paper, we share our experience in designing a convolutional
network-based face detector that could handle faces of an extremely wide range
of scales. We show that faces with different scales can be modeled through a
specialized set of deep convolutional networks with different structures. These
detectors can be seamlessly integrated into a single unified network that can
be trained end-to-end. In contrast to existing deep models that are designed
for wide scale range, our network does not require an image pyramid input and
the model is of modest complexity. Our network, dubbed ScaleFace, achieves
promising performance on WIDER FACE and FDDB datasets with practical runtime
speed. Specifically, our method achieves 76.4 average precision on the
challenging WIDER FACE dataset and a 96% recall rate on the FDDB dataset at 7
frames per second (fps) for a 900 × 1300 input image.
Comment: 12 pages, 10 figures
Faster Than Real-time Facial Alignment: A 3D Spatial Transformer Network Approach in Unconstrained Poses
Facial alignment involves finding a set of landmark points on an image with a
known semantic meaning. However, this semantic meaning of landmark points is
often lost in 2D approaches where landmarks are either moved to visible
boundaries or ignored as the pose of the face changes. In order to extract
consistent alignment points across large poses, the 3D structure of the face
must be considered in the alignment step. However, extracting a 3D structure
from a single 2D image usually requires alignment in the first place. We
present our novel approach to simultaneously extract the 3D shape of the face
and the semantically consistent 2D alignment through a 3D Spatial Transformer
Network (3DSTN) to model both the camera projection matrix and the warping
parameters of a 3D model. By utilizing a generic 3D model and a Thin Plate
Spline (TPS) warping function, we are able to generate subject specific 3D
shapes without the need for a large 3D shape basis. In addition, our proposed
network can be trained in an end-to-end framework on entirely synthetic data
from the 300W-LP dataset. Unlike other 3D methods, our approach only requires
one pass through the network resulting in a faster than real-time alignment.
Evaluations of our model on the Annotated Facial Landmarks in the Wild (AFLW)
and AFLW2000-3D datasets show our method achieves state-of-the-art performance
over other 3D approaches to alignment.
Comment: International Conference on Computer Vision (ICCV) 201
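The Thin Plate Spline warp mentioned above has a standard closed form: a radial basis U(r) = r² log r plus an affine part, fitted by one linear solve over the control points. A minimal 2D sketch of that standard formulation (the paper's 3DSTN applies TPS to a 3D model instead):

```python
import numpy as np

def tps_fit(src, dst):
    """Fit a 2D thin-plate-spline warp taking src control points to dst.
    Returns a callable warp(points). Standard TPS linear system:
    [[K P],[P^T 0]] [w; a] = [dst; 0] with K_ij = U(||p_i - p_j||)."""
    n = src.shape[0]
    d = np.linalg.norm(src[:, None, :] - src[None, :, :], axis=-1)
    K = np.where(d > 0, d**2 * np.log(d + 1e-12), 0.0)  # U(r) = r^2 log r
    P = np.hstack([np.ones((n, 1)), src])
    A = np.zeros((n + 3, n + 3))
    A[:n, :n], A[:n, n:], A[n:, :n] = K, P, P.T
    params = np.linalg.solve(A, np.vstack([dst, np.zeros((3, 2))]))
    w, a = params[:n], params[n:]

    def warp(pts):
        dd = np.linalg.norm(pts[:, None, :] - src[None, :, :], axis=-1)
        U = np.where(dd > 0, dd**2 * np.log(dd + 1e-12), 0.0)
        return U @ w + np.hstack([np.ones((len(pts), 1)), pts]) @ a

    return warp
```

Because the affine part is solved jointly with the radial weights, a purely affine deformation (e.g. a translation of all control points) is recovered exactly, and bending appears only where the targets demand it.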
Unsupervised Learning of Dense Optical Flow, Depth and Egomotion from Sparse Event Data
In this work we present a lightweight, unsupervised learning pipeline for
\textit{dense} depth, optical flow and egomotion estimation from sparse event
output of the Dynamic Vision Sensor (DVS). To tackle this low level vision
task, we use a novel encoder-decoder neural network architecture - ECN.
Our work is the first monocular pipeline that generates dense depth and
optical flow from sparse event data only. The network works in self-supervised
mode and has just 150k parameters. We evaluate our pipeline on the MVSEC self
driving dataset and present results for depth, optical flow, and egomotion
estimation. Due to the lightweight design, the inference part of the network
runs at 250 FPS on a single GPU, making the pipeline ready for real-time
robotics applications. Our experiments demonstrate significant improvements
upon previous works that used deep learning on event data, as well as the
ability of our pipeline to perform well during both day and night.
A Cascaded Learning Strategy for Robust COVID-19 Pneumonia Chest X-Ray Screening
We introduce a comprehensive screening platform for the COVID-19 (a.k.a.,
SARS-CoV-2) pneumonia. The proposed AI-based system works on chest x-ray (CXR)
images to predict whether a patient is infected with the COVID-19 disease.
Despite the recent international joint effort to make all sorts of data openly
available, the public collection of CXR images is still too small for
reliably training a deep neural network (DNN) to carry out COVID-19
prediction. To address this data scarcity, we design a cascaded learning
strategy to improve both the sensitivity and the specificity of the resulting
DNN classification model. Our approach leverages a large CXR image dataset of
non-COVID-19 pneumonia to generalize the original well-trained classification
model via a cascaded learning scheme. The resulting screening system is shown
to achieve good classification performance on the expanded dataset, including
those newly added COVID-19 CXR images.
Comment: 14 pages, 6 figures
Convolutional Point-set Representation: A Convolutional Bridge Between a Densely Annotated Image and 3D Face Alignment
We present a robust method for estimating the facial pose and shape
information from a densely annotated facial image. The method relies on
Convolutional Point-set Representation (CPR), a carefully designed matrix
representation to summarize different layers of information encoded in the set
of detected points in the annotated image. The CPR disentangles the
dependencies of shape and different pose parameters and enables updating
different parameters in a sequential manner via convolutional neural networks
and recurrent layers. When updating the pose parameters, we sample reprojection
errors along with a predicted direction and update the parameters based on the
pattern of reprojection errors. This technique boosts the model's capability
of finding a good local minimum under challenging scenarios. We also demonstrate that
annotation from different sources can be merged under the framework of CPR and
contributes to outperforming the current state-of-the-art solutions for 3D face
alignment. Experiments indicate the proposed CPRFA (CPR-based Face Alignment)
significantly improves 3D alignment accuracy when the densely annotated image
contains noise and missing values, which is common under "in-the-wild"
acquisition scenarios.
Comment: Preprint Submitted
A Pursuit of Temporal Accuracy in General Activity Detection
Detecting activities in untrimmed videos is an important but challenging
task. The performance of existing methods remains unsatisfactory; for example,
they often have difficulty locating the beginning and end of a long, complex
action. In this paper, we propose a generic framework that can accurately
detect a wide variety of activities from untrimmed videos. Our first
contribution is a novel proposal scheme that can efficiently generate
candidates with accurate temporal boundaries. The other contribution is a
cascaded classification pipeline that explicitly distinguishes between
relevance and completeness of a candidate instance. On two challenging temporal
activity detection datasets, THUMOS14 and ActivityNet, the proposed framework
significantly outperforms the existing state-of-the-art methods, demonstrating
superior accuracy and strong adaptivity in handling activities with various
temporal structures.
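Judging the completeness of a temporal proposal ultimately reduces to temporal overlap with ground-truth instances. A minimal sketch of that matching step (hypothetical helper names; not the paper's exact cascaded pipeline):

```python
def temporal_iou(seg_a, seg_b):
    """Temporal IoU between two (start, end) segments, e.g. in seconds."""
    inter = max(0.0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))
    union = (seg_a[1] - seg_a[0]) + (seg_b[1] - seg_b[0]) - inter
    return inter / union if union > 0 else 0.0

def mark_complete(proposals, ground_truth, thresh=0.5):
    """Label each proposal as complete (IoU >= thresh with some ground-truth
    instance) rather than merely relevant or background."""
    return [max((temporal_iou(p, g) for g in ground_truth), default=0.0)
            >= thresh for p in proposals]
```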