Contour Detection from Deep Patch-level Boundary Prediction
In this paper, we present a novel approach for contour detection with
Convolutional Neural Networks. A multi-scale CNN learning framework is designed
to automatically learn the most relevant features for contour patch detection.
Our method uses patch-level measurements to create contour maps from
overlapping patches. We show that the proposed CNN is able to detect large-scale
contours in an image efficiently. We further propose a guided filtering method
to refine the contour maps produced from large-scale contours. Experimental
results on the major contour benchmark databases demonstrate the effectiveness
of the proposed technique. We show that our method achieves good detection of
both fine-scale and large-scale contours.
Comment: IEEE International Conference on Signal and Image Processing 201
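The abstract does not say how overlapping patch predictions are merged or which guided filter is used. The sketch below is an illustration under stated assumptions, not the authors' implementation: it averages hypothetical patch-level contour probabilities wherever patches overlap, then applies the standard grayscale guided filter of He et al. (box-filter formulation) with the input image as the guide. The function names, the radius `r`, and the regularizer `eps` are illustrative choices.

```python
import numpy as np


def aggregate_patches(patch_maps, positions, image_shape):
    """Average overlapping patch-level contour maps into one full-image map.

    patch_maps: list of 2D arrays of contour probabilities (hypothetical CNN outputs).
    positions:  list of (row, col) top-left corners, one per patch.
    """
    acc = np.zeros(image_shape, dtype=np.float64)
    count = np.zeros(image_shape, dtype=np.float64)
    for patch, (r, c) in zip(patch_maps, positions):
        h, w = patch.shape
        acc[r:r + h, c:c + w] += patch
        count[r:r + h, c:c + w] += 1.0
    # Pixels that no patch covers are left at 0 instead of dividing by zero.
    return np.divide(acc, count, out=np.zeros_like(acc), where=count > 0)


def box_filter(img, r):
    """Mean over a (2r+1) x (2r+1) window via an integral image (edge-padded borders)."""
    k = 2 * r + 1
    padded = np.pad(img.astype(np.float64), r, mode="edge")
    s = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    s[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = img.shape
    total = s[k:k + h, k:k + w] - s[:h, k:k + w] - s[k:k + h, :w] + s[:h, :w]
    return total / (k * k)


def guided_filter(guide, src, r=4, eps=1e-3):
    """Grayscale guided filter (He et al.): smooths src while following edges in guide."""
    mean_i = box_filter(guide, r)
    mean_p = box_filter(src, r)
    cov_ip = box_filter(guide * src, r) - mean_i * mean_p
    var_i = box_filter(guide * guide, r) - mean_i * mean_i
    a = cov_ip / (var_i + eps)  # local linear coefficient
    b = mean_p - a * mean_i     # local offset
    return box_filter(a, r) * guide + box_filter(b, r)


# Usage on synthetic data: nine 16x16 patches with stride 8 tile a 32x32 image.
rng = np.random.default_rng(0)
image = rng.random((32, 32))
patches = [rng.random((16, 16)) for _ in range(9)]
positions = [(r, c) for r in (0, 8, 16) for c in (0, 8, 16)]
contour_map = aggregate_patches(patches, positions, image.shape)
refined = guided_filter(image, contour_map, r=2, eps=1e-2)
```

Averaging is only one plausible fusion rule; a max over overlapping patches would be an equally reasonable assumption.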
Exploring Context with Deep Structured models for Semantic Segmentation
State-of-the-art semantic image segmentation methods are mostly based on
training deep convolutional neural networks (CNNs). In this work, we propose to
improve semantic segmentation with the use of contextual information. In
particular, we explore `patch-patch' context and `patch-background' context in
deep CNNs. We formulate deep structured models by combining CNNs and
Conditional Random Fields (CRFs) for learning the patch-patch context between
image regions. Specifically, we formulate CNN-based pairwise potential
functions to capture semantic correlations between neighboring patches.
Efficient piecewise training of the proposed deep structured model is then
applied in order to avoid repeated expensive CRF inference during the course of
backpropagation. For capturing the patch-background context, we show that a
network design with traditional multi-scale image inputs and sliding pyramid
pooling is very effective for improving performance. We perform a comprehensive
evaluation of the proposed method. We achieve new state-of-the-art performance
on a number of challenging semantic segmentation datasets, including … datasets.
In particular, we report an intersection-over-union score of … on the … dataset.
Comment: 16 pages. Accepted to IEEE T. Pattern Analysis & Machine
Intelligence, 2017. Extended version of arXiv:1504.0101
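The results above are reported as intersection-over-union (IoU) scores. As a minimal sketch of how per-class IoU and mean IoU are commonly computed from a confusion matrix (not the authors' evaluation code), assuming integer label maps and treating label 255 as "ignore" (a PASCAL VOC convention, assumed here):

```python
import numpy as np


def confusion_matrix(gt, pred, num_classes, ignore_index=255):
    """Count (ground truth, prediction) pairs; rows index ground truth.

    ignore_index=255 follows the PASCAL VOC 'void' convention (an assumption here).
    """
    mask = gt != ignore_index
    idx = num_classes * gt[mask].astype(np.int64) + pred[mask].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


def mean_iou(conf):
    """Per-class IoU = TP / (TP + FP + FN) and its mean over classes that occur."""
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp  # predicted as class c, labelled otherwise
    fn = conf.sum(axis=1) - tp  # labelled class c, predicted otherwise
    denom = tp + fp + fn
    iou = np.divide(tp, denom, out=np.full_like(tp, np.nan), where=denom > 0)
    return iou, float(np.nanmean(iou))


gt = np.array([[0, 0, 1], [1, 2, 255]])
pred = np.array([[0, 1, 1], [1, 2, 0]])
per_class, miou = mean_iou(confusion_matrix(gt, pred, num_classes=3))
# per-class IoU 0.5, 0.667, 1.0; mean about 0.722
```

Accumulating one confusion matrix over the whole dataset before dividing, rather than averaging per-image IoUs, is the usual choice because it weights every pixel equally.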
Transfer Learning from Deep Features for Remote Sensing and Poverty Mapping
The lack of reliable data in developing countries is a major obstacle to
sustainable development, food security, and disaster relief. Poverty data, for
example, is typically scarce, sparse in coverage, and labor-intensive to
obtain. Remote sensing data such as high-resolution satellite imagery, on the
other hand, is becoming increasingly available and inexpensive. Unfortunately,
such data is highly unstructured and currently no techniques exist to
automatically extract useful insights to inform policy decisions and help
direct humanitarian efforts. We propose a novel machine learning approach to
extract large-scale socioeconomic indicators from high-resolution satellite
imagery. The main challenge is that training data is very scarce, making it
difficult to apply modern techniques such as Convolutional Neural Networks
(CNN). We therefore propose a transfer learning approach where nighttime light
intensities are used as a data-rich proxy. We train a fully convolutional CNN
model to predict nighttime lights from daytime imagery, simultaneously learning
features that are useful for poverty prediction. The model learns filters
identifying different terrains and man-made structures, including roads,
buildings, and farmlands, without any supervision beyond nighttime lights. We
demonstrate that these learned features are highly informative for poverty
mapping, even approaching the predictive performance of survey data collected
in the field.
Comment: In Proc. 30th AAAI Conference on Artificial Intelligence
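The abstract leaves the downstream poverty model unspecified. A minimal sketch, assuming per-region feature vectors have already been extracted from the nighttime-light CNN, is a closed-form ridge regression fit on the few available survey labels. The choice of ridge regression, the `alpha` value, the function names, and the synthetic data are all assumptions for illustration.

```python
import numpy as np


def fit_ridge(features, targets, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept.

    features: (n, d) per-region feature vectors, assumed to come from the
              nighttime-light CNN (extraction not shown here).
    targets:  (n,) survey-based poverty indicator for the same regions.
    """
    x_mean = features.mean(axis=0)
    y_mean = targets.mean()
    xc = features - x_mean  # centring keeps the intercept out of the penalty
    yc = targets - y_mean
    d = features.shape[1]
    w = np.linalg.solve(xc.T @ xc + alpha * np.eye(d), xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def predict(features, w, b):
    return features @ w + b


# Synthetic stand-in: 60 labelled regions with 5-dimensional features.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 3.0 + 0.1 * rng.normal(size=60)
w, b = fit_ridge(X, y, alpha=1.0)
estimates = predict(X, w, b)
```

The regularizer matters precisely because labels are scarce: with few survey points and high-dimensional CNN features, an unregularized fit would overfit.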
Articulated Clinician Detection Using 3D Pictorial Structures on RGB-D Data
Reliable human pose estimation (HPE) is essential to many clinical
applications, such as surgical workflow analysis, radiation safety monitoring
and human-robot cooperation. Proposed methods for the operating room (OR) rely
either on foreground estimation using a multi-camera system, which is a
challenge in real ORs due to color similarities and frequent illumination
changes, or on wearable sensors or markers, which are invasive and therefore
difficult to introduce in the room. Instead, we propose a novel approach based
on Pictorial Structures (PS) and on RGB-D data, which can be easily deployed in
real ORs. We extend the PS framework in two ways. First, we build robust and
discriminative part detectors using both color and depth images. We also
present a novel descriptor for depth images, called histogram of depth
differences (HDD). Second, we extend PS to 3D by proposing 3D pairwise
constraints and a new method that makes exact inference tractable. Our approach
is evaluated for pose estimation and clinician detection on a challenging RGB-D
dataset recorded in a busy operating room during live surgeries. We conduct a
series of experiments to study the different part detectors in conjunction with
the various 2D or 3D pairwise constraints. Our comparisons demonstrate that 3D
PS with RGB-D part detectors significantly improves the results in a visually
challenging operating environment.
Comment: The supplementary video is available at https://youtu.be/iabbGSqRSg
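Exact inference in pictorial structures is classically done by dynamic programming over a tree of parts. The abstract does not describe the authors' 3D inference method, so the following is only a minimal sketch of that classic idea on the simplest tree, a chain, assuming each part has a small discrete set of candidate 3D positions with detector scores. The Gaussian offset term in `offset_pairwise` is a hypothetical stand-in for the paper's 3D pairwise constraints.

```python
import numpy as np


def chain_map_inference(unary, pairwise):
    """Exact MAP assignment for a chain of parts by max-sum dynamic programming.

    unary:    list of K 1D arrays; unary[k][i] is the detector score of candidate i for part k.
    pairwise: list of K-1 2D arrays; pairwise[k][i, j] scores candidate i of part k
              together with candidate j of part k+1.
    Returns (best candidate index per part, total score).
    """
    score = np.asarray(unary[0], dtype=np.float64)
    backptrs = []
    for k in range(1, len(unary)):
        # total[i, j]: best chain score with part k-1 at candidate i and part k at j.
        total = score[:, None] + pairwise[k - 1]
        backptrs.append(total.argmax(axis=0))
        score = total.max(axis=0) + unary[k]
    best = [int(score.argmax())]
    for bp in reversed(backptrs):
        best.append(int(bp[best[-1]]))  # walk back to the best predecessor
    best.reverse()
    return best, float(score.max())


def offset_pairwise(parent_xyz, child_xyz, expected_offset, sigma=0.1):
    """Hypothetical 3D pairwise term: Gaussian log-score on the deviation of the
    child-minus-parent offset from an expected offset (e.g. shoulder to elbow)."""
    diff = child_xyz[None, :, :] - parent_xyz[:, None, :] - expected_offset
    return -np.sum(diff ** 2, axis=-1) / (2.0 * sigma ** 2)


# Usage: three parts, four candidate 3D positions each (synthetic values).
rng = np.random.default_rng(0)
cands = [rng.random((4, 3)) for _ in range(3)]
unary = [rng.normal(size=4) for _ in range(3)]
offsets = [np.array([0.0, -0.2, 0.0]), np.array([0.0, -0.3, 0.0])]
pairwise = [offset_pairwise(cands[k], cands[k + 1], offsets[k]) for k in range(2)]
best, total = chain_map_inference(unary, pairwise)
```

The cost is linear in the number of parts and quadratic in the number of candidates per part, which is why exact inference stays tractable only when the candidate sets (or the pairwise term's structure) are kept small.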
PEA265: Perceptual Assessment of Video Compression Artifacts
The most widely used video encoders share a common hybrid coding framework
that includes block-based motion estimation/compensation and block-based
transform coding. Despite their high coding efficiency, the encoded videos
often exhibit visually annoying artifacts, denoted as Perceivable Encoding
Artifacts (PEAs), which significantly degrade the visual Quality-of-Experience
(QoE) of end users. To monitor and improve visual QoE, it is crucial to develop
subjective and objective measures that can identify and quantify various types
of PEAs. In this work, we make the first attempt to build a large-scale
subject-labelled database composed of H.265/HEVC compressed videos containing
various PEAs. The database, namely the PEA265 database, includes 4 types of
spatial PEAs (i.e. blurring, blocking, ringing and color bleeding) and 2 types
of temporal PEAs (i.e. flickering and floating). Each type contains at least
60,000 image or video patches with positive and negative labels. To objectively
identify these PEAs, we train Convolutional Neural Networks (CNNs) using the
PEA265 database. It appears that the state-of-the-art ResNeXt is capable of
identifying each type of PEA with high accuracy. Furthermore, we define PEA
pattern and PEA intensity measures to quantify PEA levels of compressed video
sequences. We believe that the PEA265 database and our findings will benefit the
future development of video quality assessment methods and perceptually
motivated video encoders.
Comment: 10 pages, 15 figures, 4 tables