Semantic Instance Segmentation with a Discriminative Loss Function
Semantic instance segmentation remains a challenging task. In this work we
propose to tackle the problem with a discriminative loss function, operating at
the pixel level, that encourages a convolutional network to produce a
representation of the image that can easily be clustered into instances with a
simple post-processing step. The loss function encourages the network to map
each pixel to a point in feature space so that pixels belonging to the same
instance lie close together while different instances are separated by a wide
margin. Our approach of combining an off-the-shelf network with a principled
loss function inspired by a metric learning objective is conceptually simple
and distinct from recent efforts in instance segmentation. In contrast to
previous works, our method does not rely on object proposals or recurrent
mechanisms. A key contribution of our work is to demonstrate that such a simple
setup without bells and whistles is effective and can perform on par with more
complex methods. Moreover, we show that it does not suffer from some of the
limitations of the popular detect-and-segment approaches. We achieve
competitive performance on the Cityscapes and CVPPP leaf segmentation
benchmarks.
Comment: Published at "Deep Learning for Robotic Vision", workshop at CVPR
201
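The pull/push behaviour described in this abstract can be illustrated with a small sketch. The abstract gives no formula, so the hinged squared terms, the margin names `delta_pull` and `delta_push`, and their default values are assumptions made for illustration, not necessarily the authors' exact loss.

```python
import math
from collections import defaultdict


def discriminative_loss(embeddings, labels, delta_pull=0.5, delta_push=1.5):
    """Toy pull/push loss over per-pixel embeddings.

    embeddings: list of feature vectors, one per pixel
    labels:     instance id of each pixel
    delta_pull, delta_push: hypothetical margins (assumed values)
    """
    # Group pixel embeddings by instance id.
    groups = defaultdict(list)
    for vec, lab in zip(embeddings, labels):
        groups[lab].append(vec)

    # Mean embedding (cluster center) of each instance.
    centers = {
        lab: [sum(col) / len(vecs) for col in zip(*vecs)]
        for lab, vecs in groups.items()
    }

    # Pull term: penalize pixels farther than delta_pull from their center.
    pull = 0.0
    for lab, vecs in groups.items():
        pull += sum(
            max(0.0, math.dist(v, centers[lab]) - delta_pull) ** 2 for v in vecs
        ) / len(vecs)
    pull /= len(groups)

    # Push term: penalize pairs of centers closer than 2 * delta_push.
    ids = sorted(centers)
    pairs = [(a, b) for i, a in enumerate(ids) for b in ids[i + 1:]]
    push = 0.0
    if pairs:
        push = sum(
            max(0.0, 2 * delta_push - math.dist(centers[a], centers[b])) ** 2
            for a, b in pairs
        ) / len(pairs)

    return pull + push


pixels = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
well_clustered = discriminative_loss(pixels, [0, 0, 1, 1])
badly_clustered = discriminative_loss(pixels, [0, 1, 0, 1])
```

With these toy inputs, the labeling that matches the two tight groups incurs no penalty, while the labeling that mixes them is penalized by both terms.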
Combining Discrete and Neural Features for Sequence Labeling
Neural network models have recently received intense research attention in the
natural language processing community. Compared with traditional models with
discrete features, neural models have two main advantages. First, they take
low-dimensional, real-valued embedding vectors as inputs, which can be trained
over large raw data, thereby addressing the issue of feature sparsity in
discrete models. Second, deep neural networks can be used to automatically
combine input features, including non-local features that capture semantic
patterns that cannot be expressed using discrete indicator features. As a
result, neural network models have achieved competitive accuracies compared
with the best discrete models for a range of NLP tasks.
On the other hand, manual feature templates have been carefully investigated
for most NLP tasks over decades and typically cover the most useful indicator
patterns for solving the problems. Such information can be complementary to the
features automatically induced from neural networks, and therefore combining
discrete and neural features can potentially lead to better accuracy compared
with models that leverage discrete or neural features only.
In this paper, we systematically investigate the effect of discrete and
neural feature combination for a range of fundamental NLP tasks based on
sequence labeling, including word segmentation, POS tagging and named entity
recognition for Chinese and English, respectively. Our results on standard
benchmarks show that state-of-the-art neural models can give accuracies
comparable to the best discrete models in the literature for most tasks and
combining discrete and neural features consistently yields better results.
Comment: Accepted by International Conference on Computational Linguistics and
Intelligent Text Processing (CICLing) 2016, Apri
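One simple way to picture the combination of discrete and neural features is to concatenate a sparse indicator vector with a dense embedding before scoring each token. The abstract does not say how the features are combined, so concatenation, the three feature templates, and the names `indicator_features` and `combined_input` are assumptions for illustration.

```python
def indicator_features(tokens, i, feature_index):
    """Return a 0/1 vector for token i from hand-written feature templates.

    feature_index maps a feature string to its position in the vector.
    The three templates below are illustrative, not taken from the paper.
    """
    previous = tokens[i - 1] if i > 0 else "<s>"
    active = [
        f"w={tokens[i]}",             # current word identity
        f"prev={previous}",           # previous word identity
        f"suffix2={tokens[i][-2:]}",  # last two characters
    ]
    vec = [0.0] * len(feature_index)
    for name in active:
        if name in feature_index:
            vec[feature_index[name]] = 1.0
    return vec


def combined_input(tokens, i, feature_index, embeddings, dim):
    """Concatenate the discrete indicator vector with a dense embedding."""
    dense = embeddings.get(tokens[i], [0.0] * dim)  # zeros for unknown words
    return indicator_features(tokens, i, feature_index) + list(dense)


feature_index = {"w=dogs": 0, "prev=<s>": 1, "suffix2=gs": 2, "w=cats": 3}
embeddings = {"dogs": [0.2, -0.1]}
x0 = combined_input(["dogs", "bark"], 0, feature_index, embeddings, dim=2)
x1 = combined_input(["dogs", "bark"], 1, feature_index, embeddings, dim=2)
```

A sequence labeler would then score each combined vector, so that both the manual templates and the learned embedding contribute to the tag decision.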
ROSA: Robust Salient Object Detection against Adversarial Attacks
Recently salient object detection has witnessed remarkable improvement owing
to the deep convolutional neural networks which can harvest powerful features
for images. In particular, state-of-the-art salient object detection methods
enjoy high accuracy and efficiency from fully convolutional network (FCN) based
frameworks which are trained from end to end and predict pixel-wise labels.
However, such frameworks are vulnerable to adversarial attacks, which confuse neural
networks by adding quasi-imperceptible noise to input images without changing
the ground truth annotated by human subjects. To our knowledge, this paper is
the first one that mounts successful adversarial attacks on salient object
detection models and verifies that adversarial samples are effective on a wide
range of existing methods. Furthermore, this paper proposes a novel end-to-end
trainable framework to enhance the robustness for arbitrary FCN-based salient
object detection models against adversarial attacks. The proposed framework
adopts a novel idea that first introduces some new generic noise to destroy
adversarial perturbations, and then learns to predict saliency maps for input
images with the introduced noise. Specifically, our proposed method consists of
a segment-wise shielding component, which preserves boundaries and destroys
delicate adversarial noise patterns and a context-aware restoration component,
which refines saliency maps through global contrast modeling. Experimental
results suggest that our proposed framework improves the performance
significantly for state-of-the-art models on a series of datasets.
Comment: To be published in IEEE Transactions on Cybernetic
Pedestrian Detection with Autoregressive Network Phases
We present an autoregressive pedestrian detection framework with cascaded
phases designed to progressively improve precision. The proposed framework
utilizes a novel lightweight stackable decoder-encoder module which uses
convolutional re-sampling layers to improve features while maintaining
efficient memory and runtime cost. Unlike previous cascaded detection systems,
our proposed framework is designed within a region proposal network and thus
retains greater context of nearby detections compared to independently
processed RoI systems. We explicitly encourage increasing levels of precision
by assigning strict labeling policies to each consecutive phase such that early
phases develop features primarily focused on achieving high recall and later
phases on high precision. As a consequence, the final feature maps form more peaky
radial gradients emanating from the centroids of unique pedestrians. Using our
proposed autoregressive framework leads to new state-of-the-art performance on
the reasonable and occlusion settings of the Caltech pedestrian dataset, and
achieves competitive state-of-the-art performance on the KITTI dataset.
A Comparative Study of Fruit Detection and Counting Methods for Yield Mapping in Apple Orchards
We present new methods for apple detection and counting based on recent deep
learning approaches and compare them with state-of-the-art results based on
classical methods. Our goal is to quantify performance improvements by neural
network-based methods compared to methods based on classical approaches.
Additionally, we introduce a complete system for counting apples in an entire
row. This task is challenging as it requires tracking fruits in images from
both sides of the row. We evaluate the performance of three fruit detection
methods and two fruit counting methods on six datasets. Results indicate that
the classical detection approach still outperforms the deep learning based
methods in the majority of the datasets. For fruit counting though, the deep
learning based approach performs better for all of the datasets. Combining the
classical detection method with the neural network based counting
approach, we achieve remarkable yield accuracies ranging from 95.56% to 97.83%.
Comment: 28 page
svcR: An R Package for Support Vector Clustering improved with Geometric Hashing applied to Lexical Pattern Discovery
We present a new R package which takes a numerical matrix as data
input and computes clusters using a support vector clustering (SVC) method. We
have implemented an original 2D-grid labeling approach to speed up cluster
extraction. In this sense, SVC can be seen as an efficient cluster extraction
method when clusters are separable in a 2-D map. Second, we show that this SVC
approach, using a Jaccard-Radial base kernel, can classify a set of terms into
ontological classes reasonably well and help to define regular expression
rules for information extraction in documents; our case study concerns a set of
terms and documents about developmental and molecular biology.
Deep Multi-Center Learning for Face Alignment
Facial landmarks are highly correlated with each other, since a given
landmark can be estimated from its neighboring landmarks. Most of the existing
deep learning methods only use one fully-connected layer called shape
prediction layer to estimate the locations of facial landmarks. In this paper,
we propose a novel deep learning framework named Multi-Center Learning with
multiple shape prediction layers for face alignment. In particular, each shape
prediction layer emphasizes the detection of a certain cluster of
semantically relevant landmarks. Challenging landmarks are addressed
first, and each cluster of landmarks is then further optimized.
Moreover, to reduce the model complexity, we propose a model assembling method
to integrate multiple shape prediction layers into one shape prediction layer.
Extensive experiments demonstrate that our method is effective for handling
complex occlusions and appearance variations with real-time performance. The
code for our method is available at
https://github.com/ZhiwenShao/MCNet-Extension.
Comment: This paper has been accepted by Neurocomputin
Learning Representations for Automatic Colorization
We develop a fully automatic image colorization system. Our approach
leverages recent advances in deep networks, exploiting both low-level and
semantic representations. As many scene elements naturally appear according to
multimodal color distributions, we train our model to predict per-pixel color
histograms. This intermediate output can be used to automatically generate a
color image, or further manipulated prior to image formation. On both fully and
partially automatic colorization tasks, we outperform existing methods. We also
explore colorization as a vehicle for self-supervised visual representation
learning.
Comment: ECCV 2016 (Project page:
http://people.cs.uchicago.edu/~larsson/colorization/
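To see why predicting a per-pixel histogram is useful for multimodal colors, consider two ways of turning one pixel's histogram into a single color value. This is a minimal sketch: the bin centers, the function names, and the choice of expectation versus mode are assumptions for illustration and are not taken from the paper.

```python
def expected_value(hist, bin_centers):
    """Mean of the distribution: a smooth estimate, but for a multimodal
    histogram it can land between the modes."""
    total = sum(hist)
    return sum(p * c for p, c in zip(hist, bin_centers)) / total


def mode_value(hist, bin_centers):
    """Center of the most probable bin: commits to a single mode."""
    best = max(range(len(hist)), key=lambda k: hist[k])
    return bin_centers[best]


# Hypothetical 4-bin histogram for one pixel of a single color channel.
bin_centers = [0.125, 0.375, 0.625, 0.875]
hist = [0.6, 0.0, 0.0, 0.4]  # two plausible colors, e.g. for a car body

mean_estimate = expected_value(hist, bin_centers)
mode_estimate = mode_value(hist, bin_centers)
```

Here the expectation falls in a region to which the histogram assigns no probability, whereas the mode picks one of the two plausible colors. Keeping the whole histogram as an intermediate output leaves that choice open until image formation.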
An Improved Phrase-based Approach to Annotating and Summarizing Student Course Responses
Teaching large classes remains a great challenge, primarily because it is
difficult to attend to all the student needs in a timely manner. Automatic text
summarization systems can be leveraged to summarize the student feedback,
submitted immediately after each lecture, but it remains unclear what
makes a good summary for student responses. In this work we explore a new
methodology that effectively extracts summary phrases from the student
responses. Each phrase is tagged with the number of students who raise the
issue. The phrases are evaluated along two dimensions: with respect to text
content, they should be informative and well-formed, measured by the ROUGE
metric; additionally, they should address the most pressing student needs,
measured by a newly proposed metric. This work is enabled by a phrase-based
annotation and highlighting scheme, which is new to the summarization task. The
phrase-based framework allows us to summarize the student responses into a set
of bullet points and present them to the instructor promptly.
Comment: 11 page
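The idea of tagging each phrase with the number of students who raised it can be sketched as follows. The sketch assumes phrase extraction has already been done and only covers counting and bullet formatting. The function names, the case-insensitive matching, and the sample responses are hypothetical.

```python
from collections import Counter


def tag_phrases(responses):
    """Count how many distinct students mention each phrase.

    responses: dict mapping a student id to the list of phrases
               extracted from that student's feedback.
    Returns (phrase, student_count) pairs, most frequent first.
    """
    counts = Counter()
    for phrases in responses.values():
        # A set ensures a student repeating a phrase is counted once.
        counts.update({p.lower().strip() for p in phrases})
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def as_bullets(tagged):
    """Format tagged phrases as plain-text bullet points."""
    return [f"- {phrase} ({n})" for phrase, n in tagged]


responses = {
    "s1": ["Recursion", "base case"],
    "s2": ["recursion"],
    "s3": ["big-O notation", "recursion", "Recursion"],
}
tagged = tag_phrases(responses)
bullets = as_bullets(tagged)
```

Sorting by count puts the issue raised by the most students at the top of the bullet list.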
graph2vec: Learning Distributed Representations of Graphs
Recent works on representation learning for graph structured data
predominantly focus on learning distributed representations of graph
substructures such as nodes and subgraphs. However, many graph analytics tasks
such as graph classification and clustering require representing entire graphs
as fixed length feature vectors. While the aforementioned approaches are
not naturally equipped to learn such representations, graph kernels remain the
most effective way of obtaining them. However, these graph kernels use
handcrafted features (e.g., shortest paths, graphlets, etc.) and hence are
hampered by problems such as poor generalization. To address this limitation,
in this work, we propose a neural embedding framework named graph2vec to learn
data-driven distributed representations of arbitrarily sized graphs. graph2vec's
embeddings are learnt in an unsupervised manner and are task agnostic. Hence,
they could be used for any downstream task such as graph classification,
clustering and even seeding supervised representation learning approaches. Our
experiments on several benchmark and large real-world datasets show that
graph2vec achieves significant improvements in classification and clustering
accuracies over substructure representation learning approaches and is
competitive with state-of-the-art graph kernels.