Face Recognition: From Traditional to Deep Learning Methods
Starting in the seventies, face recognition has become one of the most
researched topics in computer vision and biometrics. Traditional methods based
on hand-crafted features and traditional machine learning techniques have
recently been superseded by deep neural networks trained with very large
datasets. In this paper we provide a comprehensive and up-to-date literature
review of popular face recognition methods including both traditional
(geometry-based, holistic, feature-based and hybrid methods) and deep learning
methods.
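As a concrete illustration of the holistic family the survey covers, here is a minimal eigenfaces-style sketch in NumPy; the toy gallery, image size, and number of components are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "gallery": 20 flattened 8x8 face images from 4 identities (5 each).
# In a real system these would be aligned, grayscale face crops.
gallery = rng.normal(size=(20, 64))
labels = np.repeat(np.arange(4), 5)

# Eigenfaces: PCA on mean-centred images via SVD.
mean_face = gallery.mean(axis=0)
centred = gallery - mean_face
_, _, Vt = np.linalg.svd(centred, full_matrices=False)
eigenfaces = Vt[:10]                       # keep the top-10 components

def project(x):
    return (x - mean_face) @ eigenfaces.T  # coordinates in face space

def identify(probe):
    # Nearest neighbour in the low-dimensional face space.
    d = np.linalg.norm(project(gallery) - project(probe), axis=1)
    return labels[np.argmin(d)]

# A gallery image should match its own identity.
print(identify(gallery[7]))
```

Deep methods replace the fixed PCA basis with learned features, but the project-then-compare structure survives largely unchanged.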
Exploring Category-Agnostic Clusters for Open-Set Domain Adaptation
Unsupervised domain adaptation has received significant attention in recent
years. Most existing works tackle the closed-set scenario, assuming that the
source and target domains share exactly the same categories. In practice,
however, a target domain often contains samples of classes unseen in the
source domain (i.e., unknown classes). The extension of domain adaptation from
the closed-set to such an open-set situation is not trivial, since the target
samples in the unknown classes are not expected to align with the source. In
this paper, we
address this problem by augmenting the state-of-the-art domain adaptation
technique, Self-Ensembling, with category-agnostic clusters in target domain.
Specifically, we present Self-Ensembling with Category-agnostic Clusters
(SE-CC) -- a novel architecture that steers domain adaptation with the
additional guidance of category-agnostic clusters that are specific to target
domain. This clustering information provides domain-specific visual cues,
facilitating the generalization of Self-Ensembling to both closed-set and
open-set scenarios. Technically, clustering is first performed over all the
unlabeled target samples to obtain the category-agnostic clusters, which reveal
the underlying data space structure peculiar to target domain. A clustering
branch is capitalized on to ensure that the learnt representation preserves
such underlying structure by matching the estimated assignment distribution
over clusters to the inherent cluster distribution for each target sample.
Furthermore, SE-CC enhances the learnt representation with mutual information
maximization. Extensive experiments are conducted on Office and VisDA datasets
for both open-set and closed-set domain adaptation, and superior results are
reported in comparison to state-of-the-art approaches.
Comment: CVPR 202
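A minimal NumPy sketch of the clustering-branch idea, in which k-means clusters and a per-sample KL matching loss stand in for the paper's category-agnostic clusters and assignment-distribution matching; the cluster count, toy features, and softmax-of-distances "inherent" distribution are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Unlabeled target-domain features: two well-separated blobs.
target = np.vstack([rng.normal(0, .3, (30, 2)), rng.normal(4, .3, (30, 2))])

# Step 1: category-agnostic clusters via a tiny k-means (k = 2).
centers = target[[0, 30]].copy()          # one seed point from each blob
for _ in range(10):
    assign = np.argmin(((target[:, None] - centers) ** 2).sum(-1), axis=1)
    centers = np.array([target[assign == c].mean(0) for c in range(2)])

def softmax(z):
    e = np.exp(z - z.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

# Inherent cluster distribution: soft assignments from distances to centers.
inherent = softmax(-((target[:, None] - centers) ** 2).sum(-1))

# Step 2: the clustering branch matches the model's estimated assignment
# distribution to the inherent one with a per-sample KL divergence.
def kl_loss(p, q):
    return (p * np.log(p / q)).sum(-1).mean()

matched = kl_loss(inherent, inherent)     # perfectly matched -> 0
uniform = kl_loss(inherent, np.full_like(inherent, 0.5))
print(matched, uniform)
```

The loss is zero only when the estimated distribution reproduces the target domain's own cluster structure, which is what steers the representation.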
Deep Unsupervised Similarity Learning using Partially Ordered Sets
Unsupervised learning of visual similarities is of paramount importance to
computer vision, particularly due to the lack of training data for fine-grained
similarities. Deep learning of similarities is often based on relationships
between pairs or triplets of samples. Many of these relations are unreliable
and mutually contradicting, implying inconsistencies when trained without
supervision information that relates different tuples or triplets to each
other. To overcome this problem, we use local estimates of reliable
(dis-)similarities to initially group samples into compact surrogate classes
and use local partial orders of samples to classes to link classes to each
other. Similarity learning is then formulated as a partial ordering task with
soft correspondences of all samples to classes. Adopting a strategy of
self-supervision, a CNN is trained to optimally represent samples in a mutually
consistent manner while updating the classes. The similarity learning and
grouping procedure are integrated in a single model and optimized jointly. The
proposed unsupervised approach shows competitive performance on detailed pose
estimation and object classification.
Comment: Accepted for publication at IEEE Computer Vision and Pattern
Recognition 201
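The grouping step can be sketched as follows, with trusted below-threshold distances standing in for the paper's reliable local (dis-)similarity estimates; the toy data, threshold, and union-find grouping are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Unlabeled samples: three tight groups whose within-group distances are
# reliably small, while between-group distances are large and unreliable.
x = np.vstack([rng.normal(c, .1, (8, 2)) for c in (0, 5, 10)])

# Trust only local similarity estimates below a distance threshold, and
# merge trusted pairs into compact surrogate classes with union-find.
parent = list(range(len(x)))

def find(i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]     # path halving
        i = parent[i]
    return i

d = np.linalg.norm(x[:, None] - x[None], axis=-1)
for i in range(len(x)):
    for j in range(i + 1, len(x)):
        if d[i, j] < 1.0:                 # reliable "similar" estimate
            parent[find(i)] = find(j)

surrogate = [find(i) for i in range(len(x))]
print(len(set(surrogate)))
```

The paper then softens these hard groupings into soft correspondences and orders the classes partially, but the compact surrogate classes above are the starting point.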
Discriminative Subnetworks with Regularized Spectral Learning for Global-state Network Data
Data mining practitioners are facing challenges from data with network
structure. In this paper, we address a specific class of global-state networks
which comprises a set of network instances sharing a similar structure yet
having different values at local nodes. Each instance is associated with a
global state which indicates the occurrence of an event. The objective is to
uncover a small set of discriminative subnetworks that can optimally classify
global network values. Unlike most existing studies which explore an
exponential subnetwork space, we address this difficult problem by adopting a
space transformation approach. Specifically, we present an algorithm that
optimizes a constrained dual-objective function to learn a low-dimensional
subspace that is capable of discriminating networks labelled by different
global states, while respecting the common network topology shared across
instances. Our algorithm takes an appealing approach from spectral graph
learning and we show that the globally optimum solution can be achieved via
matrix eigen-decomposition.
Comment: manuscript for the ECML 2014 paper
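The abstract's central claim, that a trace-style subspace objective admits a globally optimal solution via eigen-decomposition, can be checked on a toy matrix; the random symmetric matrix and subspace dimension are illustrative assumptions, not the paper's actual dual objective:

```python
import numpy as np

rng = np.random.default_rng(3)

# A random symmetric matrix standing in for the paper's dual objective
# (a discrimination term reconciled with a shared-topology term).
M = rng.normal(size=(6, 6))
M = (M + M.T) / 2

# Globally optimal rank-2 subspace for max trace(W^T M W) with W^T W = I:
# the eigenvectors of the two largest eigenvalues.
vals, vecs = np.linalg.eigh(M)            # eigenvalues in ascending order
W_opt = vecs[:, -2:]
best = np.trace(W_opt.T @ M @ W_opt)      # equals vals[-1] + vals[-2]

# No other orthonormal 2-D subspace scores higher (Ky Fan theorem),
# checked here against 100 random orthonormal subspaces.
worst_gap = min(
    best - np.trace(Q.T @ M @ Q)
    for Q in (np.linalg.qr(rng.normal(size=(6, 2)))[0] for _ in range(100))
)
print(best, worst_gap >= -1e-9)
```

This is why a space-transformation approach avoids searching the exponential subnetwork space: the optimum of the relaxed objective is available in closed form.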
Self-Taught Hashing for Fast Similarity Search
The ability of fast similarity search at large scale is of great importance
to many Information Retrieval (IR) applications. A promising way to accelerate
similarity search is semantic hashing which designs compact binary codes for a
large number of documents so that semantically similar documents are mapped to
similar codes (within a short Hamming distance). Although some recently
proposed techniques are able to generate high-quality codes for documents known
in advance, obtaining the codes for previously unseen documents remains a very
challenging problem. In this paper, we emphasise this issue and propose a
novel Self-Taught Hashing (STH) approach to semantic hashing: we first find the
optimal binary codes for all documents in the given corpus via
unsupervised learning, and then train classifiers via supervised learning
to predict the binary code for any previously unseen query document. Our
experiments on three real-world text datasets show that the proposed approach
using binarised Laplacian Eigenmap (LapEig) and linear Support Vector Machine
(SVM) significantly outperforms state-of-the-art techniques.
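A compact NumPy sketch of the two STH stages, with an RBF similarity graph and a least-squares linear predictor per bit standing in for the paper's LapEig construction and per-bit linear SVMs; the toy corpus, kernel bandwidth, and code length are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy corpus: 2-D "document" features in two semantic groups.
docs = np.vstack([rng.normal(0, .5, (20, 2)), rng.normal(5, .5, (20, 2))])

# Stage 1 (unsupervised): binarised Laplacian Eigenmap codes.
d2 = np.linalg.norm(docs[:, None] - docs[None], axis=-1) ** 2
S = np.exp(-d2 / 8.0)                     # bandwidth keeps groups weakly linked
L = np.diag(S.sum(1)) - S                 # unnormalised graph Laplacian
_, vecs = np.linalg.eigh(L)
emb = vecs[:, 1:3]                        # two smoothest non-trivial modes
codes = (emb > np.median(emb, axis=0)).astype(int)   # 2-bit binary codes

# Stage 2 (supervised): one linear predictor per bit, so unseen queries
# can be hashed (a least-squares stand-in for the paper's linear SVMs).
X = np.hstack([docs, np.ones((len(docs), 1))])       # bias feature
W = np.linalg.lstsq(X, codes * 2 - 1, rcond=None)[0]

def hash_query(q):
    return (np.append(q, 1.0) @ W > 0).astype(int)

print(codes[0], codes[20], hash_query(np.array([5.0, 5.0])))
```

The supervised second stage is exactly what makes the scheme "self-taught": the unsupervised codes become training labels for an out-of-sample predictor.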
Unsupervised Embedding Learning via Invariant and Spreading Instance Feature
This paper studies the unsupervised embedding learning problem, which
requires an effective similarity measurement between samples in low-dimensional
embedding space. Motivated by the positive-concentration and
negative-separation properties observed in category-wise supervised learning,
we propose to utilize instance-wise supervision to approximate these
properties, which
aims at learning data augmentation invariant and instance spread-out features.
To achieve this goal, we propose a novel instance based softmax embedding
method, which directly optimizes the `real' instance features on top of the
softmax function. It achieves significantly faster learning speed and higher
accuracy than all existing methods. The proposed method performs well for both
seen and unseen testing categories with cosine similarity. It also achieves
competitive performance even without a pre-trained network on samples from
fine-grained categories.
Comment: CVPR 201
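The instance-wise softmax idea can be sketched directly in NumPy, with random unit vectors standing in for CNN features and additive noise standing in for data augmentation; the dimension, temperature, and noise scale are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)

def l2norm(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Embeddings of 50 instances on the unit sphere (stand-ins for CNN outputs).
feats = l2norm(rng.normal(size=(50, 32)))

# A data augmentation should barely move an instance's embedding.
augmented = l2norm(feats + rng.normal(0, 0.05, feats.shape))

# Instance-wise softmax: every instance is its own class; score
# P(instance i | augmented view j) from cosine similarity at temperature tau.
tau = 0.1
logits = augmented @ feats.T / tau
p = np.exp(logits - logits.max(1, keepdims=True))
p /= p.sum(1, keepdims=True)

# Invariance: each augmented view is most similar to its own instance,
# and the per-instance softmax loss -log p_ii stays small.
hits = (p.argmax(1) == np.arange(50)).mean()
loss = -np.log(p[np.arange(50), np.arange(50)]).mean()
print(hits, loss)
```

Minimizing this loss pulls augmented views toward their own instance (invariance) while the shared softmax denominator pushes all other instances apart (spread-out).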
Simple and Complex Human Action Recognition in Constrained and Unconstrained Videos
Human action recognition plays a crucial role in visual learning applications such as video understanding and surveillance, video retrieval, human-computer interaction, and autonomous driving systems. A variety of methodologies have been proposed for human action recognition via the development of low-level features along with bag-of-visual-word models. However, much less research has addressed the combination of the pre-processing, encoding, and classification stages. This dissertation focuses on enhancing action recognition performance via ensemble learning, a hybrid classifier, hierarchical feature representation, and key action perception methodologies.
Action variation is one of the crucial challenges in video analysis and action recognition. We address this problem by proposing a hybrid classifier (HC) to discriminate actions that contain similar forms of motion features, such as walking, running, and jogging. Beyond that, we show and prove that the fusion of various appearance-based and motion features can boost both simple and complex action recognition performance.
The next part of the dissertation introduces pooled-feature representation (PFR), which is derived from a double-phase encoding framework (DPE). Considering that a given unconstrained video is composed of a sequence of simple frames, the first phase of DPE generates temporal sub-volumes from the video and represents them individually by employing the proposed improved rank pooling (IRP) method. The second phase constructs the pool of features by fusing the represented vectors from the first phase. The pool is compressed and then encoded to provide a video-parts vector (VPV). The DPE framework allows distilling the video representation and hierarchically extracting new information. Compared with recent video encoding approaches, VPV can preserve higher-level information through standard encoding of low-level features in two phases.
Furthermore, the encoded vectors from both phases of DPE are fused along with a compression stage to develop PFR.
Spatial-Temporal Relation Networks for Multi-Object Tracking
Recent progress in multiple object tracking (MOT) has shown that a robust
similarity score is key to the success of trackers. A good similarity score is
expected to reflect multiple cues, e.g. appearance, location, and topology,
over a long period of time. However, these cues are heterogeneous, making them
hard to combine in a unified network. As a result, existing methods usually
encode them in separate networks or require a complex training approach. In
this paper, we present a unified framework for similarity measurement which
could simultaneously encode various cues and perform reasoning across both
spatial and temporal domains. We also study the feature representation of a
tracklet-object pair in depth, showing a proper design of the pair features can
well empower the trackers. The resulting approach is named spatial-temporal
relation networks (STRN). It runs in a feed-forward way and can be trained in
an end-to-end manner. State-of-the-art accuracy was achieved on all of the
MOT15-17 benchmarks under the public-detection and online settings.
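The heterogeneous-cue problem the paper addresses can be illustrated with a minimal fused similarity score, mapping appearance and location cues onto a comparable scale before combining them. The fixed fusion weights, toy boxes, and cosine/IoU choices are illustrative assumptions; STRN learns the fusion end-to-end:

```python
import numpy as np

# One tracklet and three candidate detections, each with an appearance
# vector and a bounding box (x, y, w, h).
track_app, track_box = np.array([1., 0., 0.]), np.array([10., 10., 4., 4.])
det_app = np.array([[.9, .1, 0.], [0., 1., 0.], [.8, .2, 0.]])
det_box = np.array([[11., 10., 4., 4.], [10., 10., 4., 4.], [40., 40., 4., 4.]])

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def iou(a, b):
    x1, y1 = np.maximum(a[:2], b[:2])
    x2, y2 = np.minimum(a[:2] + a[2:], b[:2] + b[2:])
    inter = max(x2 - x1, 0) * max(y2 - y1, 0)
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)

# Unified similarity: heterogeneous cues mapped to comparable [0, 1]
# scores and fused by fixed weights (a stand-in for STRN's learned fusion).
def similarity(app, box):
    return 0.6 * cosine(track_app, app) + 0.4 * iou(track_box, box)

scores = [similarity(a, b) for a, b in zip(det_app, det_box)]
print(int(np.argmax(scores)))
```

Detection 1 sits in exactly the right place but looks wrong, and detection 2 looks right but is far away; only the fused score ranks detection 0, which is good on both cues, first.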
Leveraging Distributional Semantics for Multi-Label Learning
We present a novel and scalable label embedding framework for large-scale
multi-label learning, a.k.a. ExMLDS (Extreme Multi-Label Learning using
Distributional Semantics). Our approach draws inspiration from ideas rooted in
distributional semantics, specifically the Skip Gram Negative Sampling (SGNS)
approach, widely used to learn word embeddings for natural language processing
tasks. Learning such embeddings can be reduced to a certain matrix
factorization. Our approach is novel in that it highlights interesting
connections between label embedding methods used for multi-label learning and
paragraph/document embedding methods commonly used for learning representations
of text data. The framework can also be easily extended to incorporate
auxiliary information such as label-label correlations; this is especially
crucial when there are many missing labels in the training data. We
demonstrate the effectiveness of our approach through an extensive set of
experiments on a variety of benchmark datasets, and show that the proposed
learning methods perform favorably compared to several baselines and
state-of-the-art methods for large-scale multi-label learning. To facilitate
end-to-end learning, we develop a joint learning algorithm that can learn the
embeddings as well as a regression model that predicts these embeddings given
input features, via efficient gradient-based methods.
Comment: 10 Pages, 0 Figures, Missing Result Joint Learning Included
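The reduction the abstract alludes to, SGNS as implicit factorisation of a shifted PMI matrix (Levy and Goldberg, 2014), can be sketched on a toy label co-occurrence matrix; the counts, shift, and embedding dimension are illustrative assumptions:

```python
import numpy as np

# Label co-occurrence counts over documents: labels 0/1 co-occur often,
# labels 2/3 co-occur often, cross-pair co-occurrence is rare.
C = np.array([[40, 30,  2,  1],
              [30, 40,  1,  2],
              [ 2,  1, 60, 45],
              [ 1,  2, 45, 60]], dtype=float)

# SGNS implicitly factorises a shifted PMI matrix; the same reduction is
# applied here to labels instead of words.
k = 1.0                                    # negative-sampling shift
pmi = np.log(C * C.sum() / np.outer(C.sum(1), C.sum(0)) / k)
sppmi = np.maximum(pmi, 0)                 # shifted positive PMI
U, s, _ = np.linalg.svd(sppmi)
emb = U[:, :2] * np.sqrt(s[:2])            # 2-D label embeddings

def cos(i, j):
    return emb[i] @ emb[j] / (np.linalg.norm(emb[i]) * np.linalg.norm(emb[j]))

# Labels that co-occur end up close; labels that do not stay apart.
print(cos(0, 1), cos(0, 2))
```

Treating labels the way SGNS treats words is exactly the distributional-semantics transfer the framework builds on; the paper's joint learning additionally fits a regressor from input features to these embeddings.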
A Rapidly Deployable Classification System using Visual Data for the Application of Precision Weed Management
In this work we demonstrate a rapidly deployable weed classification system
that uses visual data to enable autonomous precision weeding without making
prior assumptions about which weed species are present in a given field.
Previous work in this area relies on having prior knowledge of the weed species
present in the field. This assumption cannot always hold true for every field,
and thus limits the use of weed classification systems based on this
assumption. In this work, we obviate this assumption and introduce a rapidly
deployable approach able to operate on any field without any weed species
assumptions prior to deployment. We present a three-stage pipeline for the
implementation of our weed classification system consisting of initial field
surveillance, offline processing and selective labelling, and automated
precision weeding. The key characteristic of our approach is the combination of
plant clustering and selective labelling, which is what enables our system to
operate without prior weed species knowledge. Testing on field data, we are
able to label 12.3 times fewer images than traditional full labelling, while
reducing classification accuracy by only 14%.
Comment: 36 pages, 14 figures, published in Computers and Electronics in
Agriculture, Vol. 14
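The cluster-then-selectively-label idea can be sketched as follows, with k-means over toy image features and one human label per cluster; the feature dimensionality, cluster count, and medoid-style representative are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(9)

# Features for 120 plant images of 3 visually distinct species
# (stand-ins for the features a field survey would produce).
feats = np.vstack([rng.normal(c, .3, (40, 2)) for c in ((0, 0), (4, 0), (0, 4))])
species = np.repeat(np.arange(3), 40)     # ground truth, unknown to the system

# Cluster first: a tiny k-means over the unlabeled survey images.
k = 3
centers = feats[[0, 40, 80]].copy()       # one seed per blob for brevity
for _ in range(10):
    assign = np.argmin(((feats[:, None] - centers) ** 2).sum(-1), axis=1)
    centers = np.array([feats[assign == c].mean(0) for c in range(k)])

# Selective labelling: a human labels only one representative per cluster,
# and that label is propagated to every image in the cluster.
reps = np.array([np.argmin(((feats - centers[c]) ** 2).sum(-1)
                           + 1e9 * (assign != c)) for c in range(k)])
propagated = species[reps][assign]

acc = (propagated == species).mean()
print(len(feats) // k, acc)
```

Because no species list is fixed in advance, the same pipeline transfers to a new field: only the clusters found there determine which images a human must label.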