Robust Visual Tracking via Convolutional Networks
Deep networks have been successfully applied to visual tracking by learning a
generic representation offline from numerous training images. However, the
offline training is time-consuming and the learned generic representation may
be less discriminative for tracking specific objects. In this paper we show
that, even without offline training with a large amount of auxiliary data,
simple two-layer convolutional networks can be powerful enough to develop a
robust representation for visual tracking. In the first frame, we employ the
k-means algorithm to extract a set of normalized patches from the target region
as fixed filters, which integrate a series of adaptive contextual filters
surrounding the target to define a set of feature maps in the subsequent
frames. These maps measure similarities between each filter and the useful
local intensity patterns across the target, thereby encoding its local
structural information. Furthermore, all the maps form together a global
representation, which is built on mid-level features, thereby remaining close
to image-level information, and hence the inner geometric layout of the target
is also well preserved. A simple soft shrinkage method with an adaptive
threshold is employed to de-noise the global representation, resulting in a
robust sparse representation. The representation is updated via a simple and
effective online strategy, allowing it to robustly adapt to target appearance
variations. Our convolutional networks have a surprisingly lightweight
structure, yet perform favorably against several state-of-the-art methods on
the CVPR2013 tracking benchmark dataset of 50 challenging videos.
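A minimal sketch of the first-frame pipeline this abstract describes, assuming NumPy: k-means over normalized target patches yields fixed filters, filter responses over the frame form feature maps, and a soft shrinkage de-noises them. All function names, the patch size, and the median-based threshold are illustrative choices, not the authors' code; the adaptive contextual filters and online update are omitted.

```python
import numpy as np

def extract_patches(img, p):
    # all p x p sliding-window patches, flattened to rows
    H, W = img.shape
    return np.array([img[i:i + p, j:j + p].ravel()
                     for i in range(H - p + 1) for j in range(W - p + 1)])

def kmeans_filters(patches, k, iters=20, seed=0):
    # zero-mean, unit-norm patches, then plain Lloyd's k-means (cosine assignment)
    X = patches - patches.mean(axis=1, keepdims=True)
    X /= np.linalg.norm(X, axis=1, keepdims=True) + 1e-8
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmax(X @ C.T, axis=1)
        for j in range(k):
            m = labels == j
            if m.any():
                C[j] = X[m].mean(axis=0)
                C[j] /= np.linalg.norm(C[j]) + 1e-8
    return C

def feature_maps(img, filters, p):
    # valid cross-correlation of each filter with the frame
    P = extract_patches(img, p)
    H, W = img.shape
    return (P @ filters.T).T.reshape(len(filters), H - p + 1, W - p + 1)

def soft_shrink(maps):
    # soft shrinkage with a simple data-driven threshold (median |response|)
    t = np.median(np.abs(maps))
    return np.sign(maps) * np.maximum(np.abs(maps) - t, 0.0)
```

Stacking the shrunken maps gives the sparse global representation the abstract refers to.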
Discriminative Local Sparse Representations for Robust Face Recognition
A key recent advance in face recognition models a test face image as a sparse
linear combination of a set of training face images. The resulting sparse
representations have been shown to possess robustness against a variety of
distortions like random pixel corruption, occlusion and disguise. This
approach, however, makes the assumption, restrictive in many scenarios, that
test faces must be perfectly aligned (or registered) to the training data prior to
classification. In this paper, we propose a simple yet robust local block-based
sparsity model, using adaptively-constructed dictionaries from local features
in the training data, to overcome this misalignment problem. Our approach is
inspired by human perception: we analyze a series of local discriminative
features and combine them to arrive at the final classification decision. We
propose a probabilistic graphical model framework to explicitly mine the
conditional dependencies between these distinct sparse local features. In
particular, we learn discriminative graphs on sparse representations obtained
from distinct local slices of a face. Conditional correlations between these
sparse features are first discovered (in the training phase), and subsequently
exploited to bring about significant improvements in recognition rates.
Experimental results obtained on benchmark face databases demonstrate the
effectiveness of the proposed algorithms in the presence of multiple
registration errors (such as translation, rotation, and scaling) as well as
under variations of pose and illumination.
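To make the block-based idea concrete, here is a minimal sketch, assuming NumPy: each local block of a test face is sparsely coded, via a simple orthogonal matching pursuit, over a dictionary of the corresponding training blocks, and per-block class decisions are fused. The paper's probabilistic graphical model for fusing the local decisions is replaced here by plain majority voting; `omp`, `classify_blocks`, and the sparsity level `k` are illustrative.

```python
import numpy as np

def omp(D, y, k):
    # greedy orthogonal matching pursuit: select k atoms of D to represent y
    resid, idx = y.copy(), []
    for _ in range(k):
        idx.append(int(np.argmax(np.abs(D.T @ resid))))
        sub = D[:, idx]
        coef, *_ = np.linalg.lstsq(sub, y, rcond=None)
        resid = y - sub @ coef
    x = np.zeros(D.shape[1])
    x[idx] = coef
    return x

def classify_blocks(test_blocks, train_blocks, labels, k=2):
    # one sparse code per local block; per-block label by class-wise residual,
    # then a majority vote across blocks
    classes = np.unique(labels)
    votes = []
    for b, D in zip(test_blocks, train_blocks):
        D = D / (np.linalg.norm(D, axis=0) + 1e-8)   # unit-norm atoms
        x = omp(D, b, k)
        res = [np.linalg.norm(b - D[:, labels == c] @ x[labels == c])
               for c in classes]
        votes.append(int(classes[int(np.argmin(res))]))
    return max(set(votes), key=votes.count)
```

Because each block is coded independently, a misaligned or occluded block corrupts only its own vote, which is the robustness argument of the abstract.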
A survey of sparse representation: algorithms and applications
Sparse representation has attracted much attention from researchers in the fields
of signal processing, image processing, computer vision and pattern
recognition. Sparse representation also has a good reputation in both
theoretical research and practical applications. Many different algorithms have
been proposed for sparse representation. The main purpose of this article is to
provide a comprehensive study and an updated review on sparse representation
and to supply guidance for researchers. The taxonomy of sparse representation
methods can be studied from various viewpoints. For example, in terms of the
different norm minimizations used in sparsity constraints, the methods can be
roughly categorized into five groups: sparse representation with l0-norm
minimization, sparse representation with lp-norm (0<p<1) minimization, sparse
representation with l1-norm minimization, sparse representation with l2,1-norm
minimization, and sparse representation with l2-norm minimization. In this
paper, a comprehensive overview of
sparse representation is provided. The available sparse representation
algorithms can also be empirically categorized into four groups: greedy
strategy approximation, constrained optimization, proximity algorithm-based
optimization, and homotopy algorithm-based sparse representation. The
rationales of different algorithms in each category are analyzed and a wide
range of sparse representation applications are summarized, which could
sufficiently reveal the potential nature of the sparse representation theory.
Specifically, an experimental comparative study of these sparse
representation algorithms is presented. The Matlab code used in this paper is
available at: http://www.yongxu.org/lunwen.html.
Comment: Published in IEEE Access, Vol. 3, pp. 490-530, 2015.
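As a concrete instance of the proximity-algorithm category this survey covers, here is a minimal ISTA (iterative shrinkage-thresholding) sketch for the l1-norm minimization problem, assuming NumPy; the regularization weight, step size, and iteration count are illustrative defaults, not values from the survey.

```python
import numpy as np

def ista(A, y, lam=0.1, steps=200):
    # Iterative Shrinkage-Thresholding: min_x 0.5*||Ax - y||^2 + lam*||x||_1
    L = np.linalg.norm(A, 2) ** 2               # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        g = x - A.T @ (A @ x - y) / L           # gradient step on the smooth term
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft-threshold prox
    return x
```

Each iteration is a gradient step followed by the proximal operator of the l1 norm (soft thresholding), which is exactly what places ISTA in the proximity-algorithm group.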
Face Recognition: A Novel Multi-Level Taxonomy based Survey
In a world where security issues have been gaining growing importance, face
recognition systems have attracted increasing attention in multiple application
areas, ranging from forensics and surveillance to commerce and entertainment.
To help understand the landscape and abstraction levels relevant for face
recognition systems, face recognition taxonomies allow a deeper dissection and
comparison of the existing solutions. This paper proposes a new, more
encompassing and richer multi-level face recognition taxonomy, facilitating the
organization and categorization of available and emerging face recognition
solutions; this taxonomy may also guide researchers in the development of more
efficient face recognition solutions. The proposed multi-level taxonomy
considers levels related to the face structure, feature support and feature
extraction approach. Following the proposed taxonomy, a comprehensive survey of
representative face recognition solutions is presented. The paper concludes
with a discussion on current algorithmic and application related challenges
which may define future research directions for face recognition.
Comment: This paper is a preprint of a paper submitted to IET Biometrics. If
accepted, the copy of record will be available at the IET Digital Library.
cvpaper.challenge in 2015 - A review of CVPR2015 and DeepSurvey
The "cvpaper.challenge" is a group composed of members from AIST, Tokyo Denki
Univ. (TDU), and Univ. of Tsukuba that aims to systematically summarize papers
on computer vision, pattern recognition, and related fields. For this
particular review, we focused on reading all 602 conference papers
presented at the CVPR2015, the premier annual computer vision event held in
June 2015, in order to grasp the trends in the field. Further, we are proposing
"DeepSurvey" as a mechanism embodying the entire process from the reading
through all the papers, the generation of ideas, and to the writing of paper.Comment: Survey Pape
A Non-linear Differential CNN-Rendering Module for 3D Data Enhancement
In this work we introduce a differential rendering module which allows neural
networks to efficiently process cluttered data. The module is composed of
continuous piecewise differentiable functions defined as a sensor array of
cells embedded in 3D space. Our module is learnable and can be easily
integrated into neural networks, allowing data rendering to be optimized for
specific learning tasks using gradient-based methods in an end-to-end fashion.
Essentially, the module's sensor cells are allowed to transform independently
and locally focus and sense different parts of the 3D data. Thus, through their
optimization process, cells learn to focus on important parts of the data,
bypassing occlusions, clutter and noise. Since the sensor cells originally lie
on a grid, this amounts to a highly non-linear rendering of the scene into a 2D
image. Our module performs especially well in the presence of clutter and
occlusions. Similarly, it deals well with non-linear deformations and improves
classification accuracy through proper rendering of the data. In our
experiments, we apply our module to demonstrate efficient localization and
classification in cluttered 2D and 3D data.
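The sensor-array idea can be illustrated with a toy forward pass, assuming NumPy: each cell responds to (projected) 3D points through a smooth Gaussian kernel, so the response is differentiable in the cell positions and the cells could in principle be shifted by gradient descent to focus on informative regions. The function `render`, the kernel width `sigma`, and the orthographic projection are illustrative simplifications, not the paper's module.

```python
import numpy as np

def render(points, centers, sigma=0.15):
    # each sensor cell responds to nearby projected points via a Gaussian kernel;
    # the sum is smooth in `centers`, hence usable with gradient-based training
    xy = points[:, :2]                                  # orthographic projection
    d2 = ((centers[:, None, :] - xy[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2)).sum(axis=1)   # one response per cell

# a regular grid of sensor cells; training would displace these centers
g = np.linspace(0, 1, 8)
centers = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)
```

Reshaping the 64 responses to an 8 x 8 array yields the non-linear 2D "image" of the scene that downstream layers consume.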
Distributed Machine Learning in Materials that Couple Sensing, Actuation, Computation and Communication
This paper reviews machine learning applications and approaches to detection,
classification and control of intelligent materials and structures with
embedded distributed computation elements. The purpose of this survey is to
identify desired tasks to be performed in each type of material or structure
(e.g., damage detection in composites), identify and compare common approaches
to learning such tasks, and investigate models and training paradigms used.
Machine learning approaches and common temporal features used in the domains of
structural health monitoring, morphable aircraft, wearable computing and
robotic skins are explored. As the ultimate goal of this research is to
incorporate the approaches described in this survey into a robotic material
paradigm, the potential for adapting the computational models used in these
applications, and corresponding training algorithms, to an amorphous network of
computing nodes is considered. Distributed versions of support vector machines,
graphical models and mixture models developed in the field of wireless sensor
networks are reviewed. Potential areas of investigation, including possible
architectures for incorporating machine learning into robotic nodes, training
approaches, and the possibility of using deep learning approaches for automatic
feature extraction, are discussed.
Hyperbox based machine learning algorithms: A comprehensive survey
With the rapid development of digital information, the data volume generated
by humans and machines is growing exponentially. Along with this trend, machine
learning algorithms have been developed and refined continuously to discover new
information and knowledge from different data sources. Learning algorithms
using hyperboxes as fundamental representational and building blocks are a
branch of machine learning methods. These algorithms have enormous potential
for high scalability and online adaptation of predictors built using hyperbox
data representations to dynamically changing environments and streaming
data. This paper aims to give a comprehensive survey of literature on
hyperbox-based machine learning models. In general, according to the
architecture and characteristic features of the resulting models, the existing
hyperbox-based learning algorithms may be grouped into three major categories:
fuzzy min-max neural networks, hyperbox-based hybrid models, and other
algorithms based on hyperbox representations. Within each of these groups, this
paper gives a brief description of the structure of the models, associated
learning algorithms, and an analysis of their advantages and drawbacks. The main
applications of these hyperbox-based models to real-world problems are also
described in this paper. Finally, we discuss some open problems and identify
potential future research directions in this field.
Comment: 7 figures
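A minimal sketch of the building block behind the first category, fuzzy min-max neural networks, assuming NumPy: a hyperbox is stored as per-dimension min and max points, membership is full inside the box and decays with distance outside it, and learning expands a box to absorb new points. The averaged membership and sensitivity parameter `gamma` follow the classic Simpson-style formulation only loosely; treat the exact functional form as illustrative.

```python
import numpy as np

def membership(x, vmin, vmax, gamma=4.0):
    # fuzzy membership: 1 inside the box [vmin, vmax], decaying outside it
    below = np.maximum(vmin - x, 0.0)        # per-dimension distance below the box
    above = np.maximum(x - vmax, 0.0)        # per-dimension distance above the box
    return float(np.mean(1.0 - np.minimum(gamma * (below + above), 1.0)))

def expand(vmin, vmax, x):
    # hyperbox expansion step of fuzzy min-max learning: grow the box to absorb x
    return np.minimum(vmin, x), np.maximum(vmax, x)
```

Because expansion is a single min/max update per sample, predictors built this way adapt online to streaming data, which is the scalability argument the abstract makes.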
Audio Surveillance: a Systematic Review
Although surveillance systems are becoming increasingly ubiquitous in our
living environment, automated surveillance, currently based on the video sensory
modality and machine intelligence, often lacks the robustness and
reliability required in many real applications. To tackle this issue, audio
sensory devices have been taken into account, either alone or in combination
with video, giving birth, in the last decade, to a considerable amount of research.
In this paper, audio-based automated surveillance methods are organized into a
comprehensive survey: a general taxonomy, inspired by the more widespread video
surveillance field, is proposed in order to systematically describe the methods
covering background subtraction, event classification, object tracking and
situation analysis. For each of these tasks, all the significant works are
reviewed, detailing their pros and cons and the context for which they have
been proposed. Moreover, a specific section is devoted to audio features,
discussing their expressiveness and their employment in the above described
tasks. Unlike other surveys on audio processing and analysis, the
present one is specifically targeted at automated surveillance, highlighting
the target applications of each described method and providing the reader with
tables and schemes useful for retrieving the best-suited algorithms for a
specific requirement.
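To give a flavor of the low-level audio features such surveys discuss, here is a minimal sketch of two classic frame-level descriptors, short-time energy and spectral centroid, assuming NumPy; the frame and hop lengths are illustrative defaults, not values from the survey.

```python
import numpy as np

def frame_features(signal, sr, frame_len=1024, hop=512):
    # per-frame short-time energy and spectral centroid, two basic
    # descriptors commonly fed to audio event classifiers
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        f = signal[start:start + frame_len] * np.hanning(frame_len)
        mag = np.abs(np.fft.rfft(f))
        freqs = np.fft.rfftfreq(frame_len, 1.0 / sr)
        energy = float(np.sum(f ** 2) / frame_len)
        centroid = float((freqs * mag).sum() / (mag.sum() + 1e-12))
        feats.append((energy, centroid))
    return np.array(feats)
```

Energy separates loud events (gunshots, glass breaking) from background, while the centroid crudely summarizes where the spectral mass sits, e.g. low for engine noise, high for screams.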
Fast Multi-class Dictionaries Learning with Geometrical Directions in MRI Reconstruction
Objective: Improve the reconstructed image with fast multi-class
dictionary learning when magnetic resonance imaging is accelerated by
undersampling the k-space data. Methods: A fast orthogonal dictionary learning
method is introduced into magnetic resonance image reconstruction to provide
adaptive sparse representations of images. To enhance sparsity, the image is
divided into patches classified according to their geometrical direction, and a
dictionary is trained within each class. A new sparse reconstruction model with
the multi-class dictionaries is proposed and solved using a fast alternating
direction method of multipliers. Results: Experiments on phantom and brain
imaging data with acceleration factors up to 10 and various undersampling
patterns are conducted. The proposed method is compared with state-of-the-art
magnetic resonance image reconstruction methods. Conclusion: Artifacts are
better suppressed and image edges are better preserved than with the compared
methods. Besides, the computation of the proposed approach is much faster than
that of the typical K-SVD dictionary learning method in magnetic resonance image
reconstruction. Significance: The proposed method can be exploited in
undersampled magnetic resonance imaging to reduce data acquisition time and
reconstruct images with better quality.
Comment: 13 pages, 15 figures, 5 tables
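The geometrical-direction grouping step can be sketched as follows, assuming NumPy: each image patch is assigned to a class by its dominant gradient orientation, so patches with similar geometry can share one adaptively trained dictionary. The weighted orientation histogram and `n_bins` are illustrative choices, not the authors' exact classification rule.

```python
import numpy as np

def direction_class(patch, n_bins=8):
    # classify a patch by its dominant gradient orientation so that patches
    # with similar geometry end up in the same dictionary class
    gy, gx = np.gradient(patch.astype(float))
    theta = np.arctan2(gy, gx) % np.pi           # orientation, ignoring sign
    w = np.hypot(gx, gy)                         # weight by gradient magnitude
    hist, _ = np.histogram(theta, bins=n_bins, range=(0, np.pi), weights=w)
    return int(np.argmax(hist))
```

Patches in one class would then be vectorized and passed to the fast orthogonal dictionary learning step, giving each class a dictionary tuned to its edge direction.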