Factorization of View-Object Manifolds for Joint Object Recognition and Pose Estimation
Due to large variations in shape, appearance, and viewing conditions, object
recognition is a key precursory challenge in the fields of object manipulation
and robotic/AI visual reasoning in general. Recognizing object categories,
particular instances of objects and viewpoints/poses of objects are three
critical subproblems robots must solve in order to accurately grasp/manipulate
objects and reason about their environments. Multi-view images of the same
object lie on intrinsic low-dimensional manifolds in descriptor spaces (e.g.
visual/depth descriptor spaces). These object manifolds share the same topology
despite being geometrically different. Each object manifold can be represented
as a deformed version of a unified manifold, and can thus be parameterized by
its homeomorphic mapping/reconstruction from the unified manifold. In this
work, we develop a novel framework that jointly solves the three challenging
recognition sub-problems by explicitly modeling the deformations of the object
manifolds and factorizing them in a view-invariant space for recognition. We
perform extensive experiments on several challenging datasets and achieve
state-of-the-art results.
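A minimal sketch of the parameterization this abstract describes, in notation of my own choosing (the symbols U, gamma_k, and theta are illustrative, not the paper's):

```latex
% Shared view/pose parameterization u(theta) on a unified manifold U;
% each object k deforms U via its own homeomorphism gamma_k.
x_k(\theta) \;=\; \gamma_k\big(u(\theta)\big),
\qquad u(\theta) \in \mathcal{U},
\quad \gamma_k : \mathcal{U} \to \mathcal{M}_k \ \text{a homeomorphism}.
% Recognition then factorizes an observed descriptor into the
% object-specific deformation gamma_k (category/instance) and the
% shared view parameter theta (pose).
```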
A Comprehensive Survey on Graph Neural Networks
Deep learning has revolutionized many machine learning tasks in recent years,
ranging from image classification and video processing to speech recognition
and natural language understanding. The data in these tasks are typically
represented in the Euclidean space. However, there is an increasing number of
applications where data are generated from non-Euclidean domains and are
represented as graphs with complex relationships and interdependency between
objects. The complexity of graph data has imposed significant challenges on
existing machine learning algorithms. Recently, many studies on extending deep
learning approaches for graph data have emerged. In this survey, we provide a
comprehensive overview of graph neural networks (GNNs) in the data mining and
machine learning fields. We propose a new taxonomy that divides the
state-of-the-art graph neural networks into four categories, namely recurrent
graph neural networks, convolutional graph neural networks, graph autoencoders,
and spatial-temporal graph neural networks. We further discuss the applications
of graph neural networks across various domains and summarize the open-source
code, benchmark datasets, and model evaluation of graph neural networks.
Finally, we propose potential research directions in this rapidly growing
field.
Comment: Minor revision (updated tables and references).
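To make the "convolutional graph neural networks" category concrete, here is a minimal numpy sketch of one graph-convolutional layer in the widely used symmetric-normalization form; this is a generic textbook example, not code from the survey:

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One graph-convolutional layer: normalized neighborhood
    aggregation followed by a linear map and a ReLU.

    adj:    (n, n) adjacency matrix
    feats:  (n, d_in) node feature matrix H
    weight: (d_in, d_out) learnable matrix W
    """
    a_hat = adj + np.eye(adj.shape[0])           # add self-loops
    d_inv_sqrt = np.diag(a_hat.sum(axis=1) ** -0.5)
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt     # D^{-1/2} A_hat D^{-1/2}
    return np.maximum(a_norm @ feats @ weight, 0.0)  # ReLU(A_norm H W)
```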
Analysis of Railway Accidents' Narratives Using Deep Learning
Automatic understanding of domain-specific texts in order to extract useful
relationships for later use is a non-trivial task. One such relationship would
be between railroad accidents' causes and their corresponding descriptions in
reports. From 2001 to 2016, rail accidents in the U.S. cost more than $4.6B.
Railroads involved in accidents are required to submit an accident report to
the Federal Railroad Administration (FRA). These reports contain a variety of
fixed field entries including primary cause of the accidents (a coded variable
with 389 values) as well as a narrative field which is a short text description
of the accident. Although these narratives provide more information than a
fixed field entry, the terminology used in these reports is not easy for a
non-expert reader to understand. Therefore, a method that assists in filling in
the primary cause from such domain-specific texts (narratives) would help label
the accidents more accurately. Another important question for transportation
safety is whether the reported accident cause is consistent with the narrative
description. To address these questions, we applied deep learning
methods together with powerful word embeddings such as Word2Vec and GloVe to
classify accident cause values for the primary cause field using the text in
the narratives. The results show that such approaches can both accurately
classify accident causes based on report narratives and find important
inconsistencies in accident reporting.
Comment: Accepted at the IEEE International Conference on Machine Learning and Applications (IEEE ICMLA).
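A minimal sketch of the kind of embedding-based classifier the abstract describes: narratives are embedded by averaging pretrained GloVe vectors and a linear model predicts the coded cause. The file name, narratives, and cause codes below are placeholders; this is a generic reconstruction, not the authors' pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def load_glove(path):
    """Parse a GloVe text file into {word: vector}."""
    vecs = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vecs[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vecs

def embed(narrative, vecs, dim=100):
    """Average the word vectors of a narrative (zeros if no hits)."""
    hits = [vecs[w] for w in narrative.lower().split() if w in vecs]
    return np.mean(hits, axis=0) if hits else np.zeros(dim, dtype=np.float32)

vecs = load_glove("glove.6B.100d.txt")       # pretrained embedding file
narratives = ["train struck debris on track", "brake failure on grade"]
causes = ["M402", "E53C"]                    # placeholder cause codes

X = np.stack([embed(n, vecs) for n in narratives])
clf = LogisticRegression(max_iter=1000).fit(X, causes)
print(clf.predict(X))
```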
Text Classification Algorithms: A Survey
In recent years, there has been an exponential growth in the number of
complex documents and texts, which demands a deeper understanding of machine
learning methods to classify texts accurately in many applications. Many
machine learning approaches have achieved superior results in natural language
processing. The success of these learning algorithms relies on their capacity
to capture complex models and non-linear relationships within data.
However, finding suitable structures, architectures, and techniques for text
classification is a challenge for researchers. In this paper, a brief overview
of text classification algorithms is discussed. This overview covers different
text feature extractions, dimensionality reduction methods, existing algorithms
and techniques, and evaluations methods. Finally, the limitations of each
technique and their application in the real-world problem are discussed
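As a concrete instance of the pipeline the survey covers (feature extraction, a classifier, and evaluation), here is a minimal scikit-learn sketch; the toy corpus and the TF-IDF/linear-SVM choices are just one common configuration among the many discussed:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.metrics import classification_report

# Placeholder corpus; a real evaluation would use a held-out test split.
texts = ["the team won the match", "stocks fell sharply today",
         "the striker scored twice", "markets rallied on earnings"]
labels = ["sports", "finance", "sports", "finance"]

# Feature extraction (TF-IDF) followed by a linear classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(texts, labels)

# Evaluation: per-class precision, recall, and F1.
print(classification_report(labels, model.predict(texts)))
```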
Geometry-Aware Recurrent Neural Networks for Active Visual Recognition
We present recurrent geometry-aware neural networks that integrate visual
information across multiple views of a scene into 3D latent feature tensors,
while maintaining a one-to-one mapping between 3D physical locations in the
world scene and latent feature locations. Object detection, object
segmentation, and 3D reconstruction are then carried out directly using the
constructed 3D feature memory, as opposed to any of the input 2D images. The
proposed models are equipped with differentiable egomotion-aware feature
warping and (learned) depth-aware unprojection operations to achieve
geometrically consistent mapping between the features in the input frame and
the constructed latent model of the scene. We empirically show that the
proposed model generalizes much better than geometry-unaware LSTM/GRU networks,
especially in the presence of multiple objects and cross-object occlusions.
Combined with active view selection policies, our model learns to select
informative viewpoints from which to integrate information by "undoing"
cross-object occlusions, seamlessly combining geometry with learning from
experience.
Comment: To appear in NIPS201
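A toy numpy sketch of the depth-aware unprojection idea: per-pixel features are lifted to 3D camera-frame points using depth and pinhole intrinsics, after which they could be binned into a latent 3D grid. The variable names are my own, not the paper's operators:

```python
import numpy as np

def unproject(feats, depth, fx, fy, cx, cy):
    """Lift 2D per-pixel features into 3D points via a pinhole model.

    feats: (H, W, C) feature map; depth: (H, W) depth in meters;
    fx, fy, cx, cy: camera intrinsics.
    Returns (H*W, 3) camera-frame points and (H*W, C) features.
    """
    h, w, _ = feats.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    xs = (us - cx) * depth / fx          # back-project x
    ys = (vs - cy) * depth / fy          # back-project y
    pts = np.stack([xs, ys, depth], axis=-1).reshape(-1, 3)
    return pts, feats.reshape(-1, feats.shape[-1])

# The points could then be binned into a (D, H, W) latent grid so each
# physical location maps to one feature cell, as the abstract describes.
```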
User-Guided Aspect Classification for Domain-Specific Texts
Aspect classification, identifying aspects of text segments, facilitates
numerous applications, such as sentiment analysis and review summarization. To
alleviate the human effort on annotating massive texts, in this paper, we study
the problem of classifying aspects based on only a few user-provided seed words
for pre-defined aspects. The major challenge lies in how to handle the noisy
misc aspect, which is designed for texts without any pre-defined aspect. Even
domain experts have difficulty nominating seed words for the misc aspect,
making existing seed-driven text classification methods inapplicable. We
propose a novel framework, ARYA, which enables mutual enhancements between
pre-defined aspects and the misc aspect via iterative classifier training and
seed updating. Specifically, it trains a classifier for pre-defined aspects and
then leverages it to induce the supervision for the misc aspect. The prediction
results of the misc aspect are later utilized to filter out noisy seed words
for pre-defined aspects. Experiments in two domains demonstrate the superior
performance of our proposed framework, as well as the necessity and importance
of properly modeling the misc aspect.
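A schematic, runnable sketch of the iterative loop the abstract outlines (train on seed-matched segments, induce misc supervision from low-confidence predictions, filter noisy seeds). The toy reviews, seed sets, and the 0.6 confidence cutoff are invented for illustration; this is not the published ARYA implementation:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy seed words for two pre-defined aspects (placeholders).
seeds = {"food": {"pizza", "taste"}, "service": {"waiter", "staff"}}
segments = ["the pizza taste was great", "our waiter was rude",
            "parking was hard to find", "friendly staff bland taste"]

vec = TfidfVectorizer().fit(segments)
X = vec.transform(segments)

for _ in range(3):  # iterative classifier training and seed updating
    # 1. Pseudo-label segments that contain a seed word.
    y = [next((a for a, ws in seeds.items() if ws & set(s.split())), None)
         for s in segments]
    idx = [i for i, lab in enumerate(y) if lab is not None]
    labs = [y[i] for i in idx]
    if len(set(labs)) < 2:              # need two aspects to train
        break
    clf = LogisticRegression().fit(X[idx], labs)
    # 2. Low-confidence segments induce supervision for the misc aspect.
    conf = clf.predict_proba(X).max(axis=1)
    misc = {s for s, c in zip(segments, conf) if c < 0.6}
    # 3. Filter seed words that fire on misc segments (noisy seeds).
    for a in seeds:
        seeds[a] = {w for w in seeds[a]
                    if not any(w in s.split() for s in misc)}
```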
Human Action Recognition using Factorized Spatio-Temporal Convolutional Networks
Human actions in video sequences are three-dimensional (3D) spatio-temporal
signals characterizing both the visual appearance and motion dynamics of the
involved humans and objects. Inspired by the success of convolutional neural
networks (CNN) for image classification, recent attempts have been made to
learn 3D CNNs for recognizing human actions in videos. However, partly due to
the high complexity of training 3D convolution kernels and the need for large
quantities of training videos, only limited success has been reported. This has
motivated us to investigate in this paper a new deep architecture that can
handle 3D signals more effectively. Specifically, we propose factorized
spatio-temporal convolutional networks (FstCN) that factorize the original 3D
convolution kernel learning as a sequential process of learning 2D spatial
kernels in the lower layers (called spatial convolutional layers), followed by
learning 1D temporal kernels in the upper layers (called temporal convolutional
layers). We introduce a novel transformation and permutation operator to make
factorization in FstCN possible. Moreover, to address the issue of sequence
alignment, we propose an effective training and inference strategy based on
sampling multiple video clips from a given action video sequence. We have
tested FstCN on two commonly used benchmark datasets (UCF-101 and HMDB-51).
Without using auxiliary training videos to boost performance, FstCN
outperforms existing CNN-based methods and achieves performance comparable to
a recent method that benefits from using auxiliary training videos.
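A minimal PyTorch sketch of the factorization idea, replacing a full k x k x k 3D kernel with a (1, k, k) spatial convolution followed by a (k, 1, 1) temporal one; this mirrors the general spatial-then-temporal scheme rather than FstCN's exact layers, which also involve a transformation and permutation operator:

```python
import torch
import torch.nn as nn

class FactorizedSTConv(nn.Module):
    """3D conv factorized into a 2D spatial then a 1D temporal kernel."""
    def __init__(self, c_in, c_mid, c_out, k=3):
        super().__init__()
        p = k // 2
        self.spatial = nn.Conv3d(c_in, c_mid, (1, k, k), padding=(0, p, p))
        self.temporal = nn.Conv3d(c_mid, c_out, (k, 1, 1), padding=(p, 0, 0))
        self.act = nn.ReLU()

    def forward(self, x):                # x: (batch, C, T, H, W)
        return self.act(self.temporal(self.act(self.spatial(x))))

clip = torch.randn(2, 3, 16, 112, 112)   # 2 clips, 16 RGB frames each
out = FactorizedSTConv(3, 32, 64)(clip)
print(out.shape)                          # torch.Size([2, 64, 16, 112, 112])
```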
Semantic Instance Segmentation via Deep Metric Learning
We propose a new method for semantic instance segmentation, by first
computing how likely two pixels are to belong to the same object, and then by
grouping similar pixels together. Our similarity metric is based on a deep,
fully convolutional embedding model. Our grouping method is based on selecting
all points that are sufficiently similar to a set of "seed points", chosen from
a deep, fully convolutional scoring model. We show competitive results on the
Pascal VOC instance segmentation benchmark.
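A toy numpy sketch of the grouping step: pixels are attached to the most similar seed point whenever a distance-based similarity clears a threshold. The Gaussian similarity and the 0.5 cutoff are illustrative choices, not necessarily the paper's learned metric:

```python
import numpy as np

def group_by_seeds(embeddings, seed_idx, tau=0.5):
    """Assign each pixel to its most similar seed, or -1 (background).

    embeddings: (N, D) per-pixel embedding vectors
    seed_idx:   indices of seed pixels chosen by a scoring model
    """
    seeds = embeddings[seed_idx]                           # (S, D)
    d2 = ((embeddings[:, None, :] - seeds[None]) ** 2).sum(-1)
    sim = np.exp(-d2)                                      # Gaussian similarity
    best = sim.argmax(axis=1)                              # closest seed
    mask = sim.max(axis=1) >= tau                          # similarity cutoff
    return np.where(mask, best, -1)                        # instance ids

emb = np.random.randn(100, 8)            # 100 "pixels", 8-dim embeddings
labels = group_by_seeds(emb, seed_idx=[0, 50])
```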
ActionXPose: A Novel 2D Multi-view Pose-based Algorithm for Real-time Human Action Recognition
We present ActionXPose, a novel 2D pose-based algorithm for posture-level
Human Action Recognition (HAR). The proposed approach exploits 2D human poses
provided by the OpenPose detector from RGB videos. ActionXPose processes the
pose data and feeds it to a Long Short-Term Memory neural network and a 1D
Convolutional Neural Network, which solve the classification problem.
ActionXPose is one of the first algorithms to exploit 2D human poses for HAR.
The algorithm runs in real time and is robust to camera movement and to changes
in subject proximity, viewpoint, and subject appearance, and it generalizes
well. In fact, extensive simulations
show that ActionXPose can be successfully trained using different datasets at
once. State-of-the-art performance on popular datasets for posture-related HAR
problems (i3DPost, KTH) is reported, and results are compared with those
obtained by other methods, including the selected ActionXPose baseline.
Moreover, we also propose two novel datasets, MPOSE and ISLD, recorded in our
Intelligent Sensing Lab, to demonstrate ActionXPose's generalization
performance.
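A minimal PyTorch sketch of the two-branch idea, an LSTM and a 1D CNN reading the same sequence of 2D poses and feeding a shared classifier; the layer sizes and the concatenation merge are illustrative assumptions, not ActionXPose's published architecture:

```python
import torch
import torch.nn as nn

class PoseActionNet(nn.Module):
    """Two-branch classifier over pose sequences: LSTM + 1D CNN."""
    def __init__(self, n_joints=18, n_classes=6, hidden=64):
        super().__init__()
        d = n_joints * 2                             # (x, y) per joint
        self.lstm = nn.LSTM(d, hidden, batch_first=True)
        self.conv = nn.Sequential(
            nn.Conv1d(d, hidden, kernel_size=3, padding=1),
            nn.ReLU(), nn.AdaptiveAvgPool1d(1))
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, poses):                        # (B, T, n_joints*2)
        _, (h, _) = self.lstm(poses)                 # last LSTM hidden state
        c = self.conv(poses.transpose(1, 2)).squeeze(-1)
        return self.head(torch.cat([h[-1], c], dim=1))

seq = torch.randn(4, 30, 36)    # 4 clips, 30 frames, 18 joints * 2 coords
logits = PoseActionNet()(seq)   # (4, 6) class scores
```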
Warped-Linear Models for Time Series Classification
This article proposes and studies warped-linear models for time series
classification. The proposed models are time-warp invariant analogues of linear
models. Their construction is in line with time series averaging and extensions
of k-means and learning vector quantization to dynamic time warping (DTW)
spaces. The main theoretical result is that warped-linear models correspond to
polyhedral classifiers in Euclidean spaces. This result simplifies the analysis
of time-warp invariant models by reducing them to max-linear functions. We
exploit this relationship and derive solutions to the label-dependency problem
and the problem of learning warped-linear models. Empirical results on time
series classification suggest that warped-linear models trade off solution
quality against computation time better than nearest-neighbor and
prototype-based methods.
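A sketch of the max-linear reduction in notation of my own choosing: the warped-linear score maximizes an inner product over the finitely many admissible warping paths, hence it is a pointwise maximum of linear functions and induces polyhedral decision regions:

```latex
% Warped-linear score of a time series x under weights w:
% P is the (finite) set of admissible warping paths, and w \circ p
% denotes w aligned to x along path p. (Notation is illustrative.)
f(x) \;=\; \max_{p \in \mathcal{P}} \,\langle w \circ p,\; x \rangle
      \;=\; \max_{p \in \mathcal{P}} \,\langle w_p,\; x \rangle,
% a pointwise maximum of linear functions w_p^\top x, i.e. a
% max-linear (convex, piecewise-linear) function whose decision
% regions are polyhedral.
```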