118 research outputs found
The Revisiting Problem in Simultaneous Localization and Mapping: A Survey on Visual Loop Closure Detection
Where am I? This is one of the most critical questions that any intelligent
system should answer to decide whether it navigates to a previously visited
area. This problem has long been acknowledged for its challenging nature in
simultaneous localization and mapping (SLAM), wherein the robot needs to
correctly associate the incoming sensory data to the database allowing
consistent map generation. The significant advances in computer vision achieved
over the last 20 years, the increased computational power, and the growing
demand for long-term exploration contributed to efficiently performing such a
complex task with inexpensive perception sensors. In this article, visual loop
closure detection, which formulates a solution based solely on appearance input
data, is surveyed. We start by briefly introducing place recognition and SLAM
concepts in robotics. Then, we describe a loop closure detection system's
structure, covering an extensive collection of topics, including the feature
extraction, the environment representation, the decision-making step, and the
evaluation process. We conclude by discussing open and new research challenges,
particularly concerning the robustness in dynamic environments, the
computational complexity, and scalability in long-term operations. The article
aims to serve as a tutorial and a position paper for newcomers to visual loop
closure detection.Comment: 25 pages, 15 figure
Ranking and Retrieval under Semantic Relevance
This thesis presents a series of conceptual and empirical developments on the ranking and retrieval of candidates under semantic relevance. Part I of the thesis introduces the concept of uncertainty in various semantic tasks (such as recognizing textual entailment) in natural language processing, and the machine learning techniques commonly employed to model these semantic phenomena. A unified view of ranking and retrieval will be presented, and the trade-off between model expressiveness, performance, and scalability in model design will be discussed.
Part II of the thesis focuses on applying these ranking and retrieval techniques to text: Chapter 3 examines the feasibility of ranking hypotheses given a premise with respect to a human's subjective probability of the hypothesis happening, effectively extending the traditional categorical task of natural language inference. Chapter 4 focuses on detecting situation frames for documents using ranking methods. Then we extend the ranking notion to retrieval, and develop both sparse (Chapter 5) and dense (Chapter 6) vector-based methods to facilitate scalable retrieval for potential answer paragraphs in question answering.
Part III turns the focus to mentions and entities in text, while continuing the theme on ranking and retrieval: Chapter 7 discusses the ranking of fine-grained types that an entity mention could belong to, leading to state-of-the-art performance on hierarchical multi-label fine-grained entity typing. Chapter 8 extends the semantic relation of coreference to a cross-document setting, enabling models to retrieve from a large corpus, instead of in a single document, when resolving coreferent entity mentions
Out-of-vocabulary spoken term detection
Spoken term detection (STD) is a fundamental task for multimedia information
retrieval. A major challenge faced by an STD system is the serious performance reduction
when detecting out-of-vocabulary (OOV) terms. The difficulties arise not only
from the absence of pronunciations for such terms in the system dictionaries, but from
intrinsic uncertainty in pronunciations, significant diversity in term properties and a
high degree of weakness in acoustic and language modelling.
To tackle the OOV issue, we first applied the joint-multigram model to predict pronunciations
for OOV terms in a stochastic way. Based on this, we propose a stochastic
pronunciation model that considers all possible pronunciations for OOV terms so that
the high pronunciation uncertainty is compensated for.
Furthermore, to deal with the diversity in term properties, we propose a termdependent
discriminative decision strategy, which employs discriminative models to
integrate multiple informative factors and confidence measures into a classification
probability, which gives rise to minimum decision cost.
In addition, to address the weakness in acoustic and language modelling, we propose
a direct posterior confidence measure which replaces the generative models with
a discriminative model, such as a multi-layer perceptron (MLP), to obtain a robust
confidence for OOV term detection.
With these novel techniques, the STD performance on OOV terms was improved
substantially and significantly in our experiments set on meeting speech data
Fundamentals
Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, their summary and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to resource requirements and how to enhance scalability on diverse computing architectures ranging from embedded systems to large computing clusters
Fundamentals
Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, their summary and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to resource requirements and how to enhance scalability on diverse computing architectures ranging from embedded systems to large computing clusters
Recommended from our members
Scalable Machine Learning for Visual Data
Recent years have seen a rapid growth of visual data produced by social media, large-scale surveillance cameras, biometrics sensors, and mass media content providers. The unprecedented availability of visual data calls for machine learning methods that are effective and efficient for such large-scale settings.
The input of any machine learning algorithm consists of data and supervision. In a large-scale setting, on the one hand, the data often comes with a large number of samples, each with high dimensionality. On the other hand, the unconstrained visual data requires a large amount of supervision to make machine learning methods effective. However, the supervised information is often limited and expensive to acquire. The above hinder the applicability of machine learning methods for large-scale visual data. In the thesis, we propose innovative approaches to scale up machine learning to address challenges arising from both the scale of the data and the limitation of the supervision. The methods are developed with a special focus on visual data, yet they are also widely applicable to other domains that require scalable machine learning methods.
Learning with high-dimensionality:
The "large-scale" of visual data comes not only from the number of samples but also from the dimensionality of the features. While a considerable amount of effort has been spent on making machine learning scalable for more samples, few approaches are addressing learning with high-dimensional data. In Part I, we propose an innovative solution for learning with very high-dimensional data. Specifically, we use a special structure, the circulant structure, to speed up linear projection, the most widely used operation in machine learning. The special structure dramatically improves the space complexity from quadratic to linear, and the computational complexity from quadratic to linearithmic in terms of the feature dimension. The proposed approach is successfully applied in various frameworks of large-scale visual data analysis, including binary embedding, deep neural networks, and kernel approximation. The significantly improved efficiency is achieved with minimal loss of the performance. For all the applications, we further propose to optimize the projection parameters with training data to further improve the performance.
The scalability of learning algorithms is often fundamentally limited by the amount of supervision available. The massive visual data comes unstructured, with diverse distribution and high-dimensionality -- it is required to have a large amount of supervised information for the learning methods to work. Unfortunately, it is difficult, and sometimes even impossible to collect a sufficient amount of high-quality supervision, such as instance-by-instance labels, or frame-by-frame annotations of the videos.
Learning from label proportions:
To address the challenge, we need to design algorithms utilizing new types of supervision, often presented in weak forms, such as relatedness between classes, and label statistics over the groups. In Part II, we study a learning setting called Learning from Label Proportions (LLP), where the training data is provided in groups, and only the proportion of each class in each group is known. The task is to learn a model to predict the class labels of the individuals. Besides computer vision, this learning setting has broad applications in social science, marketing, and healthcare, where individual-level labels cannot be obtained due to privacy concerns. We provide theoretical analysis under an intuitive framework called Empirical Proportion Risk Minimization (EPRM), which learns an instance level classifier to match the given label proportions on the training data. The analysis answers the fundamental question, when and why LLP is possible. Under EPRM, we propose the proportion-SVM (∝SVM) algorithm, which jointly optimizes the latent instance labels and the classification model in a large-margin framework. The approach avoids making restrictive assumptions on the data, leading to the state-of-the-art results. We have successfully applied the developed tools to challenging problems in computer vision including instance-based event recognition, and attribute modeling.
Scaling up mid-level visual attributes:
Besides learning with weak supervision, the limitation on the supervision can also be alleviated by leveraging the knowledge from different, yet related tasks. Specifically, "visual attributes" have been extensively studied in computer vision. The idea is that the attributes, which can be understood as models trained to recognize visual properties can be leveraged in recognizing novel categories (being able to recognize green and orange is helpful for recognizing apple). In a large-scale setting, the unconstrained visual data requires a high-dimensional attribute space that is sufficiently expressive for the visual world. Ironically, though designed to improve the scalability of visual recognition, conventional attribute modeling requires expensive human efforts for labeling the detailed attributes and is inadequate for designing and learning a large set of attributes. To address such challenges, in Part III, we propose methods that can be used to automatically design a large set of attribute models, without user labeling burdens. We propose weak attribute, which combines various types of existing recognition models to form an expressive space for visual recognition and retrieval. In addition, we develop category-level attribute to characterize distinct properties separating multiple categories. The attributes are optimized to be discriminative to the visual recognition task over known categories, providing both better efficiency and higher recognition rate over novel categories with a limited number of training samples
Biometric Systems
Biometric authentication has been widely used for access control and security systems over the past few years. The purpose of this book is to provide the readers with life cycle of different biometric authentication systems from their design and development to qualification and final application. The major systems discussed in this book include fingerprint identification, face recognition, iris segmentation and classification, signature verification and other miscellaneous systems which describe management policies of biometrics, reliability measures, pressure based typing and signature verification, bio-chemical systems and behavioral characteristics. In summary, this book provides the students and the researchers with different approaches to develop biometric authentication systems and at the same time includes state-of-the-art approaches in their design and development. The approaches have been thoroughly tested on standard databases and in real world applications
- …