112 research outputs found
Long-tailed Instance Segmentation using Gumbel Optimized Loss
Major advancements have been made in the field of object detection and
segmentation recently. However, when it comes to rare categories, the
state-of-the-art methods fail to detect them, resulting in a significant
performance gap between rare and frequent categories. In this paper, we
identify that Sigmoid or Softmax functions used in deep detectors are a major
reason for low performance and are sub-optimal for long-tailed detection and
segmentation. To address this, we develop a Gumbel Optimized Loss (GOL), for
long-tailed detection and segmentation. It aligns with the Gumbel distribution
of rare classes in imbalanced datasets, considering the fact that most classes
in long-tailed detection have low expected probability. The proposed GOL
significantly outperforms the best state-of-the-art method by 1.1% on AP , and
boosts the overall segmentation by 9.0% and detection by 8.0%, particularly
improving detection of rare classes by 20.3%, compared to Mask-RCNN, on LVIS
dataset. Code available at: https://github.com/kostas1515/GOLComment: ECCV202
Mean-Field methods for Structured Deep-Learning in Computer Vision
In recent years, Machine Learning based Computer Vision techniques made impressive progress. These algorithms proved particularly efficient for image classification or detection of isolated objects. From a probabilistic perspective, these methods can predict marginals, over single or multiple variables, independently, with high accuracy.
However, in many tasks of practical interest, we need to predict jointly several correlated variables.
Practical applications include people detection in crowded scenes, image segmentation, surface reconstruction, 3D pose estimation and others. A large part of the research effort in today's computer-vision community aims at finding task-specific solutions to these problems, while leveraging the power of Deep-Learning based classifiers. In this thesis, we present our journey towards a generic and practical solution based on mean-field (MF) inference.
Mean-field is a Statistical Physics-inspired method which has long been used in Computer-Vision as a variational approximation to posterior distributions over complex Conditional Random Fields. Standard
mean-field optimization is based on coordinate descent
and in many situations can be impractical.
We therefore propose a novel proximal gradient-based
approach to optimizing the variational objective. It
is naturally parallelizable and easy to implement.
We prove its convergence, and then demonstrate that, in
practice, it yields faster convergence and often finds better
optima than more traditional mean-field optimization techniques.
Then, we show that we can replace the fully factorized distribution of mean-field by a weighted mixture of such distributions, that similarly minimizes the KL-Divergence to the true posterior. Our extension of the clamping method proposed in previous works allows us to both produce a more descriptive approximation of the true posterior and, inspired by the diverse MAP paradigms, fit a mixture of mean-field approximations. We demonstrate that this positively impacts real-world algorithms that initially relied on mean-fields.
One of the important properties of the mean-field inference algorithms is that the closed-form updates are fully differentiable operations. This naturally allows to do parameter learning by simply unrolling multiple iterations of the updates, the so-called back-mean-field algorithm. We derive a novel and efficient structured learning method for multi-modal posterior distribution based on the Multi-Modal Mean-Field approximation, which can be seamlessly combined to modern gradient-based learning methods such as CNNs.
Finally, we explore in more details the specific problem of structured learning and prediction for multiple-people detection in crowded scenes. We then present a mean-field based structured deep-learning detection algorithm that provides state of the art results on this dataset
Automatic object classification for surveillance videos.
PhDThe recent popularity of surveillance video systems, specially located in urban
scenarios, demands the development of visual techniques for monitoring purposes.
A primary step towards intelligent surveillance video systems consists on automatic
object classification, which still remains an open research problem and the keystone
for the development of more specific applications.
Typically, object representation is based on the inherent visual features. However,
psychological studies have demonstrated that human beings can routinely categorise
objects according to their behaviour. The existing gap in the understanding
between the features automatically extracted by a computer, such as appearance-based
features, and the concepts unconsciously perceived by human beings but
unattainable for machines, or the behaviour features, is most commonly known
as semantic gap. Consequently, this thesis proposes to narrow the semantic gap
and bring together machine and human understanding towards object classification.
Thus, a Surveillance Media Management is proposed to automatically detect and
classify objects by analysing the physical properties inherent in their appearance
(machine understanding) and the behaviour patterns which require a higher level of
understanding (human understanding). Finally, a probabilistic multimodal fusion
algorithm bridges the gap performing an automatic classification considering both
machine and human understanding.
The performance of the proposed Surveillance Media Management framework
has been thoroughly evaluated on outdoor surveillance datasets. The experiments
conducted demonstrated that the combination of machine and human understanding
substantially enhanced the object classification performance. Finally, the inclusion
of human reasoning and understanding provides the essential information to bridge
the semantic gap towards smart surveillance video systems
Evolutionary and Swarm Algorithm Optimized Density- Based Clustering and Classification for Data Analytics
Clustering is one of the most widely used pattern recognition technologies for data analytics. Density-based clustering is a category of clustering methods which can find arbitrary shaped clusters. A well-known density-based clustering algorithm is Density- Based Spatial Clustering of Applications with Noise (DBSCAN). DBSCAN has three drawbacks: firstly, the parameters for DBSCAN are hard to set; secondly, the number of clusters cannot be controlled by the users; and thirdly, DBSCAN cannot directly be used as a classifier. With addressing the drawbacks of DBSCAN, a novel framework, Evolutionary and Swarm Algorithm optimised Density-based Clustering and Classification (ESA-DCC), is proposed. Evolutionary and Swarm Algorithm (ESA), has been applied in various different research fields regarding optimisation problems, including data analytics. Numerous categories of ESAs have been proposed, such as, Genetic Algorithms (GAs), Particle Swarm Optimization (PSO), Differential Evaluation (DE) and Artificial Bee Colony (ABC). In this thesis, ESA is used to search the best parameters of density-based clustering and classification in the ESA-DCC framework to address the first drawback of DBSCAN. As method to offset the second drawback, four types of fitness functions are defined to enable users to set the number of clusters as input. A supervised fitness function is defined to use the ESA-DCC as a classifier to address the third drawback. Four ESA- DCC methods, GA-DCC, PSO-DCC, DE-DCC and ABC-DCC, are developed. The performance of the ESA-DCC methods is compared with K-means and DBSCAN using ten datasets. The experimental results indicate that the proposed ESA-DCC methods can find the optimised parameters in both supervised and unsupervised contexts. The proposed methods are applied in a product recommender system and image segmentation cases
A Survey of Dataset Refinement for Problems in Computer Vision Datasets
Large-scale datasets have played a crucial role in the advancement of
computer vision. However, they often suffer from problems such as class
imbalance, noisy labels, dataset bias, or high resource costs, which can
inhibit model performance and reduce trustworthiness. With the advocacy of
data-centric research, various data-centric solutions have been proposed to
solve the dataset problems mentioned above. They improve the quality of
datasets by re-organizing them, which we call dataset refinement. In this
survey, we provide a comprehensive and structured overview of recent advances
in dataset refinement for problematic computer vision datasets. Firstly, we
summarize and analyze the various problems encountered in large-scale computer
vision datasets. Then, we classify the dataset refinement algorithms into three
categories based on the refinement process: data sampling, data subset
selection, and active learning. In addition, we organize these dataset
refinement methods according to the addressed data problems and provide a
systematic comparative description. We point out that these three types of
dataset refinement have distinct advantages and disadvantages for dataset
problems, which informs the choice of the data-centric method appropriate to a
particular research objective. Finally, we summarize the current literature and
propose potential future research topics.Comment: 33 pages, 10 figures, to be published in ACM Computing Survey
Unsupervised alignment of objects in images
With the advent of computer vision, various applications become interested to apply it to interpret the 3D and 2D scenes. The main core of computer vision is visual object detection which deals with detecting and representing objects in the image. Visual object detection requires to learn a model of each class type (e.g. car, cat) to be capable to detect objects belonging to the same class. Class learning benefits from a method which automatically aligns class examples making learning more straightforward.
The objective of this thesis is to further develop the sate-of-the-art feature-based alignment method which rigidly and automatically aligns object class images to a manually selected seed image. We try to compensate the weakness by providing a method to automatically select the best seed from dataset. Our method first extracts features by utilizing dense sampling method and then scale invariant feature transform (SIFT) descriptor is used to find best matches as initial local feature matches. The final alignment is based on spatial scoring procedure where the initial matches are refined to a set of spatially verified matches. The spatial score is used next to calculate similarity scores. We propose an algorithm which operates on spatial and similarity scores and finally selects the best seed. We also investigate the performance of step-wise alignment using minimum spanning tree (MST) and Dijkstra shortest path instead of direct alignment utilizing a single seed. We conduct our experiments using classes of Caltech-101 for which our unsupervised seed selection and step-wise alignment achieve state-of-the-art performance
Data Models for Dataset Drift Controls in Machine Learning With Images
Camera images are ubiquitous in machine learning research. They also play a
central role in the delivery of important services spanning medicine and
environmental surveying. However, the application of machine learning models in
these domains has been limited because of robustness concerns. A primary
failure mode are performance drops due to differences between the training and
deployment data. While there are methods to prospectively validate the
robustness of machine learning models to such dataset drifts, existing
approaches do not account for explicit models of the primary object of
interest: the data. This makes it difficult to create physically faithful drift
test cases or to provide specifications of data models that should be avoided
when deploying a machine learning model. In this study, we demonstrate how
these shortcomings can be overcome by pairing machine learning robustness
validation with physical optics. We examine the role raw sensor data and
differentiable data models can play in controlling performance risks related to
image dataset drift. The findings are distilled into three applications. First,
drift synthesis enables the controlled generation of physically faithful drift
test cases. The experiments presented here show that the average decrease in
model performance is ten to four times less severe than under post-hoc
augmentation testing. Second, the gradient connection between task and data
models allows for drift forensics that can be used to specify
performance-sensitive data models which should be avoided during deployment of
a machine learning model. Third, drift adjustment opens up the possibility for
processing adjustments in the face of drift. This can lead to speed up and
stabilization of classifier training at a margin of up to 20% in validation
accuracy. A guide to access the open code and datasets is available at
https://github.com/aiaudit-org/raw2logit.Comment: LO and MA contributed equall
- …