88 research outputs found
MIMIC: Mask Image Pre-training with Mix Contrastive Fine-tuning for Facial Expression Recognition
Cutting-edge research in facial expression recognition (FER) currently favors
the use of a convolutional neural network (CNN) backbone that is pre-trained
with supervision on face recognition datasets for feature extraction.
However, due to the vast scale of face recognition datasets and the high cost
associated with collecting facial labels, this pre-training paradigm incurs
significant expenses. To this end, we propose to pre-train vision
Transformers (ViTs) through a self-supervised approach on a mid-scale general
image dataset. In addition, the divergence between general datasets and FER
datasets is more pronounced than the domain disparity between face datasets
and FER datasets. Therefore, we propose a contrastive
fine-tuning approach to effectively mitigate this domain disparity.
Specifically, we introduce a novel FER training paradigm named Mask Image
pre-training with MIx Contrastive fine-tuning (MIMIC). In the initial phase, we
pre-train the ViT via masked image reconstruction on general images.
Subsequently, in the fine-tuning stage, we introduce a mix-supervised
contrastive learning process, which uses a mixing strategy to give the model a
wider range of positive samples. Through extensive experiments
conducted on three benchmark datasets, we demonstrate that MIMIC
outperforms the previous training paradigm, showing its capability to learn
better representations. Remarkably, the results indicate that the vanilla ViT
can achieve impressive performance without the need for intricate,
auxiliary-designed modules. Moreover, when scaling up the model size, MIMIC
exhibits no performance saturation and is superior to the current
state-of-the-art methods.
Expressivity in Natural and Artificial Systems
Roboticists are trying to replicate animal behavior in artificial systems.
Yet, quantitative bounds on the capacity of a moving platform (natural or
artificial) to express information in the environment are not known. This paper
presents a measure for the capacity of motion complexity -- the expressivity --
of articulated platforms (both natural and artificial) and shows that this
measure is stagnant and unexpectedly limited in extant robotic systems. This
analysis indicates trends in increasing capacity in both internal and external
complexity for natural systems while artificial, robotic systems have increased
significantly in the capacity of computational (internal) states but remained
more or less constant in mechanical (external) state capacity. This work
presents a way to analyze trends in animal behavior and shows that robots are
not capable of the same multi-faceted behavior in rich, dynamic environments as
natural systems.
Comment: Rejected from Nature, after review and appeal, July 4, 2018
(submitted May 11, 2018)
Table of Contents
Table of Contents with links to the conference papers
Action for perception : active object recognition and pose estimation in cluttered environments
University of Technology Sydney. Faculty of Engineering and Information Technology. Object recognition and localisation are indispensable competencies for service robots in everyday environments like offices and kitchens. The presence of similar objects that can only be differentiated by a small part of their surface, together with clutter that leads to occlusions, makes it impossible to detect target objects accurately and reliably from a single observation. When the sensor observing the environment is mounted on a mobile platform, object detection and pose estimation can be facilitated by observing the environment from a series of different viewpoints. The focus of this thesis is computing active perception strategies, with the aim of finding optimal actions that enhance object recognition and pose estimation performance.
This thesis consists of two main parts:
The first part focuses on object detection and pose estimation from a single frame of observation. Using an RGB-D sensor, we propose a modular 3D textured object detection and pose estimation framework which can recognise objects in cluttered environments by taking advantage of the geometric information provided by the sensor. To handle less-textured objects and objects under severe illumination conditions, we propose a novel RGB-D feature which is robust to illumination, scale, rotation and viewpoint variations, and provides reliable feature matching results under challenging conditions. The proposed feature is validated for multiple applications including object detection and point cloud alignment. Parts of the above approaches are integrated with existing work to produce a practical and effective perception module for a warehouse automation task. The designed perception system can detect objects of different types and estimate their poses robustly, thus guaranteeing reliable object grasping and manipulation performance.
In the second part of the thesis, we investigate the problem of active object detection and pose estimation from two perspectives: with and without considering the uncertainties in the motion model and the observation model. First, we propose a model-driven active object recognition and pose estimation system that exploits the feature association probability under scale and viewpoint variations. By explicitly modelling the feature association, the proposed system can predict future information more accurately, thus laying the foundation for a successful active Next-Best-View planning system even with a naive greedy search technique. We also present a probabilistic framework which handles motion and observation uncertainties in the active object detection and pose estimation problem. We present an optimisation framework which computes the optimal control at each step, using an objective function which incorporates uncertainties in state estimation, feature coverage for better recognition confidence, and control consumption. The proposed framework can handle various issues such as object initialisation, collision avoidance, occlusion and changing the object hypothesis. Validations based on a simulation environment are also presented.
Modeling and Simulation in Engineering
This book provides an open platform to establish and share knowledge developed by scholars, scientists, and engineers from all over the world about various applications of modeling and simulation in the design process of products, in various engineering fields. The book consists of 12 chapters arranged in two sections (3D Modeling and Virtual Prototyping), reflecting the multidimensionality of applications related to modeling and simulation. Some of the most recent modeling and simulation techniques, as well as some of the most accurate and sophisticated software for treating complex systems, are applied. All the original contributions in this book are joined by the basic principle of a successful modeling and simulation process: as complex as necessary, and as simple as possible. The idea is to manipulate the simplifying assumptions in a way that reduces the complexity of the model (in order to make a real-time simulation), but without altering the precision of the results.
A Benchmark and Evaluation of Non-Rigid Structure from Motion
Non-rigid structure from motion (NRSfM) is a long-standing and central
problem in computer vision, allowing us to obtain 3D information from multiple
images when the scene is dynamic. A main issue regarding the further
development of this important computer vision topic is the lack of
high-quality data sets. We here address this issue by presenting a data set
compiled for this purpose, which is made publicly available and is considerably
larger than the previous state of the art. To validate the applicability of this
data set, and to provide an investigation into the state of the art of NRSfM,
including potential directions forward, we here present a benchmark and a
scrupulous evaluation using this data set. This benchmark evaluates 16
different methods with available code, which we argue reasonably spans the
state of the art in NRSfM. We also hope that the presented public data set
and evaluation will provide benchmark tools for further development in this
field.