Domain Generalization by Rejecting Extreme Augmentations
Data augmentation is one of the most effective techniques for regularizing
deep learning models and improving their recognition performance in a variety
of tasks and domains. However, this holds for standard in-domain settings, in
which the training and test data follow the same distribution. For the
out-of-domain case, where the test data follow a different and unknown
distribution, the best recipe for data augmentation is unclear. In this paper,
we show that for out-of-domain and domain generalization settings, data
augmentation can provide a conspicuous and robust improvement in performance.
To do that, we propose a simple training procedure: (i) use uniform sampling over
standard data augmentation transformations; (ii) increase the strength of the
transformations to account for the higher data variance expected when working
out-of-domain; and (iii) devise a new reward function to reject extreme
transformations that can harm the training. With this procedure, our data
augmentation scheme achieves a level of accuracy that is comparable to or
better than state-of-the-art methods on benchmark domain generalization
datasets. Code: \url{https://github.com/Masseeh/DCAug}
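The three-step procedure above can be illustrated with a toy sketch. Everything here is hypothetical: the transforms operate on plain number lists instead of images, and the `reward` function (negative distance from the clean sample) is a stand-in for the paper's own reward; only the control flow of uniform sampling, high strength, and rejection is what the abstract describes.

```python
import random

# Toy stand-ins for standard augmentations; each takes a sample and a
# strength in [0, 1]. Real versions would be image ops (rotation, color
# jitter, etc.) -- these numeric ones only illustrate the control flow.
def add_noise(x, strength):
    return [v + random.uniform(-strength, strength) for v in x]

def scale(x, strength):
    factor = 1.0 + strength * random.uniform(-1.0, 1.0)
    return [v * factor for v in x]

TRANSFORMS = [add_noise, scale]

def reward(original, augmented):
    # Hypothetical reward: negative distance from the clean sample, so
    # extreme distortions score low. The paper devises its own reward.
    return -sum(abs(a - o) for a, o in zip(augmented, original))

def augment(x, strength=0.8, reject_below=-2.0, max_tries=10):
    """(i) uniform sampling over transforms, (ii) strong transformations,
    (iii) reject candidates whose reward falls below a threshold."""
    for _ in range(max_tries):
        t = random.choice(TRANSFORMS)             # (i) uniform sampling
        candidate = t(x, strength)                # (ii) strong transformation
        if reward(x, candidate) >= reject_below:  # (iii) rejection step
            return candidate
    return x  # fall back to the clean sample if all candidates are extreme
```

The rejection threshold plays the role of the paper's reward-based filter: strength can be kept high because occasional destructive samples are simply discarded rather than trained on.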
Pose Uncertainty Aware Movement Synchrony Estimation via Spatial-Temporal Graph Transformer
Movement synchrony reflects the coordination of body movements between
interacting dyads. The estimation of movement synchrony has been automated by
powerful deep learning models such as transformer networks. However, instead of
designing a specialized network for movement synchrony estimation, previous
transformer-based works broadly adopted architectures from other tasks such as
human activity recognition. Therefore, this paper proposes a skeleton-based
graph transformer for movement synchrony estimation. The proposed model applies
ST-GCN, a spatial-temporal graph convolutional neural network, for skeleton
feature extraction, followed by a spatial transformer for spatial feature
generation. The spatial transformer is guided by a uniquely designed joint
position embedding shared between the same joints of interacting individuals.
Besides, we incorporate a temporal similarity matrix into the temporal attention
computation, considering the periodic nature of body movements. In addition,
the confidence score associated with each joint reflects the uncertainty of a
pose estimate, a point that previous works on movement synchrony estimation have
not sufficiently emphasized. Since transformer networks demand a
significant amount of data to train, we constructed a dataset for movement
synchrony estimation using Human3.6M, a benchmark dataset for human activity
recognition, and pretrained our model on it using contrastive learning. We
further applied knowledge distillation to alleviate information loss introduced
by pose detector failure in a privacy-preserving way. We compared our method
with representative approaches on PT13, a dataset collected from autism therapy
interventions. Our method achieved an overall accuracy of 88.98% and surpassed
its counterparts by a wide margin while maintaining data privacy.
Comment: Accepted by the 24th ACM International Conference on Multimodal
Interaction (ICMI'22). 17 pages, 2 figures
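The temporal-attention idea in the abstract can be sketched in a minimal form. This is an assumption-laden illustration, not the paper's model: it shows single-head attention over per-frame features where a precomputed temporal similarity matrix `sim` is simply added to the attention logits (the paper's exact combination, head count, and similarity definition may differ).

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of logits.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def temporal_attention(frames, sim):
    """Toy single-head attention over per-frame feature vectors.

    `frames` is a T x D list of lists; `sim` is a T x T temporal
    similarity matrix added to the dot-product logits as a bias
    (a hypothetical reading of the abstract's description)."""
    T, D = len(frames), len(frames[0])
    out = []
    for i in range(T):
        logits = [sum(a * b for a, b in zip(frames[i], frames[j])) + sim[i][j]
                  for j in range(T)]
        weights = softmax(logits)
        # Weighted sum of frame features = attended representation of frame i.
        out.append([sum(w * frames[j][d] for j, w in enumerate(weights))
                    for d in range(D)])
    return out
```

Biasing the logits this way lets frames at similar phases of a periodic movement attend to each other more strongly, which is the motivation the abstract gives for the similarity matrix.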
Towards Robust Representation Learning and Beyond
Deep networks have reshaped computer vision research in recent years. Fueled by powerful computational resources and massive amounts of data, deep networks now dominate a wide range of visual benchmarks. Nonetheless, these success stories come with bitterness---an increasing number of studies have shown the limitations of deep networks under certain testing conditions like small input changes or occlusion. These failures not only raise safety and reliability concerns about the applicability of deep networks in the real world, but also demonstrate that the computations performed by current deep networks are dramatically different from those performed by human brains.
In this dissertation, we focus on investigating and tackling a particular yet challenging weakness of deep networks---their vulnerability to adversarial examples. The first part of this thesis argues that such vulnerability is a much more severe issue than we thought---the threats from adversarial examples are ubiquitous and catastrophic. We then discuss how to equip deep networks with robust representations for defending against adversarial examples. We approach the solution from the perspective of neural architecture design, and show that incorporating architectural elements like feature-level denoisers or smooth activation functions can effectively boost model robustness. The last part of this thesis focuses on rethinking the value of adversarial examples. Rather than treating adversarial examples as a threat to deep networks, we take a further step and show that adversarial examples can help deep networks improve their generalization ability, if feature representations are properly disentangled during learning.
Semi-Supervised Learning with Unlabeled Data: from Centralized to Distributed Systems
The rapid increase in data generated by edge devices and IoT technologies demands efficient
management solutions, especially in terms of cost and infrastructure. Key challenges include
expensive data labeling, requiring significant human resources, and ensuring data privacy and
security, with the risk of information leakage during transmission. The limited availability of labeled
data versus the exponential growth of new data presents challenges for maintaining accuracy and
efficiency in data-driven models. High costs of data annotation, especially involving subject-matter
experts, limit model training effectiveness. Privacy concerns are heightened due to edge devices'
interaction with sensitive user data, making the affordability of data labeling and data privacy
protection at the edge crucial issues.
To tackle these challenges, we proposed the data augmentation method Random Padding to increase the
effective data for model training in CNNs, enhancing image classification accuracy. To address its
limitations, we developed the Semi-Supervised Learning (SSL) method "AdaptMatch", which utilizes a
large amount of unlabeled data and a small amount of labeled data in centralized learning, improving
learning speed and reducing label bias. To address the inadequacies of centralized learning, we introduced a
decentralized SSL method, Federated Incremental Learning (FedIL), for learning on edge devices
while protecting privacy. However, FedIL's lower training efficiency and struggles with data imbalance
led to the development of Federated Masked Autoencoder (FedMAE), which is also a decentralized
semi-supervised learning method based on self-supervised learning. FedMAE enables asynchronous
training of large-scale unlabeled images in federated learning, outperforming existing methods in
handling highly imbalanced data.
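Random Padding, as named above, can be given a minimal sketch under one plausible reading: pad the image with fill values at randomly chosen offsets and crop back to the original size, so the content shifts while the output shape stays fixed. The thesis's exact scheme is not specified in this summary, so every detail below (offsets, fill value, crop corner) is an assumption for illustration; the "image" is a plain nested list rather than a tensor.

```python
import random

def random_padding(img, max_pad=4, fill=0):
    """Hypothetical sketch of a Random Padding augmentation.

    Pads `max_pad` total rows/columns of `fill` around the image, split
    randomly between top/bottom and left/right, then crops back to the
    original H x W so the content is shifted but the shape is unchanged."""
    h, w = len(img), len(img[0])
    pad_top = random.randint(0, max_pad)    # random vertical split
    pad_left = random.randint(0, max_pad)   # random horizontal split
    padded = [[fill] * (w + max_pad) for _ in range(pad_top)]
    padded += [[fill] * pad_left + row + [fill] * (max_pad - pad_left)
               for row in img]
    padded += [[fill] * (w + max_pad) for _ in range(max_pad - pad_top)]
    # Crop back to h x w from the top-left corner.
    return [row[:w] for row in padded[:h]]
```

Because only the padding offsets are random, each epoch sees the same content at slightly different positions, which is one way such an augmentation can increase the effective training data for a CNN.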
Deep Neural Networks and Data for Automated Driving
This open access book brings together the latest developments from industry and research on automated driving and artificial intelligence. Environment perception for highly automated driving heavily employs deep neural networks, facing many challenges. How much data do we need for training and testing? How can we use synthetic data to save labeling costs for training? How do we increase robustness and decrease memory usage? For inevitably poor conditions: How do we know that the network is uncertain about its decisions? Can we understand a bit more about what actually happens inside neural networks? This leads to a very practical problem, particularly for DNNs employed in automated driving: What are useful validation techniques, and how about safety? This book unites the views from both academia and industry, where computer vision and machine learning meet environment perception for highly automated driving. Naturally, aspects of data, robustness, uncertainty quantification, and, last but not least, safety are at the core of it. This book is unique: In its first part, an extended survey of all the relevant aspects is provided. The second part contains the detailed technical elaboration of the various questions mentioned above.
Note Taking in the Digital Age – Towards a Ubiquitous Pen Interface
The cultural technique of writing helped humans to express, communicate, think, and memorize throughout history. With the advent of human-computer-interfaces, pens as command input for digital systems became popular. While current applications allow carrying out complex tasks with digital pens, they lack the ubiquity and directness of pen and paper. This dissertation models the note taking process in the context of scholarly work, motivated by an understanding of note taking that surpasses mere storage of knowledge. The results, together with qualitative empirical findings about contemporary scholarly workflows that alternate between the analog and the digital world, inspire a novel pen interface concept. This concept proposes the use of an ordinary pen and unmodified writing surfaces for interacting with digital
systems. A technological investigation into how a camera-based system can connect physical ink strokes with digital handwriting processing delivers artificial neural network-based building blocks towards that goal. Using these components, the technological feasibility of in-air pen gestures for command input is explored. A proof-of-concept implementation of a prototype system reaches real-time performance and demonstrates distributed computing strategies for realizing the interface concept
in an end-user setting.