Ensemble Modeling for Multimodal Visual Action Recognition
In this work, we propose an ensemble modeling approach for multimodal action
recognition. We independently train individual modality models using a variant
of focal loss tailored to handle the long-tailed distribution of the MECCANO
[21] dataset. Based on the underlying principle of focal loss, which captures
the relationship between tail (scarce) classes and their prediction
difficulties, we propose an exponentially decaying variant of focal loss for
our current task. It initially emphasizes learning from the hard misclassified
examples and gradually adapts to the entire range of examples in the dataset.
This annealing process encourages the model to strike a balance between
focusing on the sparse set of hard samples, while still leveraging the
information provided by the easier ones. Additionally, we opt for the late
fusion strategy to combine the resultant probability distributions from RGB and
Depth modalities for final action prediction. Experimental evaluations on the
MECCANO dataset demonstrate the effectiveness of our approach.
Comment: 22nd International Conference on Image Analysis and Processing
Workshops - Multimodal Action Recognition on the MECCANO Dataset, 202
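The annealing behaviour described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the exponential schedule shape, the initial focusing parameter, and the decay rate are all assumed values.

```python
import math

def focal_loss(p_true, gamma):
    """Focal loss on the probability assigned to the true class:
    FL(p) = -(1 - p)^gamma * log(p). Larger gamma down-weights easy examples."""
    return -((1.0 - p_true) ** gamma) * math.log(p_true)

def decayed_gamma(gamma0, epoch, decay_rate):
    """Exponentially decay the focusing parameter over training, so the loss
    starts as standard focal loss and anneals toward cross-entropy (gamma -> 0)."""
    return gamma0 * math.exp(-decay_rate * epoch)

# Early in training, easy examples (high p_true) are strongly down-weighted;
# as gamma decays, the loss approaches plain cross-entropy over all examples.
g_start = decayed_gamma(2.0, epoch=0, decay_rate=0.1)   # 2.0
g_late = decayed_gamma(2.0, epoch=30, decay_rate=0.1)   # ~0.0996
```

With gamma near zero the focal term (1 - p)^gamma approaches 1, recovering ordinary cross-entropy, which is the "entire range of examples" regime the abstract refers to.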
Establishment of Neural Networks Robust to Label Noise
Label noise is a significant obstacle in deep learning model training. It can
have a considerable impact on the performance of image classification models,
particularly deep neural networks, which are especially susceptible because
they have a strong propensity to memorise noisy labels. In this paper, we
examined the fundamental concepts underlying related label-noise approaches. We
built a transition matrix estimator and demonstrated its effectiveness against
the actual transition matrix. In addition, we examined the
label noise robustness of two convolutional neural network classifiers with
LeNet and AlexNet architectures. Experiments on the two FashionMNIST datasets
revealed the robustness of both models. However, we could not conclusively
demonstrate the effect of transition-matrix noise correction on robustness,
because time and computing resource constraints prevented us from properly
tuning the more complex convolutional neural network model. Future research
should further fine-tune the neural network model and examine the precision of
the estimated transition matrix.
Comment: 11 pages, 7 figures
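The role of a transition matrix in correcting label noise can be illustrated with a small forward-correction sketch. The abstract only describes estimating the matrix; applying it via forward correction, as below, is one standard and here assumed way such a matrix is used.

```python
import math

def forward_corrected_loss(clean_probs, T, noisy_label):
    """Forward correction: map the model's clean-class posterior through the
    label-noise transition matrix T, where T[i][j] = P(noisy=j | clean=i),
    then apply cross-entropy against the observed noisy label."""
    n = len(T)
    noisy_probs = [sum(clean_probs[i] * T[i][j] for i in range(n))
                   for j in range(n)]
    return -math.log(noisy_probs[noisy_label])

# With an identity matrix (no noise) this reduces to ordinary cross-entropy.
T_clean = [[1.0, 0.0], [0.0, 1.0]]
# A symmetric 20% flip rate between two classes:
T_noisy = [[0.8, 0.2], [0.2, 0.8]]
```

Training against the noise-adjusted posterior lets the underlying classifier converge toward the clean-label distribution, which is why the precision of the estimated matrix matters.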
Recognizing Characters in Art History Using Deep Learning
In the field of Art History, images of artworks and their contexts are core
to understanding the underlying semantic information. However, the highly
complex and sophisticated representation of these artworks makes it difficult,
even for the experts, to analyze the scene. From the computer vision
perspective, the task of analyzing such artworks can be divided into
sub-problems by taking a bottom-up approach. In this paper, we focus on the
problem of recognizing the characters in Art History. From the iconography
illustrated in Figure 1, we consider the representation of the two main
protagonists across different artworks and styles. We investigate and present
the findings of training a character classifier on features extracted from
their face images. The limitations of this method, and the inherent ambiguity
in the representation of the characters, motivated us to analyze their bodies
(a bigger context) in order to recognize the characters. Convolutional Neural
Networks (CNNs) trained on the bodies of the two protagonists are able to
learn person-related features and
ultimately improve the performance of character recognition. We introduce a new
technique that generates more data with similar styles, effectively creating
data in a similar domain. We present experiments and analysis on three
different models and show that the model trained on domain-related data gives
the best performance for recognizing characters. Additionally, we analyze the
localized image regions for the network predictions. Code is open-sourced and
available at
https://github.com/prathmeshrmadhu/recognize_characters_art_history and the
link to the published peer-reviewed article is
https://dl.acm.org/citation.cfm?id=3357242
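As a toy illustration of classifying characters from extracted features, the sketch below uses a nearest-centroid classifier over precomputed feature vectors. The paper trains CNNs end to end, so the classifier choice and the tiny 2-D features here are purely illustrative assumptions.

```python
import math

def nearest_centroid_train(features, labels):
    """Average the feature vectors per character to form class centroids.
    Stands in for a classifier fit on CNN face/body features."""
    sums, counts = {}, {}
    for vec, lab in zip(features, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def predict(centroids, vec):
    """Assign the character whose centroid is closest in Euclidean distance."""
    def dist(c):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(c, vec)))
    return min(centroids, key=lambda lab: dist(centroids[lab]))
```

The same two-step pattern (extract features from a crop, then classify) applies whether the crop is a face or, as in the abstract's improved variant, the whole body with its larger context.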
Can Continual Learning Improve Long-Tailed Recognition? Toward a Unified Framework
The Long-Tailed Recognition (LTR) problem emerges in the context of learning
from highly imbalanced datasets, in which the number of samples among different
classes is heavily skewed. LTR methods aim to accurately learn a dataset
comprising both a larger Head set and a smaller Tail set. We propose a theorem
stating that, under the assumption of a strongly convex loss function, the
weights of a learner trained on the full dataset lie within a bounded distance
of the weights of the same learner trained strictly on the Head. Next, we assert
that by treating the learning of the Head and Tail as two separate and
sequential steps, Continual Learning (CL) methods can effectively update the
weights of the learner to learn the Tail without forgetting the Head. First, we
validate our theoretical findings with various experiments on the toy MNIST-LT
dataset. We then evaluate the efficacy of several CL strategies on multiple
imbalanced variations of two standard LTR benchmarks (CIFAR100-LT and
CIFAR10-LT), and show that standard CL methods achieve strong performance gains
in comparison to baselines and approach solutions that have been tailor-made
for LTR. We also assess the applicability of CL techniques on real-world data
by exploring CL on the naturally imbalanced Caltech256 dataset and demonstrate
its superiority over state-of-the-art classifiers. Our work not only unifies
LTR and CL but also paves the way for leveraging advances in CL methods to
tackle the LTR challenge more effectively.
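The Head-then-Tail sequential training idea can be sketched as a Tail-phase update anchored to the Head-trained weights. The quadratic penalty below is an assumed, EWC-style stand-in for the CL strategies the abstract evaluates, not the paper's specific method.

```python
def tail_update(w, grad_tail, w_head, lam, lr=0.1):
    """One gradient step on the Tail loss with an L2 anchor toward the
    Head-trained weights: minimize L_tail(w) + (lam/2) * ||w - w_head||^2.
    The anchor discourages forgetting what was learned on the Head."""
    return [wi - lr * (gi + lam * (wi - hi))
            for wi, gi, hi in zip(w, grad_tail, w_head)]
```

With lam = 0 this is plain fine-tuning on the Tail, which risks forgetting the Head; a positive lam keeps the learner within a bounded distance of the Head solution, mirroring the intuition behind the theorem's upper bound.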