Search CORE

20 research outputs found

Ensemble Modeling for Multimodal Visual Action Recognition

Author: Dave Ishan
Fleischer Sarah
Kini Jyoti
Shah Mubarak
Publication venue
Publication date: 25/09/2023
Field of study

In this work, we propose an ensemble modeling approach for multimodal action recognition. We independently train individual modality models using a variant of focal loss tailored to handle the long-tailed distribution of the MECCANO [21] dataset. Based on the underlying principle of focal loss, which captures the relationship between tail (scarce) classes and their prediction difficulties, we propose an exponentially decaying variant of focal loss for our current task. It initially emphasizes learning from the hard misclassified examples and gradually adapts to the entire range of examples in the dataset. This annealing process encourages the model to strike a balance between focusing on the sparse set of hard samples, while still leveraging the information provided by the easier ones. Additionally, we opt for the late fusion strategy to combine the resultant probability distributions from RGB and Depth modalities for final action prediction. Experimental evaluations on the MECCANO dataset demonstrate the effectiveness of our approach.Comment: 22nd International Conference on Image Analysis and Processing Workshops - Multimodal Action Recognition on the MECCANO Dataset, 202

arXiv.org e-Print Archive

Establishment of Neural Networks Robust to Label Noise

Author: Mangos Jack George
Teng Chongyangzi
Yang Pengwei
Publication venue
Publication date: 23/04/2023
Field of study

Label noise is a significant obstacle in deep learning model training. It can have a considerable impact on the performance of image classification models, particularly deep neural networks, which are especially susceptible because they have a strong propensity to memorise noisy labels. In this paper, we have examined the fundamental concept underlying related label noise approaches. A transition matrix estimator has been created, and its effectiveness against the actual transition matrix has been demonstrated. In addition, we examined the label noise robustness of two convolutional neural network classifiers with LeNet and AlexNet designs. The two FashionMINIST datasets have revealed the robustness of both models. We are not efficiently able to demonstrate the influence of the transition matrix noise correction on robustness enhancements due to our inability to correctly tune the complex convolutional neural network model due to time and computing resource constraints. There is a need for additional effort to fine-tune the neural network model and explore the precision of the estimated transition model in future research.Comment: 11 pages, 7 figure

arXiv.org e-Print Archive

Recognizing Characters in Art History Using Deep Learning

Author: Bell Peter
Christlein Vincent
Kosti Ronak
Madhu Prathmesh
Maier Andreas
Mührenberg Lara
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019
Field of study

In the field of Art History, images of artworks and their contexts are core to understanding the underlying semantic information. However, the highly complex and sophisticated representation of these artworks makes it difficult, even for the experts, to analyze the scene. From the computer vision perspective, the task of analyzing such artworks can be divided into sub-problems by taking a bottom-up approach. In this paper, we focus on the problem of recognizing the characters in Art History. From the iconography of

Annunciation

of

the

Lord

(Figure 1), we consider the representation of the main protagonists,

Mary

and

Gabriel

, across different artworks and styles. We investigate and present the findings of training a character classifier on features extracted from their face images. The limitations of this method, and the inherent ambiguity in the representation of

Gabriel

, motivated us to consider their bodies (a bigger context) to analyze in order to recognize the characters. Convolutional Neural Networks (CNN) trained on the bodies of

Mary

and

Gabriel

are able to learn person related features and ultimately improve the performance of character recognition. We introduce a new technique that generates more data with similar styles, effectively creating data in the similar domain. We present experiments and analysis on three different models and show that the model trained on domain related data gives the best performance for recognizing character. Additionally, we analyze the localized image regions for the network predictions. Code is open-sourced and available at https://github.com/prathmeshrmadhu/recognize_characters_art_history and the link to the published peer-reviewed article is https://dl.acm.org/citation.cfm?id=3357242

arXiv.org e-Print Archive

Crossref

Can Continual Learning Improve Long-Tailed Recognition? Toward a Unified Framework

Author: Etemad Ali
Greenspan Michael
Molahasani Mahdiyar
Publication venue
Publication date: 22/06/2023
Field of study

The Long-Tailed Recognition (LTR) problem emerges in the context of learning from highly imbalanced datasets, in which the number of samples among different classes is heavily skewed. LTR methods aim to accurately learn a dataset comprising both a larger Head set and a smaller Tail set. We propose a theorem where under the assumption of strong convexity of the loss function, the weights of a learner trained on the full dataset are within an upper bound of the weights of the same learner trained strictly on the Head. Next, we assert that by treating the learning of the Head and Tail as two separate and sequential steps, Continual Learning (CL) methods can effectively update the weights of the learner to learn the Tail without forgetting the Head. First, we validate our theoretical findings with various experiments on the toy MNIST-LT dataset. We then evaluate the efficacy of several CL strategies on multiple imbalanced variations of two standard LTR benchmarks (CIFAR100-LT and CIFAR10-LT), and show that standard CL methods achieve strong performance gains in comparison to baselines and approach solutions that have been tailor-made for LTR. We also assess the applicability of CL techniques on real-world data by exploring CL on the naturally imbalanced Caltech256 dataset and demonstrate its superiority over state-of-the-art classifiers. Our work not only unifies LTR and CL but also paves the way for leveraging advances in CL methods to tackle the LTR challenge more effectively

arXiv.org e-Print Archive