Search CORE

343 research outputs found

Pairwise Ranking Network for Affect Recognition

Author: 9th International Conference on Affective Computing and Intelligent Interaction (ACII 2021)
Patras I
Zoumpourlis G
Publication venue
Publication date: 01/09/2021
Field of study

In this work we study the problem of emotion recognition under the prism of preference learning. Affective datasets are typically annotated by assigning a single absolute label, i.e. a numerical value that describes the intensity of an emotional attribute, to each sample. Then, the majority of existing works on affect recognition employ sample-wise classification/regression methods to predict affective states, using those annotations. We take a different approach and use a deep network architecture that performs joint training on the tasks of classification/regression of samples and ordinal ranking between pairs of samples. By treating input samples in a pairwise manner, we leverage the auxiliary task of inferring the ordinal relation between their corresponding affective states. Incorporating the ranking objective allows capturing the inherently ordinal structure of emotions and learning the inter-sample relations, resulting in better generalization. Our method is incorporated into existing affect recognition architectures and evaluated on datasets of electroencephalograms (EEG) and images. We show that the approach proposed in this work leads to consistent performance gains when incorporated in classification/regression networks

Queen Mary Research Online

Multimodal sentiment analysis in real-life videos

Author: Stappen Lukas
Publication venue
Publication date: 24/11/2022
Field of study

This thesis extends the emerging field of multimodal sentiment analysis of real-life videos, taking two components into consideration: the emotion and the emotion's target. The emotion component of media is traditionally represented as a segment-based intensity model of emotion classes. This representation is replaced here by a value- and time-continuous view. Adjacent research fields, such as affective computing, have largely neglected the linguistic information available from automatic transcripts of audio-video material. As is demonstrated here, this text modality is well-suited for time- and value-continuous prediction. Moreover, source-specific problems, such as trustworthiness, have been largely unexplored so far. This work examines perceived trustworthiness of the source, and its quantification, in user-generated video data and presents a possible modelling path. Furthermore, the transfer between the continuous and discrete emotion representations is explored in order to summarise the emotional context at a segment level. The other component deals with the target of the emotion, for example, the topic the speaker is addressing. Emotion targets in a video dataset can, as is shown here, be coherently extracted based on automatic transcripts without limiting a priori parameters, such as the expected number of targets. Furthermore, alternatives to purely linguistic investigation in predicting targets, such as knowledge-bases and multimodal systems, are investigated. A new dataset is designed for this investigation, and, in conjunction with proposed novel deep neural networks, extensive experiments are conducted to explore the components described above. The developed systems show robust prediction results and demonstrate strengths of the respective modalities, feature sets, and modelling techniques. Finally, foundations are laid for cross-modal information prediction systems with applications to the correction of corrupted in-the-wild signals from real-life videos

OPUS Augsburg

On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator

Author: Antonisse Joey
Azzopardi George
Bennabhaktula Swaroop
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 31/10/2021
Field of study

Deployed image classification pipelines are typically dependent on the images captured in real-world environments. This means that images might be affected by different sources of perturbations (e.g. sensor noise in low-light environments). The main challenge arises by the fact that image quality directly impacts the reliability and consistency of classification tasks. This challenge has, hence, attracted wide interest within the computer vision communities. We propose a transformation step that attempts to enhance the generalization ability of CNN models in the presence of unseen noise in the test set. Concretely, the delineation maps of given images are determined using the CORF push-pull inhibition operator. Such an operation transforms an input image into a space that is more robust to noise before being processed by a CNN. We evaluated our approach on the Fashion MNIST data set with an AlexNet model. It turned out that the proposed CORF-augmented pipeline achieved comparable results on noise-free images to those of a conventional AlexNet classification model without CORF delineation maps, but it consistently achieved significantly superior performance on test images perturbed with different levels of Gaussian and uniform noise

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Gesture Recognition and Control for Semi-Autonomous Robotic Assistant Surgeons

Author: DE ROSSI GIACOMO
Publication venue
Publication date: 01/01/2020
Field of study

The next stage for robotics development is to introduce autonomy and cooperation with human agents in tasks that require high levels of precision and/or that exert considerable physical strain. To guarantee the highest possible safety standards, the best approach is to devise a deterministic automaton that performs identically for each operation. Clearly, such approach inevitably fails to adapt itself to changing environments or different human companions. In a surgical scenario, the highest variability happens for the timing of different actions performed within the same phases. This thesis explores the solutions adopted in pursuing automation in robotic minimally-invasive surgeries (R-MIS) and presents a novel cognitive control architecture that uses a multi-modal neural network trained on a cooperative task performed by human surgeons and produces an action segmentation that provides the required timing for actions while maintaining full phase execution control via a deterministic Supervisory Controller and full execution safety by a velocity-constrained Model-Predictive Controller

Catalogo dei prodotti della ricerca

Identifying Depressive Symptoms from Tweets: Figurative Language Enabled Multitask Learning Framework

Author: Chauhan Jainish
Sain Joy Prakash
Schumm Jeremiah
Sheth Amit P.
Thirunarayan Krishnaprasad
Yadav Shweta
Publication venue: Scholar Commons
Publication date: 01/01/2020
Field of study

Existing studies on using social media for deriving mental health status of users focus on the depression detection task. However, for case management and referral to psychiatrists, healthcare workers require practical and scalable depressive disorder screening and triage system. This study aims to design and evaluate a decision support system (DSS) to reliably determine the depressive triage level by capturing fine-grained depressive symptoms expressed in user tweets through the emulation of Patient Health Questionnaire-9 (PHQ-9) that is routinely used in clinical practice. The reliable detection of depressive symptoms from tweets is challenging because the 280-character limit on tweets incentivizes the use of creative artifacts in the utterances and figurative usage contributes to effective expression. We propose a novel BERT based robust multi-task learning framework to accurately identify the depressive symptoms using the auxiliary task of figurative usage detection. Specifically, our proposed novel task sharing mechanism, co-task aware attention, enables automatic selection of optimal information across the BERT layers and tasks by soft-sharing of parameters. Our results show that modeling figurative usage can demonstrably improve the model\u27s robustness and reliability for distinguishing the depression symptoms

arXiv.org e-Print Archive

Crossref

Scholar Commons - Institutional Repository of the University of South Carolina

CORE

The role of time in video understanding

Author: Price Will
Publication venue
Publication date: 25/01/2022
Field of study

Explore Bristol Research

Video-efficient foundation models

Author: Thoker F.M.
Publication venue
Publication date: 01/01/2023
Field of study

The thesis strives to endow video-efficiency in video understanding by addressing the research question ''What enables video-efficient video foundation models?'' Video-efficiency encompasses developing video foundation models that are not only accurate but also exhibit label-efficiency i.e. require fewer labels, domain-efficiency i.e. applicable to a variety of video learning scenarios, and data-efficiency i.e. reduce the amount of video data needed for learning. The research question is addressed for RGB and non-RGB video modalities. In Chapter 2, we focus on improving the label- and domain-efficiency of non-RGB action recognition and detection. Chapter 3 introduces a new self-supervised approach for learning feature representations for 3D-skeleton video sequences. In Chapter 4, we conduct a large-scale study of existing RGB-based self-supervised video models to assess their performance across different facets of video-efficiency. Chapter 5 presents a new method for video self-supervision that explicitly aims to learn motion focused video-representations. To summarize, this thesis presents several novel approaches to improve the video-efficiency of video foundation models. Our research highlights the importance of transferring knowledge between RGB and non-RGB video modalities, exploring self-supervision for non-RGB video modeling, analyzing self-supervised models beyond canonical setups and carefully designing new self-supervised tasks to develop video foundation models that can exhibit different facets of video-efficiency. We hope that our work will inspire further research and development in this area, leading to even more video-efficient foundation models

International Migration, Integration and Social Cohesion online publications

Proceedings of the 7th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2022)

Author
Publication venue: Tampere University
Publication date: 10/11/2022
Field of study

Trepo - Institutional Repository of Tampere University