218 research outputs found
Multimodal Fusion Interactions: A Study of Human and Automatic Quantification
In order to perform multimodal fusion of heterogeneous signals, we need to
understand their interactions: how each modality individually provides
information useful for a task and how this information changes in the presence
of other modalities. In this paper, we perform a comparative study of how
humans annotate two categorizations of multimodal interactions: (1) partial
labels, where different annotators annotate the label given the first, second,
and both modalities, and (2) counterfactual labels, where the same annotator
annotates the label given the first modality before asking them to explicitly
reason about how their answer changes when given the second. We further propose
an alternative taxonomy based on (3) information decomposition, where
annotators annotate the degrees of redundancy: the extent to which modalities
individually and together give the same predictions, uniqueness: the extent to
which one modality enables a prediction that the other does not, and synergy:
the extent to which both modalities enable one to make a prediction that one
would not otherwise make using individual modalities. Through experiments and
annotations, we highlight several opportunities and limitations of each
approach and propose a method to automatically convert annotations of partial
and counterfactual labels to information decomposition, yielding an accurate
and efficient method for quantifying multimodal interactions.Comment: International Conference on Multimodal Interaction (ICMI '23), Code
available at: https://github.com/pliang279/PID. arXiv admin note: text
overlap with arXiv:2302.1224
Multimodal Data Analysis of Dyadic Interactions for an Automated Feedback System Supporting Parent Implementation of Pivotal Response Treatment
abstract: Parents fulfill a pivotal role in early childhood development of social and communication
skills. In children with autism, the development of these skills can be delayed. Applied
behavioral analysis (ABA) techniques have been created to aid in skill acquisition.
Among these, pivotal response treatment (PRT) has been empirically shown to foster
improvements. Research into PRT implementation has also shown that parents can be
trained to be effective interventionists for their children. The current difficulty in PRT
training is how to disseminate training to parents who need it, and how to support and
motivate practitioners after training.
Evaluation of the parents’ fidelity to implementation is often undertaken using video
probes that depict the dyadic interaction occurring between the parent and the child during
PRT sessions. These videos are time consuming for clinicians to process, and often result
in only minimal feedback for the parents. Current trends in technology could be utilized to
alleviate the manual cost of extracting data from the videos, affording greater
opportunities for providing clinician created feedback as well as automated assessments.
The naturalistic context of the video probes along with the dependence on ubiquitous
recording devices creates a difficult scenario for classification tasks. The domain of the
PRT video probes can be expected to have high levels of both aleatory and epistemic
uncertainty. Addressing these challenges requires examination of the multimodal data
along with implementation and evaluation of classification algorithms. This is explored
through the use of a new dataset of PRT videos.
The relationship between the parent and the clinician is important. The clinician can
provide support and help build self-efficacy in addition to providing knowledge and
modeling of treatment procedures. Facilitating this relationship along with automated
feedback not only provides the opportunity to present expert feedback to the parent, but
also allows the clinician to aid in personalizing the classification models. By utilizing a
human-in-the-loop framework, clinicians can aid in addressing the uncertainty in the
classification models by providing additional labeled samples. This will allow the system
to improve classification and provides a person-centered approach to extracting
multimodal data from PRT video probes.Dissertation/ThesisDoctoral Dissertation Computer Science 201
Proyecto Docente e Investigador
PROYECTO DOCENTE E INVESTIGADOR
Catedráticos de Universidad
Área de Ciencia de la Computación e Inteligencia Artificial
Universidad de Valladolid
19 de Mayo de 2023
David Escudero Manceb
Linguistically-Informed Neural Architectures for Lexical, Syntactic and Semantic Tasks in Sanskrit
The primary focus of this thesis is to make Sanskrit manuscripts more
accessible to the end-users through natural language technologies. The
morphological richness, compounding, free word orderliness, and low-resource
nature of Sanskrit pose significant challenges for developing deep learning
solutions. We identify four fundamental tasks, which are crucial for developing
a robust NLP technology for Sanskrit: word segmentation, dependency parsing,
compound type identification, and poetry analysis. The first task, Sanskrit
Word Segmentation (SWS), is a fundamental text processing task for any other
downstream applications. However, it is challenging due to the sandhi
phenomenon that modifies characters at word boundaries. Similarly, the existing
dependency parsing approaches struggle with morphologically rich and
low-resource languages like Sanskrit. Compound type identification is also
challenging for Sanskrit due to the context-sensitive semantic relation between
components. All these challenges result in sub-optimal performance in NLP
applications like question answering and machine translation. Finally, Sanskrit
poetry has not been extensively studied in computational linguistics.
While addressing these challenges, this thesis makes various contributions:
(1) The thesis proposes linguistically-informed neural architectures for these
tasks. (2) We showcase the interpretability and multilingual extension of the
proposed systems. (3) Our proposed systems report state-of-the-art performance.
(4) Finally, we present a neural toolkit named SanskritShala, a web-based
application that provides real-time analysis of input for various NLP tasks.
Overall, this thesis contributes to making Sanskrit manuscripts more accessible
by developing robust NLP technology and releasing various resources, datasets,
and web-based toolkit.Comment: Ph.D. dissertatio
Models and Analysis of Vocal Emissions for Biomedical Applications
The MAVEBA Workshop proceedings, held on a biannual basis, collect the scientific papers presented both as oral and poster contributions, during the conference. The main subjects are: development of theoretical and mechanical models as an aid to the study of main phonatory dysfunctions, as well as the biomedical engineering methods for the analysis of voice signals and images, as a support to clinical diagnosis and classification of vocal pathologies
Individual Differences in Speech Production and Perception
Inter-individual variation in speech is a topic of increasing interest both in human sciences and speech technology. It can yield important insights into biological, cognitive, communicative, and social aspects of language. Written by specialists in psycholinguistics, phonetics, speech development, speech perception and speech technology, this volume presents experimental and modeling studies that provide the reader with a deep understanding of interspeaker variability and its role in speech processing, speech development, and interspeaker interactions. It discusses how theoretical models take into account individual behavior, explains why interspeaker variability enriches speech communication, and summarizes the limitations of the use of speaker information in forensics
- …