831 research outputs found

    Multimodal Data Analysis of Dyadic Interactions for an Automated Feedback System Supporting Parent Implementation of Pivotal Response Treatment

    Get PDF
    abstract: Parents fulfill a pivotal role in early childhood development of social and communication skills. In children with autism, the development of these skills can be delayed. Applied behavior analysis (ABA) techniques have been created to aid in skill acquisition. Among these, pivotal response treatment (PRT) has been empirically shown to foster improvements. Research into PRT implementation has also shown that parents can be trained to be effective interventionists for their children. The current difficulties in PRT training are how to disseminate training to parents who need it, and how to support and motivate practitioners after training. Evaluation of parents' fidelity of implementation is often undertaken using video probes that depict the dyadic interaction between parent and child during PRT sessions. These videos are time-consuming for clinicians to process and often yield only minimal feedback for the parents. Current trends in technology could be used to alleviate the manual cost of extracting data from the videos, affording greater opportunities for clinician-created feedback as well as automated assessments. The naturalistic context of the video probes, along with the dependence on ubiquitous recording devices, creates a difficult scenario for classification tasks: the domain of PRT video probes can be expected to exhibit high levels of both aleatory and epistemic uncertainty. Addressing these challenges requires examination of the multimodal data along with implementation and evaluation of classification algorithms, which is explored here through a new dataset of PRT videos. The relationship between the parent and the clinician is important: the clinician can provide support and help build self-efficacy in addition to providing knowledge and modeling of treatment procedures. Facilitating this relationship alongside automated feedback not only provides the opportunity to present expert feedback to the parent, but also allows the clinician to help personalize the classification models. By utilizing a human-in-the-loop framework, clinicians can address the uncertainty in the classification models by providing additional labeled samples, allowing the system to improve classification and providing a person-centered approach to extracting multimodal data from PRT video probes.
    Doctoral Dissertation, Computer Science, 201
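
    To make the human-in-the-loop idea concrete, here is a minimal uncertainty-sampling sketch, assuming a generic scikit-learn-style classifier over features extracted from video probes; `clinician_label` is a hypothetical placeholder for the clinician's labeling step, not part of the dissertation's actual system.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def clinician_label(x):
    """Hypothetical placeholder: the clinician reviews the video segment
    behind feature vector x and returns its label."""
    raise NotImplementedError("clinician supplies the label here")

def entropy(probs):
    """Predictive entropy per sample; higher means more model uncertainty."""
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

def active_learning_round(model, X_lab, y_lab, X_pool, budget=5):
    """One human-in-the-loop round: fit, pick the most uncertain pool
    samples, ask the clinician to label them, and grow the labeled set."""
    model.fit(X_lab, y_lab)
    pick = np.argsort(-entropy(model.predict_proba(X_pool)))[:budget]
    new_y = np.array([clinician_label(X_pool[i]) for i in pick])
    X_lab = np.vstack([X_lab, X_pool[pick]])
    y_lab = np.concatenate([y_lab, new_y])
    return model, X_lab, y_lab, np.delete(X_pool, pick, axis=0)

# e.g., model = LogisticRegression(max_iter=1000), then repeat rounds
# until the model's uncertainty on the remaining pool is acceptable.
```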

    Sparks of Large Audio Models: A Survey and Outlook

    Full text link
    This survey paper provides a comprehensive overview of recent advancements and challenges in applying large language models to the field of audio signal processing. Audio processing, with its diverse signal representations and wide range of sources, from human voices to musical instruments and environmental sounds, poses challenges distinct from those found in traditional Natural Language Processing scenarios. Nevertheless, Large Audio Models, epitomized by transformer-based architectures, have shown marked efficacy in this sphere. By leveraging massive amounts of data, these models have demonstrated prowess in a variety of audio tasks, from Automatic Speech Recognition and Text-To-Speech to Music Generation, among others. Notably, Foundational Audio Models such as SeamlessM4T have recently begun to act as universal translators, supporting multiple speech tasks for up to 100 languages without any reliance on separate task-specific systems. This paper presents an in-depth analysis of state-of-the-art methodologies for Foundational Large Audio Models, their performance benchmarks, and their applicability to real-world scenarios. We also highlight current limitations and provide insights into potential future research directions for Large Audio Models, with the intent to spark further discussion and thereby foster innovation in the next generation of audio-processing systems. Furthermore, to keep pace with the rapid development of this area, we will continually update a repository of relevant recent articles and their open-source implementations at https://github.com/EmulationAI/awesome-large-audio-models.
    Comment: work in progress; Repo URL: https://github.com/EmulationAI/awesome-large-audio-models
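
    As a concrete illustration of the basic pattern behind many of the surveyed models (not a reproduction of any specific system), here is a minimal transformer encoder over mel-spectrogram frames for an audio classification task; all dimensions and the classification head are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TinyAudioTransformer(nn.Module):
    """Mel-spectrogram frames -> tokens -> transformer encoder -> classes."""
    def __init__(self, n_mels=80, d_model=256, nhead=4, num_layers=4, n_classes=10):
        super().__init__()
        self.proj = nn.Linear(n_mels, d_model)   # one spectrogram frame per token
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, n_classes)  # e.g., audio-event classes

    def forward(self, mel):                      # mel: (batch, time, n_mels)
        h = self.encoder(self.proj(mel))
        return self.head(h.mean(dim=1))          # mean-pool over time

# 2 clips of 300 mel frames (80 bins each), as placeholder input
logits = TinyAudioTransformer()(torch.randn(2, 300, 80))
```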

    A computational future for preventing HIV in minority communities: How advanced technology can improve implementation of effective programs

    Get PDF
    Abstract: African Americans and Hispanics in the U.S. have much higher rates of HIV than non-minorities. There is now strong evidence that a range of behavioral interventions are efficacious in reducing sexual risk behavior in these populations. While a handful of these programs are just beginning to be disseminated widely, we still have not implemented effective programs at a level that would reduce the population incidence of HIV for minorities. We propose that innovative approaches involving computational technologies be explored both for developing new interventions and for supporting wide-scale implementation of effective behavioral interventions. Mobile technologies have a place in both of these activities. First, mobile technologies can be used to sense context and to interact with individuals, according to their unique preferences and needs, at the times when intervention to reduce risk would be most impactful. Second, mobile technologies can be used to improve the delivery of interventions by facilitators and their agencies. Systems science methods, including social network analysis, agent-based models, computational linguistics, intelligent data analysis, and systems and software engineering, all have strategic roles that can bring about advances in HIV prevention in minority communities. Using an existing mobile technology for depression and three effective HIV prevention programs, we illustrate how eight areas in the intervention/implementation process can use innovative computational approaches to advance intervention adoption, fidelity, and sustainability.
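
    As one hedged illustration of the systems-science methods named above, the toy agent-based sketch below simulates intervention adoption spreading over a small-world social network; the network model, probabilities, and step count are illustrative assumptions, not calibrated to any of the three programs discussed.

```python
import random
import networkx as nx

def simulate_adoption(n=500, k=6, p_rewire=0.1, p_adopt=0.2, seeds=5, steps=20):
    """Toy diffusion of an intervention over a small-world contact network."""
    G = nx.watts_strogatz_graph(n, k, p_rewire)
    adopted = set(random.sample(list(G.nodes), seeds))   # initial facilitators
    for _ in range(steps):
        # each adopter may convince each non-adopting neighbor this step
        new = {v for u in adopted for v in G.neighbors(u)
               if v not in adopted and random.random() < p_adopt}
        adopted |= new
    return len(adopted) / n                              # fraction of population reached

print(f"Fraction reached after 20 steps: {simulate_adoption():.2f}")
```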

    Cognitive Decay And Memory Recall During Long Duration Spaceflight

    Get PDF
    This dissertation aims to advance the efficacy of Long-Duration Space Flight (LDSF) pre-flight and in-flight training programs, acknowledging existing knowledge gaps in NASA's methodologies. The research's objective is to optimize the cognitive workload of LDSF crew members, enhance their neurocognitive functionality, and provide more meaningful work experiences, particularly for Mars missions. The study addresses identified shortcomings in current training and learning strategies and simulation-based training systems, focusing on areas requiring quantitative measures for astronaut proficiency and training effectiveness assessment. The project centers on understanding cognitive decay and memory loss under LDSF-related stressors, seeking to establish when such cognitive decline exceeds acceptable performance levels throughout mission phases. The research acknowledges the limitations of creating a near-orbit environment due to resource constraints and the need to develop engaging tasks for test subjects. Nevertheless, it underscores the potential impact on future space mission training and other high-risk professions. The study further explores astronaut training complexities, the challenges encountered in LDSF missions, and the cognitive processes involved in such demanding environments. The research employs various cognitive and memory testing events, integrating neuroimaging techniques to understand the neural mechanisms of cognition and memory. It also explores Rasmussen's S-R-K behaviors and Brain Network Theory's (BNT) potential for measuring forgetting and cognition and predicting training needs. The multidisciplinary approach of the study reinforces the importance of integrating insights from cognitive psychology, behavior analysis, and brain connectivity research. Research experiments were conducted at the University of North Dakota's Integrated Lunar Mars Analog Habitat (ILMAH), gathering data from selected subjects via cognitive neuroscience tools and electroencephalography (EEG) recordings to evaluate neurocognitive performance. The data analysis aimed to assess brain network activations during mentally demanding activities and to compare EEG power spectra across frequencies, latencies, and scalp locations. Despite facing certain challenges, including inadequacies of the current adapter boards that led to analysis failure, the study provides crucial lessons for future research endeavors. It highlights the need for swift adaptation, continual process refinement, and innovative solutions, such as the redesign of adapter boards for high radio-frequency-noise environments, for the collection of high-quality EEG data. In conclusion, while the research did not reveal statistically significant differences between the experimental and control groups, it furnished valuable insights and underscored the need to optimize astronaut performance, well-being, and mission success. The study contributes to the ongoing evolution of training methodologies, with implications for future space exploration endeavors.
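
    As a small illustration of the kind of spectral comparison described above, the sketch below computes per-band power from multi-channel EEG using Welch's method; the channel count, sampling rate, and random placeholder data are assumptions, not ILMAH recordings.

```python
import numpy as np
from scipy.signal import welch

fs = 256                                  # sampling rate in Hz (assumed)
eeg = np.random.randn(8, fs * 60)         # 8 channels x 60 s of placeholder data

bands = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}
f, psd = welch(eeg, fs=fs, nperseg=fs * 2, axis=-1)   # psd: (channels, freqs)

df = f[1] - f[0]                          # frequency resolution of the PSD
for name, (lo, hi) in bands.items():
    mask = (f >= lo) & (f < hi)
    band_power = psd[:, mask].sum(axis=-1) * df       # integrate PSD over band
    print(name, band_power.round(3))      # one value per channel / scalp location
```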

    Unveiling the frontiers of deep learning: innovations shaping diverse domains

    Full text link
    Deep learning (DL) enables the development of computer models that are capable of learning, visualizing, optimizing, refining, and predicting data. In recent years, DL has been applied in a range of fields, including audio-visual data processing, agriculture, transportation prediction, natural language, biomedicine, disaster management, bioinformatics, drug design, genomics, face recognition, and ecology. To explore the current state of deep learning, it is necessary to investigate its latest developments and applications in these disciplines. However, the literature is lacking in exploring the applications of deep learning across all potential sectors. This paper therefore extensively investigates the potential applications of deep learning across all major fields of study, along with the associated benefits and challenges. As evidenced in the literature, DL offers accurate prediction and analysis, making it a powerful computational tool, and it can learn and optimize its own representations without hand-engineered features. At the same time, deep learning requires massive amounts of data for effective analysis and processing. To handle the challenge of compiling the huge amounts of medical, scientific, healthcare, and environmental data needed for deep learning, gated architectures such as LSTMs and GRUs can be utilized (see the sketch below). For multimodal learning, a network needs neurons shared across all tasks alongside neurons specialized for particular tasks.
    Comment: 64 pages, 3 figures, 3 tables
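
    To ground the remark about gated architectures, here is a minimal GRU encoder sketch in PyTorch for long sequential records; the feature sizes and the single-output head are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class GRUEncoder(nn.Module):
    """Gated recurrent encoder: a long sequence in, one prediction out."""
    def __init__(self, n_features=32, hidden=64):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)   # e.g., a single outcome score

    def forward(self, x):                  # x: (batch, time, n_features)
        _, h_last = self.gru(x)            # h_last: (1, batch, hidden)
        return self.head(h_last.squeeze(0))

# 4 records, each 100 time steps of 32 features (placeholder data)
scores = GRUEncoder()(torch.randn(4, 100, 32))
```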

    Towards Video Transformers for Automatic Human Analysis

    Full text link
    With the aim of creating artificial systems capable of mirroring the nuanced understanding and interpretative powers inherent to human cognition, this thesis embarks on an exploration of the intersection between human analysis and Video Transformers. The objective is to harness the potential of Transformers, a promising architectural paradigm, to comprehend the intricacies of human interaction, thus paving the way for the development of empathetic and context-aware intelligent systems. To do so, we explore the whole Computer Vision pipeline, from data gathering to deep analysis of recent developments, through model design and experimentation. Central to this study is the creation of UDIVA, an expansive multi-modal, multi-view dataset capturing dyadic face-to-face human interactions. Comprising 147 participants across 188 sessions, UDIVA integrates audio-visual recordings, heart-rate measurements, personality assessments, socio-demographic metadata, and conversational transcripts, establishing itself as the largest dataset for dyadic human interaction analysis to date. This dataset provides a rich context for probing the capabilities of Transformers within complex environments. To validate its utility, as well as to elucidate Transformers' ability to assimilate diverse contextual cues, we focus on the challenge of personality regression within interaction scenarios. We first adapt an existing Video Transformer to handle multiple contextual sources and conduct rigorous experimentation. We empirically observe a progressive enhancement in model performance as more context is added, reinforcing the potential of Transformers to decode intricate human dynamics. Building upon these findings, the Dyadformer emerges as a novel architecture, adept at long-range modeling of dyadic interactions; a schematic sketch of the joint dyadic idea follows this abstract. By jointly modeling both participants in the interaction, as well as embedding multi-modal integration into the model itself, the Dyadformer surpasses the baseline and other concurrent approaches, underscoring Transformers' aptitude for deciphering multifaceted, noisy, and challenging tasks such as the analysis of human personality in interaction. Nonetheless, these experiments unveil the ubiquitous challenges of training Transformers, particularly in managing overfitting due to their demand for extensive datasets. Consequently, we conclude this thesis with a comprehensive investigation into Video Transformers, analyzing topics ranging from architectural designs and training strategies to input embedding and tokenization, traversing multi-modality and specific applications. Across these, we highlight trends that optimally harness spatio-temporal representations to handle video redundancy and high dimensionality. A culminating performance comparison is conducted in the realm of video action classification, spotlighting strategies that exhibit superior efficacy, even compared to traditional CNN-based methods.
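
    As a schematic of joint dyadic, multimodal modeling (not the actual Dyadformer), the sketch below tags per-participant token streams with participant and modality embeddings, fuses them with a single transformer encoder, and regresses five personality traits; all sizes and the two-way participant/modality split are assumptions.

```python
import torch
import torch.nn as nn

class DyadicFusion(nn.Module):
    """Fuse tokens from both participants and both modalities in one encoder."""
    def __init__(self, d_model=256, nhead=4, layers=2, n_traits=5):
        super().__init__()
        self.participant = nn.Embedding(2, d_model)   # person A / person B
        self.modality = nn.Embedding(2, d_model)      # video / audio
        enc = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, layers)
        self.head = nn.Linear(d_model, n_traits)      # e.g., OCEAN traits

    def forward(self, tokens, who, mod):  # tokens: (B, T, d); who, mod: (B, T)
        h = tokens + self.participant(who) + self.modality(mod)
        return self.head(self.encoder(h).mean(dim=1))  # pool, then regress

B, T, d = 2, 64, 256                      # placeholder batch of fused tokens
traits = DyadicFusion()(torch.randn(B, T, d),
                        torch.randint(0, 2, (B, T)),
                        torch.randint(0, 2, (B, T)))
```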

    Towards Interaction-level Video Action Understanding

    Get PDF
    A huge number of videos are created, shared, and viewed every day, and human actions and activities account for a large part of this content. We want machines to understand human actions in videos, as this is essential to various applications, including but not limited to autonomous driving, security systems, human-robot interaction, and healthcare. For a truly intelligent system able to interact with humans, video understanding must go beyond simply answering "what is the action in the video" and become more aware of what those actions mean to humans and more in line with human thinking, which we call interaction-level action understanding. This thesis identifies three main challenges in approaching interaction-level video action understanding: 1) understanding actions given human consensus; 2) understanding actions based on specific human rules; 3) directly understanding actions in videos via natural language. For the first challenge, we select video summarization as a representative task, which aims to select informative frames that retain high-level information based on human annotators' experience. Through a self-attention architecture and meta-learning, which jointly process dual representations of visual and sequential information, the proposed model is capable of understanding video from human consensus (e.g., how humans decide which parts of an action sequence are essential); a minimal sketch of the frame-scoring idea follows this abstract. For the second challenge, our work on action quality assessment utilizes transformer decoders to parse the input action into several sub-actions and assess the fine-grained quality of the given action, yielding the capability of action understanding given specific human rules (e.g., how well a diving action is performed, or how well a robot performs surgery). The third key idea explored in this thesis is to use graph neural networks in an adversarial fashion to understand actions through natural language. We demonstrate the utility of this technique for the video captioning task, which takes an action video as input, outputs natural language, and yields state-of-the-art performance. In conclusion, the research directions and methods introduced in this thesis provide fundamental components toward interaction-level action understanding.
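
    Below is a minimal sketch of attention-based frame scoring for video summarization, the first challenge above; the meta-learning component is omitted and the feature dimensions and top-k budget are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FrameScorer(nn.Module):
    """Self-attention over per-frame features, one importance score per frame."""
    def __init__(self, d_feat=512, nhead=8, layers=2):
        super().__init__()
        enc = nn.TransformerEncoderLayer(d_feat, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, layers)
        self.score = nn.Linear(d_feat, 1)

    def forward(self, frames):            # frames: (batch, T, d_feat) CNN features
        h = self.encoder(frames)          # each frame attends to the whole video
        return self.score(h).squeeze(-1)  # (batch, T) importance scores

feats = torch.randn(1, 200, 512)          # 200 frames of placeholder features
topk = FrameScorer()(feats).topk(k=30, dim=1).indices   # keep 30 key frames
```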

    Expressive language development in minimally verbal autistic children: exploring the role of speech production

    Get PDF
    Trajectories of expressive language development are highly heterogeneous in autism. I examine the hypothesis that co-morbid speech production difficulties may be a contributing factor for some minimally verbal autistic individuals. Chapters 1 and 2 provide an overview of language variation within autism and of existing intervention approaches for minimally verbal autistic children; these chapters situate the thesis within the existing literature. Chapter 3 describes a longitudinal study of expressive language in minimally verbal 3- to 5-year-olds (n=27), with four assessment points over 12 months. Contrary to expectations, initial communicative intent, parent responsiveness, and response to joint attention did not predict expressive language growth or outcome; speech skills were significant predictors. Chapter 4 describes the design, development, and feasibility testing of the BabbleBooster app, a novel, parent-mediated speech skills intervention, in which 19 families participated for 16 weeks. Acceptability feedback was positive but adherence was variable; I discuss how this could be improved in future iterations of the app and intervention protocol. Chapter 5 details how BabbleBooster's efficacy was evaluated. For interventions with complex or rare populations, a randomized case series design is a useful alternative to an under-powered group trial (the logic is illustrated in the sketch below). There was no evidence that BabbleBooster improved speech production scores, likely due to limited dosage. Future research using this study design could determine optimal treatment intensity and duration with an improved version of the app. Taken together, these studies underscore the contribution of speech production abilities to expressive language development in minimally verbal autistic individuals. I argue that this reflects an additional condition, rather than a consequence of core autism features. The intervention piloted here represents a first step towards developing a scalable tool for parents to support speech development in minimally verbal children, and illustrates the utility of randomized single case series for testing treatment effects in small, heterogeneous cohorts.
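
    To illustrate the general logic of a randomization test in a single-case series (the design named above, not the thesis's actual analysis), the sketch below compares the observed pre/post change in weekly probe scores against the changes produced by every intervention start point the randomization scheme would have allowed; the scores and eligible start weeks are invented for illustration.

```python
import numpy as np

scores = np.array([2, 3, 2, 4, 3, 5, 6, 6, 7, 6, 8, 7], dtype=float)  # weekly probes
observed_start = 6                        # week the intervention actually began
eligible_starts = range(3, 10)            # starts allowed by the randomization scheme

def effect(start):
    """Mean post-intervention score minus mean baseline score."""
    return scores[start:].mean() - scores[:start].mean()

obs = effect(observed_start)
null = np.array([effect(s) for s in eligible_starts])  # randomization distribution
p = np.mean(null >= obs)                  # one-sided randomization p-value
print(f"effect={obs:.2f}, p={p:.2f}")
```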