612 research outputs found

Computational Multimedia for Video Self Modeling

Author: Shen Ju
Publication venue: UKnowledge
Publication date: 01/01/2014

Video self modeling (VSM) is a behavioral intervention technique in which a learner models a target behavior by watching a video of himself or herself. This is the idea behind the psychological theory of self-efficacy: you can learn or model a task because you see yourself performing it, which provides the ideal form of behavior modeling. The effectiveness of VSM has been demonstrated for many types of disabilities and behavioral problems, ranging from stuttering, inappropriate social behaviors, autism, and selective mutism to sports training. However, there is an inherent difficulty in producing VSM material. Prolonged and persistent video recording is required to capture the rare, if not nonexistent, snippets that can be strung together to form novel video sequences of the target skill. To solve this problem, in this dissertation we use computational multimedia techniques to facilitate the creation of synthetic visual content for self-modeling that can be used by a learner and his or her therapist with a minimum amount of training data. There are three major technical contributions in my research. First, I developed an Adaptive Video Re-sampling algorithm to synthesize realistic lip-synchronized video with minimal motion jitter. Second, to denoise and complete the depth map captured by structured-light sensing systems, I introduced a layer-based probabilistic model to account for various types of uncertainties in the depth measurement. Third, I developed a simple and robust bundle-adjustment-based framework for calibrating a network of multiple wide-baseline RGB and depth cameras.

CiteSeerX

University of Kentucky

Automatic Video Self Modeling for Voice Disorder

Author: Cheung Sen-ching S.
Patel Rita
Raghunathan Anusha
Shen Ju
Ti Changpeng
Publication venue: eCommons
Publication date: 01/07/2015

Video self modeling (VSM) is a behavioral intervention technique in which a learner models a target behavior by watching a video of himself or herself. In the field of speech-language pathology, VSM has been used successfully to treat language in children with autism and in individuals with the fluency disorder of stuttering. Technical challenges remain in creating VSM content that depicts previously unseen behaviors. In this paper, we propose a novel system that synthesizes new video sequences for VSM treatment of patients with voice disorders. Starting with a video recording of a voice-disorder patient, the proposed system replaces the coarse speech with clean, healthier speech that resembles the patient's original voice. The replacement speech is either synthesized using a text-to-speech engine or selected from a database of clean speech samples based on a voice similarity metric. To realign the replacement speech with the original video, a novel audiovisual algorithm that combines audio segmentation with lip-state detection is proposed to identify corresponding time markers in the audio and video tracks. Lip synchronization is then accomplished using an adaptive video re-sampling scheme that minimizes motion jitter and preserves spatial sharpness. Results of both objective measurements and subjective evaluations on a dataset with 31 subjects demonstrate the effectiveness of the proposed techniques.

University of Dayton

Automatic Content Generation for Video Self Modeling

Author: Cheung Sen-ching S.
Patel Ravi R.
Raghunathan Anusha
Shen Ju
Publication venue: eCommons
Publication date: 01/07/2011

Video self modeling (VSM) is a behavioral intervention technique in which a learner models a target behavior by watching a video of himself or herself. Its effectiveness in rehabilitation and education has been repeatedly demonstrated, but technical challenges remain in creating video content that depicts previously unseen behaviors. In this paper, we propose a novel system that re-renders new talking-head sequences suitable for VSM treatment of patients with voice disorders. After the raw footage is captured, a new speech track is either synthesized using text-to-speech or selected, based on voice similarity, from a database of clean speech samples. Voice conversion is then applied to match the new speech to the original voice. Time markers extracted from the original and new speech tracks are used to re-sample the video track for lip synchronization. We use an adaptive re-sampling strategy to minimize motion jitter, and apply bilinear and optical-flow-based interpolation to preserve image quality. Both objective measurements and subjective evaluations demonstrate the effectiveness of the proposed techniques.

University of Dayton

Structure Preserving Large Imagery Reconstruction

Author: Hitz Markus
Payne Bryson
Shen Ju
Taha-abusneineh Sami
Yang Jianjun
Publication date: 01/07/2014

With the explosive growth of web-based cameras and mobile devices, billions of photographs are uploaded to the internet. We can trivially collect a huge number of photo streams for various goals, such as image clustering, 3D scene reconstruction, and other big data applications. However, such tasks are not easy because the retrieved photos can vary widely in view perspective, resolution, lighting, noise, and distortion. Furthermore, when unexpected objects such as people and vehicles occlude the scene, it is even more challenging to find feature correspondences and reconstruct realistic scenes. In this paper, we propose a structure-based image completion algorithm for object removal that produces visually plausible content with consistent structure and scene texture. We use an edge matching technique to infer the potential structure of the unknown region. Driven by the estimated structure, texture synthesis is performed automatically along the estimated curves. We evaluate the proposed method on different types of images, from highly structured indoor environments to natural scenes. Our experimental results demonstrate satisfactory performance that can potentially be used for subsequent big data processing, such as image localization, object retrieval, and scene reconstruction. Our experiments show that this approach achieves favorable results that outperform existing state-of-the-art techniques.

arXiv.org e-Print Archive

University of Dayton

Tracking Visible Features of Speech for Computer-Based Speech Therapy for Childhood Apraxia of Speech

Author: Zhian Zhian
Publication date: 01/03/2018

At present, there are few, if any, effective computer-based speech therapy systems (CBSTs) that support the at-home component of clinical interventions for Childhood Apraxia of Speech (CAS). PROMPT, an established speech therapy intervention for CAS, has the potential to be supported via a CBST, which could increase engagement and provide valuable feedback to the child. However, the necessary computational techniques have not yet been developed and evaluated. In this thesis, I describe the development of some of the key underlying computational components required for such a system. These components concern camera-based tracking of the visible features of speech related to jaw kinematics. They would also be necessary for the serious game that we have envisioned.

YorkSpace

Affective Computing

Publication venue: 'IntechOpen'
Publication date: 20/04/2021

This book provides an overview of state-of-the-art research in Affective Computing. It presents new ideas, original results and practical experiences in this increasingly important research field. The book consists of 23 chapters categorized into four sections. Since facial expression is one of the most important means of human communication, the first section of this book (Chapters 1 to 7) presents research on the synthesis and recognition of facial expressions. Given that we use not only the face but also body movements to express ourselves, the second section (Chapters 8 to 11) presents research on the perception and generation of emotional expressions using full-body motions. The third section of the book (Chapters 12 to 16) presents computational models of emotion, as well as findings from neuroscience research. The last section of the book (Chapters 17 to 22) presents applications related to affective computing.

Directory of Open Access Books (DOAB)

Computerized Identification of Mental State for Mental Health Care

Author: Cortez Angel
Ford Veronica
Watkins Frederick
Publication venue: ResearchBerg Review of Science and Technology
Publication date: 19/10/2022

Introduction. Sebastian, a 30-year-old man, was just diagnosed with depression and is undergoing cognitive-behavioral therapy (CBT) sessions on a weekly basis. As he goes about his daily routine, he wears a gadget that measures his physiological activity. At the start of each CBT session, his therapist enters his physiological data for the week into a computer program. The computer compiles Sebastian's levels of happy and negative affect and pinpoints the times when these reactions peaked. The therapist uses this information to track Sebastian's development and dynamically customize the therapy, for example by asking Sebastian to recollect experiences that correspond to some of the emotion peaks.

ResearchBerg

by

Author: Mohammed Ehsan Hoque
Pattie Maes
Rosalind W. Picard

CiteSeerX

Models and Analysis of Vocal Emissions for Biomedical Applications

Publication venue: 'Firenze University Press'
Publication date: 31/05/2022

The Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) workshop came into being in 1999 from the strongly felt need to share know-how, objectives and results between areas that until then had seemed quite distinct, such as bioengineering, medicine and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the neonate to the adult and elderly. Over the years the initial issues have grown and spread into other areas of research, such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years, always in Firenze, Italy.

Directory of Open Access Books (DOAB)

The Role of Rhythm in Speech and Language Rehabilitation: The SEP Hypothesis

Author: Fujii Shinya
Wan Catherine Y.
Publication venue: 'Frontiers Media SA'
Publication date: 01/10/2014

For thousands of years, human beings have engaged in rhythmic activities such as drumming, dancing, and singing. Rhythm can be a powerful medium to stimulate communication and social interactions, due to the strong sensorimotor coupling. For example, the mere presence of an underlying beat or pulse can result in spontaneous motor responses such as hand clapping, foot stepping, and rhythmic vocalizations. Examining the relationship between rhythm and speech is fundamental not only to our understanding of the origins of human communication but also to the treatment of neurological disorders. In this paper, we explore whether rhythm has therapeutic potential for promoting recovery from speech and language dysfunctions. Although clinical studies are limited to date, existing experimental evidence demonstrates rich rhythmic organization in both music and language, as well as overlapping brain networks that are crucial in the design of rehabilitation approaches. Here, we propose the “SEP” hypothesis, which postulates that (1) “sound envelope processing” and (2) “synchronization and entrainment to pulse” may help stimulate brain networks that underlie human communication. Ultimately, we hope that the SEP hypothesis will provide a useful framework for facilitating rhythm-based research in various patient populations.

Harvard University - DASH

Directory of Open Access Journals