2,009 research outputs found

    Automatically learning structural units in educational videos with the hierarchical hidden Markov models

    In this paper we present a coherent approach using the hierarchical HMM with shared structures to extract the structural units that form the building blocks of an education/training video. Rather than using hand-crafted approaches to define the structural units, we use the data from nine training videos to learn the parameters of the HHMM, and thus naturally extract the hierarchy. We then study this hierarchy and examine the nature of the structure at different levels of abstraction. Since the observable is continuous, we also show how to extend the parameter learning in the HHMM to deal with continuous observations.

    Efficient duration modelling in the hierarchical hidden semi-Markov models and their applications

    Modeling patterns in temporal data has arisen as an important problem in engineering and science. This has led to the popularity of several dynamic models, in particular the renowned hidden Markov model (HMM) [Rabiner, 1989]. Despite its widespread success in many cases, the standard HMM often fails to model more complex data whose elements are correlated hierarchically or over a long period. Such problems are, however, frequently encountered in practice. Existing efforts to overcome this weakness often address only one of these two aspects, mainly due to computational intractability. Motivated by this modeling challenge in many real-world problems, in particular video surveillance and segmentation, this thesis aims to develop tractable probabilistic models that can jointly model duration and hierarchical information in a unified framework. We believe that jointly exploiting statistical strength from both properties will lead to more accurate and robust models for the needed task. To tackle the modeling aspect, we base our work on an intersection between dynamic graphical models and statistics of lifetime modeling. Realizing that the key bottleneck found in the existing works lies in the choice of the distribution for a state, we have successfully integrated the discrete Coxian distribution [Cox, 1955], a special class of phase-type distributions, into the HMM to form a novel and powerful stochastic model termed the Coxian Hidden Semi-Markov Model (CxHSMM). We show that this model can still be expressed as a dynamic Bayesian network, and inference and learning can be derived analytically. Most importantly, it has four superior features over existing semi-Markov modelling: the parameter space is compact, computation is fast (almost the same as the HMM), closed-form estimation can be derived, and the Coxian is flexible enough to approximate a large class of distributions. 
Next, we exploit hierarchical decomposition in the data by borrowing an analogy from the hierarchical hidden Markov model in [Fine et al., 1998, Bui et al., 2004] and introduce a new type of shallow structured graphical model that combines both duration and hierarchical modelling into a unified framework, termed the Coxian Switching Hidden Semi-Markov Model (CxSHSMM). The top layer is a Markov sequence of switching variables, while the bottom layer is a sequence of concatenated CxHSMMs whose parameters are determined by the switching variable at the top. Again, we provide a thorough analysis along with inference and learning machinery. We also show that semi-Markov models with arbitrary depth structure can easily be developed. In all cases we further address two practical issues: missing observations due to unstable tracking and the use of partially labelled data to improve training accuracy. Motivated by real-world problems, our application contribution is a framework to recognize complex activities of daily living (ADLs) and detect anomalies to provide better intelligent caring services for the elderly. Coarser activities with their own duration distributions are represented using the CxHSMM. Complex activities are made of a sequence of coarser activities and represented at the top level in the CxSHSMM. Intensive experiments are conducted to evaluate our solutions against existing methods. In many cases, the superiority of the joint modeling and the Coxian parameterization over traditional methods is confirmed. The robustness of our proposed models is further demonstrated in a series of more challenging experiments, in which the tracking is often lost and activities considerably overlap. Our final contribution is an application of the switching Coxian model to segment education-oriented videos into coherent topical units. Our results again demonstrate that such segmentation processes can benefit greatly from the joint modeling of duration and hierarchy.
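
    The compactness claim for the Coxian duration model can be illustrated with a minimal sketch. It assumes one common parameterization of the discrete Coxian: a starting phase i is drawn from an initial distribution mu, and the state duration is the sum of independent geometric sojourns in phases i through M. The function names and example parameters are hypothetical and are not taken from the thesis.

```python
import random

def sample_geometric(p, rng):
    """Draw from a geometric distribution on {1, 2, ...} with success probability p."""
    n = 1
    while rng.random() >= p:
        n += 1
    return n

def sample_coxian_duration(mu, lam, rng):
    """Sample a state duration from an M-phase discrete Coxian (assumed form).

    mu[i]  : probability of entering at phase i (sums to 1)
    lam[j] : geometric parameter of phase j, in (0, 1]
    The duration is the total time spent in phases start..M-1.
    """
    start = rng.choices(range(len(mu)), weights=mu)[0]
    return sum(sample_geometric(lam[j], rng) for j in range(start, len(lam)))

def coxian_mean(mu, lam):
    """Expected duration: sum_i mu[i] * sum_{j >= i} 1 / lam[j]."""
    return sum(
        mu[i] * sum(1.0 / lam[j] for j in range(i, len(lam)))
        for i in range(len(mu))
    )

if __name__ == "__main__":
    rng = random.Random(0)
    mu, lam = [0.5, 0.5], [0.5, 0.25]  # illustrative values only
    draws = [sample_coxian_duration(mu, lam, rng) for _ in range(20000)]
    print("empirical mean:", sum(draws) / len(draws))
    print("analytic mean: ", coxian_mean(mu, lam))
```

    Under this assumed form, an M-phase duration distribution needs only 2M numbers (mu and lam), regardless of how long the durations can be, whereas an explicit duration table grows with the maximum duration.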

    A Virtual Conversational Agent for Teens with Autism: Experimental Results and Design Lessons

    We present the design of an online social skills development interface for teenagers with autism spectrum disorder (ASD). The interface is intended to enable private conversation practice anywhere, anytime using a web browser. Users converse informally with a virtual agent, receiving real-time feedback on nonverbal cues as well as summary feedback. The prototype was developed in consultation with an expert UX designer, two psychologists, and a pediatrician. Using the data from 47 individuals, feedback and dialogue generation were automated using a hidden Markov model and a schema-driven dialogue manager capable of handling multi-topic conversations. We conducted a study with nine high-functioning ASD teenagers. Through a thematic analysis of post-experiment interviews, we identified several key design considerations, notably: 1) Users should be fully briefed at the outset about the purpose and limitations of the system, to avoid unrealistic expectations. 2) An interface should incorporate positive acknowledgment of behavior change. 3) Realistic appearance of a virtual agent and responsiveness are important in engaging users. 4) Conversation personalization, for instance in prompting laconic users for more input and reciprocal questions, would help the teenagers engage for longer terms and increase the system's utility.

    A Functional Taxonomy of Music Generation Systems

    Digital advances have transformed the face of automatic music generation since its beginnings at the dawn of computing. Despite the many breakthroughs, issues such as the musical tasks targeted by different machines and the degree to which they succeed remain open questions. We present a functional taxonomy for music generation systems with reference to existing systems. The taxonomy organizes systems according to the purposes for which they were designed. It also reveals the inter-relatedness amongst the systems. This design-centered approach contrasts with predominant methods-based surveys and facilitates the identification of grand challenges to set the stage for new breakthroughs. Comment: survey, music generation, taxonomy, functional survey, automatic composition, algorithmic composition

    Automated manipulation of musical grammars to support episodic interactive experiences

    Music is used to enhance the experience of participants and visitors in a range of settings including theatre, film, video games, installations and theme parks. These experiences may be interactive, contrastingly episodic and of variable duration. Hence, the musical accompaniment needs to be dynamic and to transition between contrasting music passages. In these contexts, computer generation of music may be necessary for practical reasons including distribution and cost. Automated and dynamic composition algorithms exist but are not well-suited to a highly interactive episodic context owing to transition-related problems including discontinuity, abruptness, extended repetitiveness and lack of musical granularity and musical form. Addressing these problems requires algorithms capable of reacting to participant behaviour and episodic change in order to generate formic music that is continuous and coherent during transitions. This thesis presents the Form-Aware Transitioning and Recovering Algorithm (FATRA) for real-time, adaptive, form-aware music generation to provide continuous musical accompaniment in episodic contexts. FATRA combines stochastic grammar adaptation and grammar merging in real time. The Form-Aware Transition Engine (FATE) implementation of FATRA estimates the time-occurrence of upcoming narrative transitions and generates a harmonic sequence as narrative accompaniment with a focus on coherent, form-aware music transitioning between music passages of contrasting character. Using FATE, FATRA has been evaluated in three perceptual user studies: an audio-augmented real museum experience, a computer-simulated museum experience and a music-focused online study detached from narrative. Music transitions of FATRA were benchmarked against common approaches of the video game industry, i.e. crossfading and direct transitions. The participants were overall content with the music of FATE during their experience. 
Transitions of FATE were significantly favoured over the crossfading benchmark and were competitive with the direct transitions benchmark, although the latter comparison did not reach statistical significance. In addition, technical evaluation demonstrated capabilities of FATRA including form generation, repetitiveness avoidance and style/form recovery in case of falsely predicted narrative transitions. Technical results, along with perceptual preference and competitiveness against the benchmark approaches, are deemed positive, and the structural advantages of FATRA, including form-aware transitioning, carry considerable potential for future research.

    E3: Emotions, Engagement, and Educational Digital Games

    The use of educational digital games as a method of instruction for science, technology, engineering, and mathematics has increased in the past decade. While these games provide successfully implemented interactive and fun interfaces, they are not designed to respond to or remedy students’ negative affect towards the game dynamics or their educational content. Therefore, this exploratory study investigated the frequent patterns of student emotional and behavioral response to educational digital games. To unveil the sequential occurrence of these affective states, students were assigned to play the game for nine class sessions. During these sessions, their affective and behavioral responses were recorded to uncover possible underlying patterns of affect (particularly confusion, frustration, and boredom) and behavior (disengagement). In addition, these affect and behavior frequency pattern data were combined with students’ gameplay data in order to identify patterns of emotions that led to better performance in the game. The results provide information on possible affect and behavior patterns that could be used in further research on affect and behavior detection in such open-ended digital game environments. In particular, the findings show that students experience a considerable amount of confusion, frustration, and boredom. Another finding highlights the need for remediation via embedded help, as the students often turned to peer help during their gameplay. However, possibly because of the low quality of the received help, students seemed to become frustrated or disengaged with the environment. Finally, the findings suggest the importance of the decay rate of confusion; students’ gameplay performance was associated with the length of time students remained confused or frustrated. Overall, these findings show that there are interesting patterns related to students who experience relatively negative emotions during their gameplay.

    Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation

    This paper surveys the current state of the art in Natural Language Generation (NLG), defined as the task of generating text or speech from non-linguistic input. A survey of NLG is timely in view of the changes that the field has undergone over the past decade or so, especially in relation to new (usually data-driven) methods, as well as new applications of NLG technology. This survey therefore aims to (a) give an up-to-date synthesis of research on the core tasks in NLG and the architectures in which such tasks are organised; (b) highlight a number of relatively recent research topics that have arisen partly as a result of growing synergies between NLG and other areas of artificial intelligence; (c) draw attention to the challenges in NLG evaluation, relating them to similar challenges faced in other areas of Natural Language Processing, with an emphasis on different evaluation methods and the relationships between them. Comment: Published in Journal of AI Research (JAIR), volume 61, pp 75-170. 118 pages, 8 figures, 1 table