217 research outputs found

    A Novel Electrocardiogram Segmentation Algorithm Using a Multiple Model Adaptive Estimator

    This thesis presents a novel electrocardiogram (ECG) processing algorithm design based on a Multiple Model Adaptive Estimator (MMAE) for a physiological monitoring system. Twenty ECG signals from the MIT ECG database were used to develop system models for the MMAE. The P-wave, QRS complex, and T-wave segments from the characteristic ECG waveform were used to develop hypothesis filter banks. By adding a threshold filter-switching algorithm to the conventional MMAE implementation, the device mimics the way a human analyzer searches the complex ECG signal for a usable temporal landmark and then branches out to find the other key wave components and their timing. The twenty signals and an additional signal from an animal exsanguination experiment were then used to test the algorithm. Using a conditional hypothesis-testing algorithm, the MMAE correctly identified the ECG signal segments corresponding to the hypothesis models with a 96.8% accuracy rate for the 11,539 possible segments tested. The robust MMAE algorithm also detected any misalignments in the filter hypotheses and automatically restarted filters within the MMAE to synchronize the hypotheses with the incoming signal. Finally, the MMAE selects the optimal filter bank based on incoming ECG measurements. The algorithm also provides critical heart-related information such as heart rate, QT interval, and PR interval from the ECG signal. This analyzer could easily be added as a software update to the standard physiological monitors universally used in emergency vehicles and treatment facilities, potentially saving thousands of lives and reducing the pain and suffering of the injured.
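    As a rough illustration of the conditional hypothesis-testing step described above, the sketch below updates the probability of each hypothesis filter from its residual and restarts any filter whose probability collapses below a threshold. The Gaussian residual model, the threshold value, and the three-hypothesis example are assumptions for illustration, not the thesis's actual filter bank.

```python
# Minimal sketch of an MMAE conditional-probability update with threshold
# switching. Filter models, residuals, and thresholds are hypothetical
# placeholders; the thesis builds its hypothesis banks from the MIT ECG database.
import numpy as np

def mmae_update(probs, residuals, residual_vars, restart_threshold=1e-3):
    """One conditional hypothesis-testing step over a bank of elemental filters.

    probs         -- prior probability of each hypothesis (P-wave, QRS, T-wave, ...)
    residuals     -- innovation of each elemental filter for the new ECG sample
    residual_vars -- innovation variance predicted by each filter
    """
    # Gaussian likelihood of the measurement under each hypothesis
    likelihoods = np.exp(-0.5 * residuals**2 / residual_vars) / np.sqrt(2 * np.pi * residual_vars)
    posterior = probs * likelihoods
    posterior /= posterior.sum()

    # Threshold switching: flag and re-seed filters whose probability collapses,
    # a stand-in for the resynchronization behavior described in the abstract.
    restarted = posterior < restart_threshold
    posterior[restarted] = restart_threshold
    posterior /= posterior.sum()
    return posterior, restarted

# Example: three hypotheses with equal priors and one new measurement
probs = np.full(3, 1 / 3)
probs, restarted = mmae_update(probs,
                               residuals=np.array([0.9, 0.1, 1.5]),
                               residual_vars=np.array([0.2, 0.2, 0.2]))
print(probs, restarted)
```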

    Modeling and predicting emotion in music

    With the explosion of vast and easily accessible digital music libraries over the past decade, there has been a rapid expansion of research towards automated systems for searching and organizing music and related data. Online retailers now offer vast collections of music, spanning tens of millions of songs, available for immediate download. While these online stores present a drastically different dynamic than the record stores of the past, consumers still arrive with the same request: recommendation of music that is similar to their tastes. For both recommendation and curation, the vast digital music libraries of today necessarily require powerful automated tools.

    The medium of music has evolved specifically for the expression of emotions, and it is natural for us to organize music in terms of its emotional associations. But while such organization is a natural process for humans, quantifying it empirically proves to be a very difficult task. Myriad features, such as harmony, timbre, interpretation, and lyrics, affect emotion, and the mood of a piece may also change over its duration. Furthermore, in developing automated systems to organize music in terms of emotional content, we are faced with a problem that oftentimes lacks a well-defined answer; there may be considerable disagreement regarding the perception and interpretation of the emotions of a song, or even ambiguity within the piece itself.

    Automatic identification of musical mood is a topic still in its early stages, though it has received increasing attention in recent years. Such work offers potential not just to revolutionize how we buy and listen to our music, but to provide deeper insight into the understanding of human emotions in general. This work seeks to relate core concepts from psychology to those of signal processing to understand how to extract information relevant to musical emotion from an acoustic signal. The methods discussed here survey existing features using psychology studies and develop new features using basis functions learned directly from magnitude spectra. Furthermore, this work presents a wide breadth of approaches to developing functional mappings between acoustic data and emotion-space parameters. Using these models, a framework is constructed for content-based modeling and prediction of musical emotion.

    Ph.D., Electrical Engineering -- Drexel University, 201
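    The sketch below is a hedged reading of the pipeline the abstract outlines: compute magnitude spectra, learn basis functions from them, and regress the resulting activations onto a two-dimensional valence-arousal emotion space. The abstract does not name a basis-learning method; non-negative matrix factorization is assumed here, and the synthetic excerpts, labels, frame size, and regression model are illustrative assumptions only.

```python
# Illustrative feature-learning and emotion-regression pipeline (not the
# dissertation's exact models): magnitude spectra -> learned bases -> regression
# onto hypothetical (valence, arousal) labels.
import numpy as np
from sklearn.decomposition import NMF
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Stand-in corpus: 200 one-second excerpts at 8 kHz with random emotion labels
excerpts = rng.standard_normal((200, 8000))
valence_arousal = rng.uniform(-1, 1, size=(200, 2))

def magnitude_spectrum(signal, n_fft=512):
    """Mean magnitude spectrum of an excerpt, a crude frame-level feature."""
    usable = signal[: (len(signal) // n_fft) * n_fft]
    frames = usable.reshape(-1, n_fft)
    return np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)

X = np.array([magnitude_spectrum(x) for x in excerpts])   # non-negative features

# Learn spectral basis functions and use the activations as features
nmf = NMF(n_components=16, max_iter=500, random_state=0)
activations = nmf.fit_transform(X)

# Functional mapping from acoustic features to emotion-space parameters
model = Ridge(alpha=1.0).fit(activations, valence_arousal)
print(model.predict(activations[:3]))
```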

    Multimodal Video Analysis and Modeling

    From recalling long-forgotten experiences based on a familiar scent or a piece of music, to lip-reading-aided conversation in noisy environments, to travel sickness caused by a mismatch of the signals from vision and the vestibular system, human perception manifests countless examples of subtle and effortless joint use of the multiple senses provided to us by evolution. Emulating such multisensory (or multimodal, i.e., comprising multiple types of input modes or modalities) processing computationally offers tools for more effective, efficient, or robust accomplishment of many multimedia tasks using evidence from the multiple input modalities. Information from the modalities can also be analyzed for patterns and connections across them, opening up interesting applications not feasible with a single modality, such as prediction of some aspects of one modality based on another. In this dissertation, multimodal analysis techniques are applied to selected video tasks with accompanying modalities. More specifically, all the tasks involve some type of analysis of videos recorded by non-professional videographers using mobile devices.

    Fusion of information from multiple modalities is applied to recording-environment classification from video and audio, as well as to sport-type classification from a set of multi-device videos, corresponding audio, and recording-device motion sensor data. The environment classification combines support vector machine (SVM) classifiers trained on various global visual low-level features with audio-event-histogram-based environment classification using k-nearest neighbors (k-NN). Rule-based fusion schemes with genetic algorithm (GA)-optimized modality weights are compared to training an SVM classifier to perform the multimodal fusion. A comprehensive selection of fusion strategies is compared for the task of classifying the sport type of a set of recordings from a common event. These include fusion prior to, simultaneously with, and after classification; various approaches for using modality quality estimates; and fusing soft confidence scores as well as crisp single-class predictions. Additionally, different strategies are examined for aggregating the decisions of single videos into a collective prediction from the set of videos recorded concurrently with multiple devices. In both tasks, multimodal analysis shows a clear advantage over separate classification of the modalities.

    Another part of the work investigates cross-modal pattern analysis and audio-based video editing. This study examines the feasibility of automatically timing shot cuts of multi-camera concert recordings according to music-related cutting patterns learnt from professional concert videos. Cut timing is a crucial part of the automated creation of multi-camera mashups, where shots from multiple recording devices at a common event are alternated with the aim of mimicking a professionally produced video. In the framework, separate statistical models are formed for typical patterns of beat-quantized cuts in short segments, differences in beats between consecutive cuts, and relative deviation of cuts from exact beat times. Based on music meter and audio change-point analysis of a new recording, the models can be used to synthesize cut times. In a user study, the proposed framework clearly outperforms a baseline automatic method with comparably advanced audio analysis and wins 48.2% of comparisons against hand-edited videos.
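    A minimal sketch of the rule-based late-fusion idea from the first part of the dissertation: one SVM per modality produces class scores, which are combined with a modality weight before the final decision. The dissertation optimizes such weights with a genetic algorithm and also compares other fusion strategies; below, a coarse random weight search and synthetic features merely stand in for that.

```python
# Late fusion of per-modality classifier scores with a weighted rule-based sum.
# All features and labels are synthetic; the weight search is a stand-in for
# the GA-based optimization described in the abstract.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n = 300
labels = rng.integers(0, 3, size=n)                 # e.g. three recording environments

# Synthetic "visual" and "audio" feature matrices with class-dependent shifts
visual = rng.standard_normal((n, 20)) + labels[:, None]
audio = rng.standard_normal((n, 10)) + 0.5 * labels[:, None]

train, val = np.arange(0, 200), np.arange(200, n)
clf_visual = SVC(probability=True).fit(visual[train], labels[train])
clf_audio = SVC(probability=True).fit(audio[train], labels[train])

scores_visual = clf_visual.predict_proba(visual[val])
scores_audio = clf_audio.predict_proba(audio[val])

best_w, best_acc = 0.5, 0.0
for w in rng.uniform(0, 1, size=50):                # crude stand-in for the GA
    fused = w * scores_visual + (1 - w) * scores_audio   # rule-based weighted sum
    acc = (fused.argmax(axis=1) == labels[val]).mean()
    if acc > best_acc:
        best_w, best_acc = w, acc
print(f"best visual weight {best_w:.2f}, fused accuracy {best_acc:.2f}")
```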

    Creating a real-time movement sonification system for hemiparetic upper limb rehabilitation for survivors of stroke

    Upper limb paresis is a common problem for survivors of stroke, impeding their ability to live independently, and rehabilitation interventions to reduce impairment are highly sought after. Audio-based interventions, such as movement sonification, may improve rehabilitation outcomes in this application; however, they remain relatively unexplored considering the potential that audio feedback has to enhance motor skill learning. Movement sonification is the process of converting movement-associated data to the auditory domain and is touted to be a feasible and effective method for stroke survivors to obtain real-time audio feedback of their movements. To generate real-time audio feedback through movement sonification, a system is required to capture movements, process the data, extract the physical domain of interest, convert it to the auditory domain, and emit the generated audio. A commercial system that performs this process for gross upper limb movements is currently unavailable; therefore, system creation is required. To begin this process, a mapping review of movement sonification systems in the literature was completed. System components in the literature were identified, keyword coded, and grouped to provide an overview of the components used within these systems. From these results, components for new movement sonification systems were chosen based on their popularity and applicability, to create two movement sonification systems: one termed ‘Soniccup’, which uses an Inertial Measurement Unit, and the other termed ‘KinectSon’, which uses an Azure Kinect camera. Both systems were set up to translate position estimates into audio pitch as the output of the sonification process. Both systems were subsequently used in a comparison study with a Vicon Nexus system to establish similarity of positional shape, and therefore audio output similarity. The results indicate that the Soniccup produced positional shape representative of the movement performed for movements lasting under one second, but performance degraded as movement duration increased. In addition, the Soniccup produced these results with a system latency of approximately 230 ms, which is beyond the limit of real-time perception. The KinectSon system was found to produce positional shape similar to the Vicon Nexus system for all movements, and obtained these results with a system latency of approximately 67 ms, which is within the limit of real-time perception. As such, the KinectSon system has been evaluated as a good candidate for generating real-time audio feedback; however, further testing is required to establish the suitability of the generated audio feedback. To evaluate the feedback, as part of usability testing, the KinectSon system was used in an agency study. Volunteers with and without upper-limb impairment performed reaching movements whilst using the KinectSon system and reported the perceived association of the generated sound with the movements performed. For three of the four sonification conditions, a triangular-wave pitch-modulation component was added to distort the sound. The participants in this study associated their movements more strongly with the unmodulated sonification condition than with the modulated sonification conditions, indicating that stroke survivors are able to use the KinectSon system and obtain a sense of agency whilst using it.
The thesis concludes with a discussion of the findings of the contributing chapters, along with the implications, limitations, and identified future work, within the context of creating a suitable real-time movement sonification system for a large-scale study involving an upper limb rehabilitation intervention.
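    The core mapping shared by both systems, translating position estimates into audio pitch, can be sketched as below. The workspace range, pitch range, and audio block length are assumed values for illustration and are not the actual Soniccup or KinectSon parameters.

```python
# Hedged sketch of a position-to-pitch sonification: one coordinate of the
# hand-position estimate is mapped linearly onto a pitch range and rendered as
# a short sine block per motion sample. All ranges here are assumptions.
import numpy as np

SAMPLE_RATE = 44100

def position_to_pitch(y, y_min=0.0, y_max=0.6, f_min=220.0, f_max=880.0):
    """Map a vertical position in metres to a frequency in hertz."""
    y = np.clip(y, y_min, y_max)
    return f_min + (y - y_min) / (y_max - y_min) * (f_max - f_min)

def sonify(positions, block_ms=20):
    """Render one short sine block per position sample (e.g. per camera frame)."""
    block = int(SAMPLE_RATE * block_ms / 1000)
    phase, out = 0.0, []
    for y in positions:
        freq = position_to_pitch(y)
        t = np.arange(block)
        out.append(0.5 * np.sin(phase + 2 * np.pi * freq * t / SAMPLE_RATE))
        phase += 2 * np.pi * freq * block / SAMPLE_RATE   # keep phase continuous
    return np.concatenate(out)

# Example: a one-second reaching movement rising from 0.1 m to 0.5 m
audio = sonify(np.linspace(0.1, 0.5, 50))
print(audio.shape)   # samples ready to stream to an audio output
```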

    Artificial Intelligence in the Creative Industries: A Review

    This paper reviews the current state of the art in Artificial Intelligence (AI) technologies and applications in the context of the creative industries. A brief background of AI, and specifically Machine Learning (ML) algorithms, is provided, including Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), Recurrent Neural Networks (RNNs) and Deep Reinforcement Learning (DRL). We categorise creative applications into five groups related to how AI technologies are used: i) content creation, ii) information analysis, iii) content enhancement and post-production workflows, iv) information extraction and enhancement, and v) data compression. We critically examine the successes and limitations of this rapidly advancing technology in each of these areas. We further differentiate between the use of AI as a creative tool and its potential as a creator in its own right. We foresee that, in the near future, machine learning-based AI will be adopted widely as a tool or collaborative assistant for creativity. In contrast, we observe that the successes of machine learning in domains with fewer constraints, where AI is the 'creator', remain modest. The potential of AI (or its developers) to win awards for its original creations in competition with human creatives is also limited, based on contemporary technologies. We therefore conclude that, in the context of creative industries, maximum benefit from AI will be derived where its focus is human centric -- where it is designed to augment, rather than replace, human creativity.