3,731 research outputs found

    Speech data analysis for semantic indexing of video of simulated medical crises.

    The Simulation for Pediatric Assessment, Resuscitation, and Communication (SPARC) group within the Department of Pediatrics at the University of Louisville was established to enhance the care of children by using simulation-based educational methodologies to improve patient safety and strengthen clinician-patient interactions. After each simulation session, the physician must manually review and annotate the recordings and then debrief the trainees. The physician responsible for the simulations has recorded hundreds of videos and is seeking solutions that can automate the process. This dissertation introduces a system we developed for efficient segmentation and semantic indexing of videos of medical simulations using machine learning methods. It provides the physician with automated tools to review important sections of the simulation by identifying who spoke, when, and with what emotion. Only audio information is extracted and analyzed because the quality of the image recording is low and the visual environment is static for the most part. Our proposed system includes four main components: preprocessing, speaker segmentation, speaker identification, and emotion recognition. The preprocessing consists of first extracting the audio component from the video recording and then extracting various low-level audio features to detect and remove silence segments. We investigate and compare two different approaches for this task: the first is threshold-based and the second is classification-based. The second main component of the proposed system detects speaker change points for the purpose of segmenting the audio stream; we propose two fusion methods for this task. The speaker identification and emotion recognition components of our system are designed to provide users with the capability to browse the video and retrieve shots that identify "who spoke, when, and the speaker's emotion" for further analysis. For this component, we propose two feature representation methods that map audio segments of arbitrary length to a feature vector with fixed dimensions. The first is based on soft bag-of-words (BoW) feature representations; in particular, we define three types of BoW based on crisp, fuzzy, and possibilistic voting. The second feature representation is a generalization of the BoW and is based on the Fisher Vector (FV). The FV uses the Fisher kernel principle and combines the benefits of generative and discriminative approaches. The proposed feature representations are used within two learning frameworks. The first is supervised learning and assumes that a large collection of labeled training data is available; within this framework, we use standard classifiers including K-nearest neighbors (K-NN), support vector machines (SVM), and Naive Bayes. The second framework is based on semi-supervised learning, where only a limited number of labeled training samples is available; here we use an approach based on label propagation. Our proposed algorithms were evaluated using 15 medical simulation sessions, and the results were analyzed and compared to those obtained using state-of-the-art algorithms. We show that our proposed speech segmentation fusion algorithms and feature mappings outperform existing methods. We also integrated all proposed algorithms and developed a GUI prototype system for subjective evaluation. This prototype processes a medical simulation video and provides the user with a visual summary of the different speech segments. It also allows the user to browse videos and retrieve scenes that answer semantic queries such as: Who spoke and when? Who interrupted whom? What was the emotion of the speaker? The GUI prototype can also provide summary statistics for each simulation video, for example: For how long did each person speak? What is the longest uninterrupted speech segment? Is there an unusually large number of pauses within the speech segments of a given speaker?
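
    To make the bag-of-words mapping concrete, the sketch below (an illustration, not the dissertation's implementation) turns a variable-length sequence of MFCC frames into a fixed-dimensional histogram over a learned codebook, with either crisp or fuzzy (soft) voting. The codebook size, the fuzziness exponent, and the use of librosa and scikit-learn are assumptions made for this example.

```python
import numpy as np
import librosa
from sklearn.cluster import KMeans

def mfcc_frames(wav_path, sr=16000, n_mfcc=13):
    """Extract per-frame MFCC vectors from an audio file."""
    y, sr = librosa.load(wav_path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T   # (frames, n_mfcc)

def fit_codebook(training_frames, n_words=64, seed=0):
    """Learn a codebook of audio 'words' by clustering training frames."""
    return KMeans(n_clusters=n_words, random_state=seed).fit(training_frames)

def bow_vector(frames, codebook, fuzzy=False, m=2.0):
    """Map a variable-length segment to a fixed-length BoW histogram.

    Crisp voting: each frame votes only for its nearest codeword.
    Fuzzy voting: each frame spreads its vote over all codewords with
    FCM-style memberships that decrease with distance (exponent m).
    """
    d = codebook.transform(frames)                  # distances, (frames, n_words)
    if not fuzzy:
        hist = np.bincount(d.argmin(axis=1), minlength=codebook.n_clusters).astype(float)
    else:
        d = np.maximum(d, 1e-12)
        w = d ** (-2.0 / (m - 1.0))
        hist = (w / w.sum(axis=1, keepdims=True)).sum(axis=0)
    return hist / hist.sum()                        # normalize to unit mass
```

    A segment of any duration thus maps to an n_words-dimensional vector that can be fed to K-NN, SVM, or Naive Bayes classifiers.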

    Recent advances in directional statistics

    Mainstream statistical methodology is generally applicable to data observed in Euclidean space. There are, however, numerous contexts of considerable scientific interest in which the natural supports for the data under consideration are Riemannian manifolds like the unit circle, torus, sphere and their extensions. Typically, such data can be represented using one or more directions, and directional statistics is the branch of statistics that deals with their analysis. In this paper we provide a review of the many recent developments in the field since the publication of Mardia and Jupp (1999), still the most comprehensive text on directional statistics. Many of those developments have been stimulated by interesting applications in fields as diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics, image analysis, text mining, environmetrics, and machine learning. We begin by considering developments for the exploratory analysis of directional data before progressing to distributional models, general approaches to inference, hypothesis testing, regression, nonparametric curve estimation, methods for dimension reduction, classification and clustering, and the modelling of time series, spatial and spatio-temporal data. An overview of currently available software for analysing directional data is also provided, and potential future developments discussed.
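
    As a minimal illustration of what "directional" data means here, the following sketch (not taken from the paper) computes the mean direction and mean resultant length of a sample of angles on the unit circle, two of the most basic exploratory summaries in directional statistics.

```python
import numpy as np

def circular_summary(angles_rad):
    """Mean direction and mean resultant length for angles on the unit circle.

    The arithmetic mean is meaningless for angles (1 and 359 degrees would
    average to 180); instead, average the unit vectors (cos t, sin t).
    """
    C = np.cos(angles_rad).mean()
    S = np.sin(angles_rad).mean()
    mean_direction = np.arctan2(S, C)      # in (-pi, pi]
    resultant_length = np.hypot(C, S)      # in [0, 1]; near 1 = concentrated
    return mean_direction, resultant_length

# Angles clustered around 0 that straddle the 0/360-degree cut.
theta = np.deg2rad([350.0, 355.0, 5.0, 10.0])
mu, R = circular_summary(theta)
print(np.rad2deg(mu), R)   # mean direction ~0 degrees, R close to 1
```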

    Speaker recognition utilizing distributed DCT-II based Mel frequency cepstral coefficients and fuzzy vector quantization

    In this paper, a novel Automatic Speaker Recognition (ASR) system is presented. The system includes new feature extraction and vector classification steps utilizing distributed Discrete Cosine Transform (DCT-II) based Mel Frequency Cepstral Coefficients (MFCC) and Fuzzy Vector Quantization (FVQ). The ASR algorithm uses an MFCC-based approach to identify dynamic features that are used for Speaker Recognition (SR).
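
    For orientation, the sketch below shows the conventional MFCC pipeline that such systems build on (STFT power spectrum, mel filterbank, log compression, and a type-II DCT); it is a generic baseline, not the distributed DCT-II variant proposed in the paper, and the frame length, hop size, and filter count are assumed defaults.

```python
import numpy as np
import librosa
from scipy.fftpack import dct

def mfcc_baseline(y, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Baseline MFCCs: |STFT|^2 -> mel filterbank -> log -> DCT-II."""
    power = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)) ** 2
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    log_mel = np.log(mel_fb @ power + 1e-10)           # (n_mels, frames)
    ceps = dct(log_mel, type=2, axis=0, norm='ortho')  # decorrelate bands
    return ceps[:n_ceps].T                              # (frames, n_ceps)
```

    Fuzzy Vector Quantization then replaces the hard codeword assignment of classical VQ with graded memberships when modeling a speaker's feature distribution.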

    An application of an auditory periphery model in speaker identification

    The number of applications of automatic Speaker Identification (SID) is growing due to advanced technologies for secure access and authentication in services and devices. In a 2016 study, the Cascade of Asymmetric Resonators with Fast-Acting Compression (CAR-FAC) cochlear model achieved the best performance among seven recent cochlear models in fitting a set of human auditory physiological data. Motivated by this performance, I apply the CAR-FAC cochlear model to an SID task for the first time, aiming to approach the performance of the human auditory system. This thesis investigates the potential of the CAR-FAC model in an SID task, examining its capability in both text-dependent and text-independent settings. It also investigates the contributions of different parameters, nonlinearities, and stages of the CAR-FAC that enhance SID accuracy. The performance of the CAR-FAC is compared with another recent cochlear model, the Auditory Nerve (AN) model. In addition, three FFT-based auditory features, Mel Frequency Cepstral Coefficients (MFCC), Frequency Domain Linear Prediction (FDLP), and Gammatone Frequency Cepstral Coefficients (GFCC), are also included to compare their performance with the cochlear features. This comparison allows me to investigate a better front-end for a noise-robust SID system. Three different statistical classifiers, a Gaussian Mixture Model with a Universal Background Model (GMM-UBM), a Support Vector Machine (SVM), and an i-vector system, were used to evaluate performance; these classifiers allow me to investigate nonlinearities in the cochlear front-ends. Performance is evaluated under clean and noisy conditions for a wide range of noise levels. Techniques to improve the performance of a cochlear algorithm are also investigated in this thesis. It was found that applying a cube root and a DCT to the cochlear output enhances SID accuracy substantially.
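
    The post-processing step reported to help, cube-root compression followed by a DCT across the channel axis of the cochlear output, can be sketched as follows. The cochleagram is assumed to be precomputed by a CAR-FAC (or other cochlear model) implementation, and the coefficient count is an illustrative choice rather than the thesis setting.

```python
import numpy as np
from scipy.fftpack import dct

def cochlear_cepstral_features(cochleagram, n_ceps=13):
    """Cube-root compression + DCT-II over channels, per time frame.

    cochleagram: non-negative array of shape (n_channels, n_frames)
    produced by a cochlear model such as CAR-FAC; how that array is
    obtained is outside the scope of this sketch.
    """
    compressed = np.cbrt(cochleagram)                     # cube-root nonlinearity
    ceps = dct(compressed, type=2, axis=0, norm='ortho')  # decorrelate channels
    return ceps[:n_ceps].T                                # (n_frames, n_ceps)
```

    The resulting frame-level vectors can then be modeled with a GMM-UBM, an SVM, or an i-vector back-end as described above.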

    Evaluation of a Patient-Specific, Low-Cost, 3-Dimensional–Printed Transesophageal Echocardiography Human Heart Phantom

    Simulation-based education has been shown to increase the task-specific capability of medical trainees. Transesophageal echocardiography (TEE) training greatly benefits from the use of simulators, which allow real-time scanning of a beating heart and generation of ultrasound images side by side with an anatomically accurate virtual model. These simulators are costly and have many limitations. 3D printing technologies have enabled the creation of bespoke phantoms capable of being used as task trainers. This study aims to compare the ease of use and accuracy of a low-cost, patient-specific, computed-tomography-based, 3D printed, echogenic TEE phantom with a commercially available echocardiography training mannequin. We hypothesized that a low-cost, custom-made, 3D printed cardiac phantom has image quality, accuracy, and usability comparable to existing commercially available echocardiographic phantoms. After Institutional Ethics Research Board approval, we recruited ten American Board-certified cardiac anesthesiologists and conducted a blinded comparative study divided into two stages. Stage one consisted of image assessment: a set of basic TEE views obtained from the 3D printed and commercial phantoms was presented to the participants on a computer screen in random order. For each image, participants were asked to identify the view, rate the quality of the image on a 1-5 Likert scale relative to the corresponding human view (1 = not at all realistic, 5 = realistic compared to the patient view), and guess with which phantom it was acquired. In stage two, participants were asked to use the 3D printed and the commercially available phantoms to obtain basic TEE views within a maximum of 30 minutes. Each view was recorded and assessed for accuracy by two certified echocardiographers, and the time needed to acquire each basic view and the number of correct views were recorded. Overall usability of the phantoms was assessed through a questionnaire. For all continuous variables, we calculated the mean, median, and standard deviation, and we used the Wilcoxon signed-rank test to assess significant differences in the rating of each phantom. All ten participants completed all parts of the study, and all could recognize all of the standard views. The average Likert score was 3.2 for the 3D printed phantom and 2.9 for the commercial phantom, with no significant difference. The average time to obtain views was 24.5 seconds for the 3D printed phantom and 30 seconds for the commercial phantom, a statistically significant difference in favor of the 3D printed phantom. The qualitative user assessment of ease of obtaining the views, probe manipulation, image quality, and overall experience strongly favored the 3D printed phantom. Our study suggests that the quality of TEE images obtained on the 3D printed phantom is not significantly different from that obtained on the commercial phantom, while the ease of use and time required to complete a basic TEE exam favored the 3D printed phantom.
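
    For the paired comparison described above (the same ten raters scoring both phantoms), a Wilcoxon signed-rank test can be run as in the sketch below; the SciPy call is standard, but the example Likert ratings are placeholders rather than the study's data.

```python
import numpy as np
from scipy.stats import wilcoxon

# Placeholder 1-5 Likert ratings from the same ten raters for each phantom
# (illustrative values only, not the study's data).
printed = np.array([4, 3, 4, 4, 3, 4, 3, 3, 4, 3])
commercial = np.array([3, 3, 2, 3, 3, 3, 2, 3, 3, 2])

# Paired, non-parametric test for a systematic difference in ratings.
stat, p_value = wilcoxon(printed, commercial)
print(f"Wilcoxon statistic = {stat}, p = {p_value:.3f}")
```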

    Acta Cybernetica: Volume 25, Number 2.
