
    Speech Recognition Technology: Improving Speed and Accuracy of Emergency Medical Services Documentation to Protect Patients

    Because hospital errors, such as mistakes in documentation, cause one in six deaths each year in the United States, the accuracy of health records in the emergency medical services (EMS) must be improved. One possible solution is to incorporate speech recognition (SR) software into the tools currently used by EMS first responders. The purpose of this research was to determine whether SR software could increase the efficiency and accuracy of EMS documentation and thereby improve the safety of EMS patients. An initial review of the literature on the performance of current SR software showed that it falls short of 99% accuracy, so errors in the medical documentation it produces could harm patients. The literature review also identified weaknesses of SR software that would have to be overcome for the software to be accurate enough for EMS settings: the inability to differentiate between similar phrases and the inability to filter out background noise. An analysis of natural language processing algorithms showed that a bag-of-words post-processing algorithm can differentiate between similar phrases. This algorithm is well suited to SR applications because it is simple yet effective compared with machine learning algorithms that require large amounts of training data. The findings suggested that if these weaknesses of current SR software are solved, the software could increase the efficiency and accuracy of EMS documentation. Further studies should integrate the bag-of-words post-processing method into SR software and field test its accuracy in EMS settings.
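    The abstract does not specify an implementation, but the bag-of-words idea can be sketched roughly as follows: represent the recognizer's hypothesis and each expected documentation phrase as unordered word-count vectors and snap the hypothesis to the closest candidate. The phrase list, function names, and similarity measure below are illustrative assumptions, not the study's method.

```python
from collections import Counter
import math

# Hypothetical lexicon of expected EMS documentation phrases; in practice this
# would come from a structured run-report vocabulary.
EMS_PHRASES = [
    "patient is alert and oriented",
    "patient is unresponsive",
    "administered aspirin 324 mg",
    "administered epinephrine 0.3 mg",
]

def bag_of_words(text):
    """Represent a phrase as an unordered word-count vector."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    common = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in common)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def post_process(recognized):
    """Map a raw recognizer hypothesis to the closest expected phrase."""
    hypothesis = bag_of_words(recognized)
    return max(EMS_PHRASES, key=lambda p: cosine_similarity(hypothesis, bag_of_words(p)))

# Example: a noisy hypothesis is snapped to the nearest documented phrase.
print(post_process("patient alert an oriented"))  # -> "patient is alert and oriented"
```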

    Time Domain Computation of a Nonlinear Nonlocal Cochlear Model with Applications to Multitone Interaction in Hearing

    A nonlinear nonlocal cochlear model of the transmission-line type is studied in order to capture multitone interactions and the resulting tonal suppression effects. The model can serve as a module for voice signal processing; it is a one-dimensional (in space) damped dispersive nonlinear PDE based on the mechanics and phenomenology of hearing. It describes the motion of the basilar membrane (BM) in the cochlea driven by input pressure waves. Both elastic damping and selective longitudinal fluid damping are present. The former is nonlinear and nonlocal in BM displacement and plays a key role in capturing tonal interactions. The latter is active only near the exit boundary (helicotrema) and is built in to damp out the remaining long waves. The initial boundary value problem is solved numerically with a semi-implicit second-order finite difference method. Solutions reach a multi-frequency quasi-steady state. Numerical results are shown for two-tone suppression from both the high-frequency and low-frequency sides, consistent with the known behavior of two-tone suppression. Suppression effects among three tones are demonstrated by showing how the response magnitudes of two fixed tones are reduced as the third tone is varied in frequency and amplitude. The model solutions show qualitative agreement with existing cat auditory neural data. The model is thus a simple and efficient processing tool for voice signals.
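    For orientation, the passive skeleton on which such transmission-line cochlear models are built can be written as below; this is a generic textbook form for illustration, not necessarily the exact equations or sign conventions of the model studied here, which in addition makes the BM damping nonlinear and nonlocal in the displacement.

$$
\frac{\partial^2 p}{\partial x^2} \;=\; \frac{2\rho}{H}\,\frac{\partial^2 u}{\partial t^2},
\qquad
m\,\frac{\partial^2 u}{\partial t^2} \;+\; r(x)\,\frac{\partial u}{\partial t} \;+\; s(x)\,u \;=\; p(x,t),
$$

    where $p$ is the pressure difference across the BM, $u$ the BM displacement, $\rho$ the fluid density, $H$ the duct height, and $m$, $r(x)$, $s(x)$ the BM mass, damping, and stiffness per unit area; the input enters through a flux condition at the stapes ($x=0$), and the remaining long waves are absorbed near the helicotrema ($x=L$).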

    Speech recognition enhancement using beamforming and a genetic algorithm

    This paper proposes a genetic algorithm (GA) based beamformer that optimizes speech recognition accuracy for a pretrained speech recognizer. The proposed beamformer is designed to handle the non-differentiable and nonlinear nature of the speech recognition objective by using a GA to search for the optimal beamformer weights. Specifically, a population of beamformer weights is reproduced through crossover and mutation until the optimal weights are found. Results show that speech recognition accuracy can be greatly improved even in noisy environments.
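    A minimal sketch of the GA search loop described above is shown below, assuming a toy surrogate fitness function in place of running the pretrained recognizer on beamformed audio; the population size, selection scheme, and mutation parameters are illustrative choices, not the paper's settings.

```python
import random

# Hypothetical stand-in for the non-differentiable objective: in the paper's
# setting this would pass the beamformed signal through a pretrained speech
# recognizer and return its accuracy. A toy quadratic surrogate keeps the
# sketch self-contained.
TARGET = [0.5, 0.3, 0.1, 0.1]  # assumed "ideal" weights for the toy objective

def fitness(weights):
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))

def crossover(a, b):
    """Single-point crossover of two weight vectors."""
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(weights, rate=0.1, scale=0.05):
    """Perturb each weight with small Gaussian noise at the given rate."""
    return [w + random.gauss(0, scale) if random.random() < rate else w for w in weights]

def genetic_beamformer(n_mics=4, pop_size=30, generations=100):
    population = [[random.random() for _ in range(n_mics)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

best = genetic_beamformer()
print([round(w, 2) for w in best])  # converges toward the toy target weights
```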

    Error in the Superior Temporal Gyrus? A Systematic Review and Activation Likelihood Estimation Meta-Analysis of Speech Production Studies

    Evidence for perceptual processing in models of speech production is often drawn from investigations in which the sound of a talker's voice is altered in real time to induce “errors.” Methods of acoustic manipulation vary but are assumed to engage the same neural network and psychological processes. This paper reviews fMRI and PET studies of altered auditory feedback and assesses the strength of the evidence these studies provide for a speech error correction mechanism. Studies included were functional neuroimaging studies of speech production in neurotypical adult humans, using natural speech errors or one of three predefined speech manipulation techniques (frequency-altered feedback, delayed auditory feedback, and masked auditory feedback). Seventeen studies met the inclusion criteria. In a systematic review, we evaluated whether each study (1) used an ecologically valid speech production task, (2) controlled for auditory activation caused by hearing the perturbation, (3) statistically controlled for multiple comparisons, and (4) measured behavioral compensation correlated with the perturbation. None of the studies met all four criteria. We then conducted an activation likelihood estimation meta-analysis of brain coordinates from 16 studies that reported brain responses to manipulated over unmanipulated speech feedback, using the GingerALE toolbox. These foci clustered in bilateral superior temporal gyri, anterior to cortical fields typically linked to error correction. Within the limits of our analysis, we conclude that existing neuroimaging evidence is insufficient to determine whether error monitoring occurs in the posterior superior temporal gyrus regions proposed by models of speech production.

    Cardiac rhythm analysis during ongoing cardiopulmonary resuscitation using the Analysis During Compressions with Fast reconfirmation technology

    BACKGROUND Pauses in chest compressions (CCs) have a negative association with survival from cardiac arrest. Electrocardiographic (ECG) rhythm analysis and defibrillator charging are significant contributors to CC pauses. OBJECTIVE The accuracy of the Analysis During Compressions with Fast Reconfirmation (ADC-FR) algorithm, which features automated rhythm analysis and charging during CCs to reduce CC pauses, was retrospectively determined in a large database of ECGs from 2701 patients with out-of-hospital cardiac arrest. METHODS The ADC-FR algorithm generated a total of 7264 advisories, of which 3575 were randomly assigned to a development data set and 3689 to a test data set. With ADC-FR, a high-pass digital filter is used to remove CC artifacts while the underlying ECG rhythm is automatically interpreted. When CCs are paused at the end of the 2-minute cardiopulmonary resuscitation interval, a 3-second reconfirmation analysis is performed on the artifact-free ECG to confirm the shock/no-shock advisory. The sensitivity and specificity of the ADC-FR algorithm in correctly identifying shockable and nonshockable rhythms during CCs were calculated. RESULTS In both data sets, the accuracy of the ADC-FR algorithm for each ECG rhythm exceeded the recommended performance goals, which apply to standard artifact-free ECG analysis. Sensitivity and specificity were 97% and 99%, respectively, for the development data set and 95% and 99% for the test data set. CONCLUSION The ADC-FR algorithm is highly accurate in discriminating shockable from nonshockable rhythms and can be used to reduce CC pauses.
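    The abstract does not disclose the filter design used in ADC-FR, but the general idea of attenuating the compression artifact with a high-pass digital filter can be sketched as below (NumPy and SciPy assumed); the cutoff frequency, filter order, and synthetic signal are assumptions for illustration only.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def remove_cc_artifact(ecg, fs=250, cutoff_hz=4.0, order=2):
    """High-pass filter an ECG segment to attenuate the chest-compression
    artifact, which is concentrated around the compression rate (~1.7-2 Hz
    for 100-120 compressions/min). Cutoff and order are illustrative only."""
    b, a = butter(order, cutoff_hz, btype="highpass", fs=fs)
    return filtfilt(b, a, ecg)

# Toy example: a 2 Hz "compression" sine superimposed on a 12 Hz component.
fs = 250
t = np.arange(0, 4, 1 / fs)
ecg = 0.2 * np.sin(2 * np.pi * 12 * t) + 1.0 * np.sin(2 * np.pi * 2 * t)
clean = remove_cc_artifact(ecg, fs)
print(round(np.std(ecg), 2), round(np.std(clean), 2))  # artifact power is reduced
```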

    Language Detoxification with Attribute-Discriminative Latent Space

    Transformer-based Language Models (LMs) have achieved impressive results on natural language understanding tasks, but they can also generate toxic text such as insults, threats, and profanity, which limits their real-world applications. To overcome this issue, a few text generation approaches aim to detoxify toxic text using additional LMs or perturbations. However, these methods require excessive memory, computation, and time, which are serious bottlenecks for real-world application. To address these limitations, we propose an effective yet efficient method for language detoxification using an attribute-discriminative latent space. Specifically, we project the latent space of an original Transformer LM onto a discriminative latent space that separates texts well by their attributes, using a projection block and an attribute discriminator. This allows the LM to keep its generated text non-toxic with minimal memory and computation overhead. We validate our model, the Attribute-Discriminative Language Model (ADLM), on detoxified language and dialogue generation tasks, on which it significantly outperforms baselines in both performance and efficiency.
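    As a rough illustration of the idea (not the authors' implementation), a projection block plus attribute discriminator on top of LM hidden states might look like the following PyTorch sketch; all dimensions, the architecture of the projection block, and the training loss are assumptions.

```python
import torch
import torch.nn as nn

class AttributeDiscriminativeProjection(nn.Module):
    """Illustrative sketch: project LM hidden states into a latent space in
    which an attribute head separates toxic from non-toxic text."""

    def __init__(self, hidden_dim=768, latent_dim=128, num_attributes=2):
        super().__init__()
        self.projection = nn.Sequential(
            nn.Linear(hidden_dim, latent_dim),
            nn.GELU(),
            nn.Linear(latent_dim, latent_dim),
        )
        self.discriminator = nn.Linear(latent_dim, num_attributes)

    def forward(self, hidden_states, attribute_labels=None):
        z = self.projection(hidden_states)   # attribute-discriminative latent
        logits = self.discriminator(z)       # toxic vs. non-toxic prediction
        loss = None
        if attribute_labels is not None:
            loss = nn.functional.cross_entropy(logits, attribute_labels)
        return z, logits, loss

# Toy usage with random "LM hidden states" for a batch of 4 sequences.
head = AttributeDiscriminativeProjection()
h = torch.randn(4, 768)
labels = torch.tensor([0, 1, 0, 1])
z, logits, loss = head(h, labels)
print(z.shape, logits.shape, float(loss))
```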

    FinBTech: Blockchain-Based Video and Voice Authentication System for Enhanced Security in Financial Transactions Utilizing FaceNet512 and Gaussian Mixture Models

    In the digital age, it is crucial to ensure that financial transactions are as secure and reliable as possible. This paper presents a method that combines smart contracts, blockchain technology, FaceNet512 for improved face recognition, and Gaussian Mixture Models (GMMs) for voice authentication to create a video and audio verification system for financial transactions. Smart contracts and the immutable ledger of the blockchain together provide a secure and transparent environment for transactions, while FaceNet512 and the GMM provide multi-factor biometric authentication, raising security to a new level. By combining these technologies, the system offers a strong defense against identity theft and unauthorized access, establishing a new benchmark for secure financial transactions.
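    The abstract gives no implementation detail, but the GMM voice-verification component can be illustrated with a small likelihood-ratio sketch using scikit-learn; the synthetic features, model sizes, and decision threshold are assumptions, and the face-recognition (FaceNet512) and blockchain components are omitted.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-ins for acoustic features (e.g., MFCCs) of the enrolled
# speaker and of background speakers; real systems would extract these
# from audio.
rng = np.random.default_rng(0)
enroll_features = rng.normal(0.0, 1.0, size=(500, 13))
background_features = rng.normal(0.5, 1.5, size=(2000, 13))

speaker_gmm = GaussianMixture(n_components=8, random_state=0).fit(enroll_features)
background_gmm = GaussianMixture(n_components=8, random_state=0).fit(background_features)

def verify(features, threshold=0.0):
    """Accept the identity claim if the speaker model explains the audio
    better than the background model (log-likelihood ratio test)."""
    llr = speaker_gmm.score(features) - background_gmm.score(features)
    return llr > threshold, llr

claim = rng.normal(0.0, 1.0, size=(200, 13))  # test utterance from the enrolled speaker
accepted, llr = verify(claim)
print(accepted, round(llr, 2))
```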

    Study to determine potential flight applications and human factors design guidelines for voice recognition and synthesis systems

    A study was conducted to determine potential commercial aircraft flight deck applications and implementation guidelines for voice recognition and synthesis. First, a survey of voice recognition and synthesis technology was undertaken to develop a working knowledge base. Numerous potential aircraft and simulator flight deck voice applications were then identified, and each proposed application was rated on a number of criteria to produce an overall payoff rating. The potential voice recognition applications fell into five general categories: programming, interrogation, data entry, switch and mode selection, and continuous/time-critical action control. The first three categories showed the most promise of being beneficial to flight deck operations. Possible applications of voice synthesis systems were categorized as automatic or pilot-selectable, and many were rated as potentially beneficial. In addition, voice system implementation guidelines and pertinent performance criteria are proposed. Finally, the findings of this study are compared with those of a recent NASA study of a 1995 transport concept.

    Trigger and data acquisition

    The lectures address some of the issues of triggering and data acquisition in large high-energy physics experiments. Emphasis is placed on hadron-collider experiments, which present a particularly challenging environment for event selection and data collection. However, the lectures also explain how trigger and data-acquisition (T/DAQ) systems have evolved over the years to meet new challenges. Some examples are given from early experience with the LHC T/DAQ systems during the 2008 single-beam operations.