278 research outputs found

    End-to-end speech recognition modeling from de-identified data

    De-identification of data used for automatic speech recognition modeling is a critical component in protecting privacy, especially in the medical domain. However, simply removing all personally identifiable information (PII) from end-to-end model training data leads to a significant performance degradation, in particular for the recognition of names, dates, locations, and words from similar categories. We propose and evaluate a two-step method for partially recovering this loss. First, PII is identified, and each occurrence is replaced with a random word sequence of the same category. Then, corresponding audio is produced via text-to-speech or by splicing together matching audio fragments extracted from the corpus. These artificial audio/label pairs, together with speaker turns from the original data without PII, are used to train models. We evaluate the performance of this method on in-house data of medical conversations and observe a recovery of almost the entire performance degradation in the general word error rate while still maintaining a strong diarization performance. Our main focus is the improvement of recall and precision in the recognition of PII-related words. Depending on the PII category, between 50% and 90% of the performance degradation can be recovered using our proposed method. Comment: Accepted to INTERSPEECH 202
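    The first step of the method in the abstract above (replacing each PII occurrence with a random word sequence of the same category) can be illustrated with a minimal sketch. It assumes PII has already been tagged as token spans. The `SURROGATES` lists, the category names, and the `replace_pii` function are hypothetical and are not taken from the paper. The audio step (text-to-speech or splicing) is not covered.

```python
import random

# Hypothetical surrogate vocabularies; a real system would use far larger lists.
SURROGATES = {
    "NAME": [["Alex", "Morgan"], ["Sam", "Lee"]],
    "DATE": [["March", "third"], ["June", "tenth"]],
}

def replace_pii(tokens, spans, rng):
    """Replace each tagged PII span with a random same-category word sequence.

    tokens: list of words.
    spans:  non-overlapping (start, end, category) tuples, end exclusive.
    Returns the new token list and the spans of the inserted surrogates.
    """
    out, new_spans, cursor = [], [], 0
    for start, end, category in sorted(spans):
        out.extend(tokens[cursor:start])           # copy non-PII text unchanged
        surrogate = rng.choice(SURROGATES[category])
        new_spans.append((len(out), len(out) + len(surrogate), category))
        out.extend(surrogate)                      # insert the surrogate words
        cursor = end                               # skip the original PII words
    out.extend(tokens[cursor:])
    return out, new_spans

tokens = "I saw Doctor Jane Smith on May first".split()
spans = [(3, 5, "NAME"), (6, 8, "DATE")]
new_tokens, new_spans = replace_pii(tokens, spans, random.Random(0))
```

    Returning the surrogate spans keeps track of which words in the new transcript would need synthetic or spliced audio in the second step.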

    Tuberculosis in Children and Adolescents, Taiwan, 1996–2003

    Analysis of data from Taiwan’s National Tuberculosis (TB) Registry showed that incidence of TB in persons <20 years of age was 9.61/100,000 person-years, biphasic, and age-relevant, with a major peak in persons slightly >12 years. Aboriginal children were 8.1–17.4× more likely to have TB than non-Aboriginal children

    Detecting light leptophilic gauge boson at BESIII detector

    The $O(\mathrm{GeV})$ extra $U(1)$ gauge boson, named the U-boson, has been proposed to mediate the interaction among leptons and dark matter (DM) in order to account for the observations by PAMELA and ATIC. In such models, the extra $U(1)$ gauge group can be chosen as $U(1)_{L_i-L_j}$, with $L_i$ the $i$-th generation lepton number. This anomaly-free model provides the appropriate dark matter relic density and boost factor required by experiments. In this work, the observability of such a U-boson at the BESIII detector is investigated through the process $e^+e^- \to U\gamma$, followed by $U\to e^+e^-$, $U\to \mu^+\mu^-$ and $U\to \nu\overline{\nu}$. In the invisible channel, where the U-boson decays into neutrinos, BESIII can measure the coupling of the extra $U(1)$ down to $O(10^{-4}) \sim O(10^{-5})$ because of the low Standard Model backgrounds. In the visible channel, where the U-boson decays into a charged lepton pair, BESIII can only measure the coupling down to $O(10^{-3}) \sim O(10^{-4})$ due to the large irreducible QED backgrounds. Comment: 7 pages, 9 figures; V2: SecIII corrected, discussions adde

    Efficient models of intrinsic variability in speech recognition and speech therapy

    The objective of this thesis is to develop statistical modeling techniques for characterizing phonetic variation in automatic speech recognition (ASR). One issue addressed is the reliable detection of phoneme-level mispronunciations in speech utterances that arise in speech therapy applications. Another is the ability of ASR systems to model the phonetic variation that often exists in speaker-independent recognition tasks. Both issues are treated as examples of the same basic problem of robustly modeling phonetic variability in ASR. The technical contributions are as follows. First, a phoneme-level pronunciation verification (PV) scenario is investigated for detecting mispronunciations in speech utterances recorded from a population of impaired children with neuromuscular disorders. The well-known continuous density hidden Markov model (CDHMM) is used as a phoneme decoder, which generates a finite state network of phoneme string hypotheses for input speech utterances. Phoneme-level confidence measures are constructed from this network, and a PV decision is made by comparing the confidence measures with a pre-selected threshold. Several state-of-the-art ASR techniques are incorporated in this PV scenario, and the experimental studies show how these techniques affect the verification accuracy. Second, the subspace Gaussian mixture model (SGMM) formalism is investigated. This acoustic model is shown to provide an efficient model of phonetic variability in speech. In the experimental studies, an 18.74% relative reduction in word error rate with respect to the CDHMM acoustic model is achieved on a medium vocabulary ASR task. Furthermore, a 24.79% relative reduction in phone error rate with respect to the CDHMM is achieved on a speech corpus of unimpaired children. Finally, the SGMM is incorporated into a new PV scenario. A new kind of pronunciation confidence measure, used for making mispronunciation verification decisions, is extracted directly from the state-level model parameters. Both session-level and utterance-level PV scenarios based on the SGMM-based confidence measures are proposed. In the session-level PV task, the equal error rate is reduced by 15.35% when combining the SGMM-based confidence measures with the phoneme-decoder-based confidence measures described above. In the utterance-level PV task, the equal error rate is reduced by 12.94%. This equal error rate reduction is believed to result from an efficient characterization of pronunciation variation for each phoneme by the SGMM.
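    The abstract above describes a verification decision made by comparing a confidence measure with a pre-selected threshold, and reports results as equal error rates. A minimal sketch of both ideas follows. The function names and the toy scores are hypothetical, and the simple threshold sweep is only one common way to estimate an equal error rate. It is not the thesis's evaluation procedure.

```python
def verify(confidence, threshold):
    """Accept a phoneme as correctly pronounced when confidence >= threshold."""
    return confidence >= threshold

def equal_error_rate(correct_scores, mispronounced_scores):
    """Sweep candidate thresholds and return (eer, threshold) at the point
    where the false-rejection and false-acceptance rates are closest."""
    best = None
    for t in sorted(set(correct_scores) | set(mispronounced_scores)):
        # False rejection: a correct pronunciation scored below the threshold.
        frr = sum(not verify(s, t) for s in correct_scores) / len(correct_scores)
        # False acceptance: a mispronunciation scored at or above the threshold.
        far = sum(verify(s, t) for s in mispronounced_scores) / len(mispronounced_scores)
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            best = (gap, (frr + far) / 2, t)
    return best[1], best[2]

# Toy confidence scores, invented for illustration only.
correct = [0.9, 0.8, 0.7, 0.4]
mispronounced = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(correct, mispronounced)
```

    On these toy scores the two error rates meet at a threshold of 0.6, where one correct and one mispronounced item are each misclassified.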

    Speaker adaptation in joint factor analysis based text independent speaker verification

    This thesis presents methods for supervised and unsupervised speaker adaptation of Gaussian mixture speaker models in text-independent speaker verification. The proposed methods are based on an approach which is able to separate speaker and channel variability so that progressive updating of speaker models can be performed while minimizing the influence of the channel variability associated with the adaptation recordings. This approach relies on a joint factor analysis model of intrinsic speaker variability and session variability where inter-session variation is assumed to result primarily from the effects of the transmission channel. These adaptation methods have been evaluated under the adaptation paradigm defined under the NIST 2005 speaker recognition evaluation plan which is based on conversational telephone speech

    A study of pronunciation verification in a speech therapy application

    Techniques are presented for detecting phoneme-level mispronunciations in utterances obtained from a population of impaired children. The intended application of these approaches is to use the resulting confidence measures to provide feedback to patients concerning the quality of pronunciations in utterances arising within interactive speech therapy sessions. The pronunciation verification scenario involves presenting utterances of known words to a phonetic decoder and generating confusion networks from the resulting phone lattices. Confidence measures are derived from the posterior probabilities obtained from the confusion networks. Phoneme-level mispronunciation detection performance was significantly improved with respect to a baseline system by optimizing acoustic models and pronunciation models in the phonetic decoder and applying a non-linear mapping to the confusion network posteriors. Index Terms: confidence measure, speech therapy

    Concomitant upregulation of nuclear factor-kB activity, proinflammatory cytokines and ICAM-1 in the injured brain after cortical contusion trauma in a rat model

    Background: Nuclear factor kappa B (NF-kB), proinflammatory cytokines and intercellular adhesion molecule 1 (ICAM-1) are frequently upregulated in the injured brain after traumatic brain injury (TBI). However, the temporal pattern of upregulation is not well defined. Aims: The current study was undertaken to investigate the temporal profile of the expression of NF-kB, proinflammatory cytokines and ICAM-1 in the injured brain after cortical contusion trauma of the rat brain. Settings and Design: A rat model of cortical contusion was produced by a free-falling weight on the exposed dura of the right parietal lobe. The rats were randomly divided into a control group and TBI groups at hours 3, 12, 24 and 72, and on day 7. Material and Methods: NF-kB binding activity in the brain tissue surrounding the injured area was studied by electrophoretic mobility shift assay (EMSA). The levels of TNF-α and IL-6 were detected using ELISA, and ICAM-1 expression was studied by immunohistochemistry. Statistical analysis: The data were analyzed by one-way ANOVA followed by the Student-Newman-Keuls post hoc test. The relation between variables was analyzed using bivariate correlation with a two-tailed test. Results: Compared with that of the control group, NF-kB binding activity in the injured brain was significantly increased from 12 h to 7 days postinjury, with the maximum at 72 h. The concentrations of TNF-α and IL-6 in the injured brain were significantly increased from 3 h to 7 days and maximal at 24 h postinjury. The number of ICAM-1 immunostained microvessels was significantly increased in the injured brain from 24 h to 7 days postinjury, with its peak at 72 h. Concomitant upregulation of TNF-α, IL-6, ICAM-1 and the cytokine mediator NF-kB was observed in the injured brain after cortical contusion, and there was a highly positive relation among these variables. Conclusions: Cortical contusion trauma could induce a concomitant and persistent upregulation of NF-kB binding activity, TNF-α, IL-6 and ICAM-1 in the injured rat brain, which might play a central role in the injury-induced immune response of the brain.