278 research outputs found

End-to-end speech recognition modeling from de-identified data

Author: Flechl Martin
Park Junho
Skala Peter
Yin Shou-Chun
Publication date: 12/07/2022

De-identification of data used for automatic speech recognition modeling is a critical component in protecting privacy, especially in the medical domain. However, simply removing all personally identifiable information (PII) from end-to-end model training data leads to a significant performance degradation, in particular for the recognition of names, dates, locations, and words from similar categories. We propose and evaluate a two-step method for partially recovering this loss. First, PII is identified, and each occurrence is replaced with a random word sequence of the same category. Then, corresponding audio is produced via text-to-speech or by splicing together matching audio fragments extracted from the corpus. These artificial audio/label pairs, together with speaker turns from the original data without PII, are used to train models. We evaluate the performance of this method on in-house data of medical conversations and observe a recovery of almost the entire performance degradation in the general word error rate while still maintaining a strong diarization performance. Our main focus is the improvement of recall and precision in the recognition of PII-related words. Depending on the PII category, between 50% and 90% of the performance degradation can be recovered using our proposed method. Comment: Accepted to INTERSPEECH 202
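The first step described in this abstract (replacing each identified PII occurrence with a random word sequence of the same category) can be illustrated with a minimal text-only sketch. Everything below is an assumption made for illustration: the category names, the `REPLACEMENTS` inventory, the span format, and the `replace_pii` helper are hypothetical and are not taken from the paper, which also generates matching audio via text-to-speech or splicing (not shown here).

```python
import random

# Hypothetical replacement inventory; the paper does not list its own.
REPLACEMENTS = {
    "NAME": ["Alex Morgan", "Sam Rivera", "Jordan Lee"],
    "DATE": ["March third", "the tenth of June"],
    "LOCATION": ["Springfield", "Lakeside Clinic"],
}


def replace_pii(tokens, spans, rng):
    """Replace each tagged PII span with a random word sequence of the same category.

    tokens: list of words in one speaker turn.
    spans:  non-overlapping (start, end, category) tuples, end exclusive.
    rng:    a random.Random instance, passed in so results are reproducible.
    """
    out = []
    cursor = 0
    for start, end, category in sorted(spans):
        out.extend(tokens[cursor:start])  # keep the non-PII words unchanged
        out.extend(rng.choice(REPLACEMENTS[category]).split())
        cursor = end  # skip over the original PII words
    out.extend(tokens[cursor:])
    return out


turn = "I saw John Smith in Boston on Monday".split()
# Spans are assumed to come from a separate PII tagger (not shown).
spans = [(2, 4, "NAME"), (5, 6, "LOCATION"), (7, 8, "DATE")]
print(" ".join(replace_pii(turn, spans, random.Random(0))))
```

Passing the random generator explicitly keeps the substitution reproducible, which may help when the same replacement text has to be paired with synthesized audio later.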

arXiv.org e-Print Archive

Tuberculosis in Children and Adolescents, Taiwan, 1996–2003

Author: Chiang
Chin-Yun Lee
Chun-Yi Lu
Howie
Hsiang-Lin Yang
Hsu
I-Shou Chang
Li-Min Huang
Lin
Luan-Yin Chang
Nelson
Pei-Chun Chan
Ping-Ing Lee
Yang
Yi-Chun Wu
Yu
Publication venue: Centers for Disease Control and Prevention
Publication date: 01/09/2007

Analysis of data from Taiwan’s National Tuberculosis (TB) Registry showed that incidence of TB in persons <20 years of age was 9.61/100,000 person-years, biphasic, and age-relevant, with a major peak in persons slightly >12 years. Aboriginal children were 8.1–17.4× more likely to have TB than non-Aboriginal children

Crossref

Directory of Open Access Journals

PubMed Central

Tools and Technologies for Computer-Aided Speech and Language Therapy

Author: Carlos Vaquero
Deller
Delong
Dempster
Eduardo Lleida
Gauvain
Legetter
Lleida
Oscar Saz
Patel
Pratt
Rabiner
Richard Rose
Shou-Chun Yin
William R. Rodríguez
Publication venue: 'Elsevier BV'

Crossref

Detecting light leptophilic gauge boson at BESIII detector

Author: Abdallah
Abdo
Adriani
Aharonian
Allen
Arkani-Hamed
Asner
Baek
Baer
Batell
Baumgart
Bennett
Bergstrom
Bernabei
Bi
Boehm
Borodatchenkova
Bouchiat
Bovy
Chang
Chen
Cholis
Chun
Cirelli
Cui
Desai
Essig
Fayet
Foot
Fox
Gninenko
Grajek
Hanneke
He
Hisano
Hooper
Iengo
Jia Liu
Jungman
Kohri
Liu
March-Russell
Meade
Peng-fei Yin
Pieri
Pospelov
Pukhov
Reece
Shirai
Shou-hua Zhu
Yuksel
Zhu
Publication venue: 'Elsevier BV'
Publication date: 14/07/2009

The $O(\mathrm{GeV})$ extra $U(1)$ gauge boson, named U-boson, has been proposed to mediate the interaction among leptons and dark matter (DM), in order to account for the observations by PAMELA and ATIC. In such kind of models, the extra $U(1)$ gauge group can be chosen as $U(1)_{L_i-L_j}$ with $L_i$ the $i$-th generation lepton number. This anomaly-free model provides appropriate dark matter relic density and boost factor required by experiments. In this work the observability of such kind of U-boson at BESIII detector is investigated through the processes $e^+e^- \to U\gamma$, followed by $U\to e^+e^-$, $U\to \mu^+\mu^-$ and $U\to \nu\overline{\nu}$. In the invisible channel where U-boson decays into neutrinos, BESIII can measure the coupling of the extra $U(1)$ down to $O(10^{-4}) \sim O(10^{-5})$ because of the low Standard Model backgrounds. In the visible channel where U-boson decays into charged lepton pair, BESIII can only measure the coupling down to $O(10^{-3}) \sim O(10^{-4})$ due to the large irreducible QED backgrounds. Comment: 7 pages, 9 figures; V2: Sec. III corrected, discussions adde

arXiv.org e-Print Archive

Elsevier - Publisher Connector

Crossref

Efficient models of intrinsic variability in speech recognition and speech therapy

Author: Yin Shou-Chun
Publication venue: McGill University

The objective of this thesis study is to develop statistical modeling techniques for characterizing phonetic variation in automatic speech recognition (ASR). One issue addressed in this domain is to reliably detect the phoneme level mispronunciations in speech utterances that arise from speech therapy applications. Another issue addressed in this work is to study the ability of ASR systems to model the phonetic variation that often exists in speaker-independent recognition tasks. Both issues will be treated as examples of the same basic problem in robustly modeling phonetic variability in ASR. The technical contributions involved in this thesis which address these issues are presented as follows. First, a phoneme level pronunciation verification (PV) scenario is investigated for detecting the mispronunciation occurrences in speech utterances recorded from a population of impaired children with neuromuscular disorders. The well known continuous density hidden Markov model (CDHMM) is used as a phoneme decoder which generates a finite state network of phoneme string hypotheses for input speech utterances. The phoneme level confidence measures can be constructed from this network, and a PV decision can be made by comparing the confidence measures with a pre-selected threshold. Some well known state-of-the-art ASR techniques are incorporated in this PV scenario, and the experimental studies show how these techniques can impact the verification accuracy. Second, the subspace Gaussian mixture model (SGMM) formalism is investigated. This acoustic model is shown to provide an efficient model of phonetic variability in speech. In the experimental studies, it can be shown that an 18.74% relative reduction in word error rate with respect to the well known CDHMM acoustic model can be achieved on a medium vocabulary ASR task. Furthermore, it is demonstrated that a 24.79% relative reduction in phone error rate with respect to the CDHMM can be achieved for an unimpaired children's speech corpus. Finally, the SGMM is incorporated into a new PV scenario. A new kind of pronunciation confidence measure used for making mispronunciation verification decisions is extracted directly from the state level model parameters. Both session level and utterance level PV scenarios based on the SGMM based confidence measures are proposed. In the session level PV task, the equal error rate can be reduced by 15.35% when combining the SGMM based confidence measures with the above phoneme decoder based confidence measures. In the utterance level PV task, the equal error rate can be reduced by 12.94%. This equal error rate reduction is believed to result from an efficient characterization of pronunciation variation for each phoneme by the SGMM.
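The thresholded pronunciation verification decision and the two kinds of figures quoted in this abstract (relative error reductions and equal error rates) can be made concrete with a small sketch. This is an illustration under stated assumptions, not the thesis's implementation: the function names, the toy confidence scores, and the threshold-sweep approximation of the equal error rate are all hypothetical.

```python
def relative_reduction(baseline, improved):
    """Relative reduction in an error rate, in percent of the baseline."""
    return 100.0 * (baseline - improved) / baseline


def error_rates(confidences, labels, threshold):
    """Error rates when a phoneme is accepted iff its confidence >= threshold.

    labels[i] is True when phoneme i was correctly pronounced.
    Returns (false rejection rate, false acceptance rate).
    """
    correct = [c for c, ok in zip(confidences, labels) if ok]
    wrong = [c for c, ok in zip(confidences, labels) if not ok]
    false_reject = sum(c < threshold for c in correct) / len(correct)
    false_accept = sum(c >= threshold for c in wrong) / len(wrong)
    return false_reject, false_accept


def equal_error_rate(confidences, labels):
    """Approximate the EER by sweeping thresholds over the observed scores.

    Returns (eer, threshold) at the point where the two error rates are closest.
    """
    best = None
    for t in sorted(set(confidences)):
        fr, fa = error_rates(confidences, labels, t)
        gap = abs(fr - fa)
        if best is None or gap < best[0]:
            best = (gap, (fr + fa) / 2, t)
    return best[1], best[2]


# Toy data (invented for illustration): four correct and four mispronounced phonemes.
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [True, True, True, True, False, False, False, False]
eer, threshold = equal_error_rate(scores, labels)
print(f"EER = {eer:.2f} at threshold {threshold}")
```

Sweeping only the observed scores is the simplest approximation; with real data the two error curves rarely cross exactly, so an interpolated estimate may be preferable.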

eScholarship@McGill

Speaker adaptation in joint factor analysis based text independent speaker verification

Author: Shou-Chun Yin, 1980-
Publication venue: McGill University

This thesis presents methods for supervised and unsupervised speaker adaptation of Gaussian mixture speaker models in text-independent speaker verification. The proposed methods are based on an approach which is able to separate speaker and channel variability so that progressive updating of speaker models can be performed while minimizing the influence of the channel variability associated with the adaptation recordings. This approach relies on a joint factor analysis model of intrinsic speaker variability and session variability where inter-session variation is assumed to result primarily from the effects of the transmission channel. These adaptation methods have been evaluated under the adaptation paradigm defined under the NIST 2005 speaker recognition evaluation plan which is based on conversational telephone speech

eScholarship@McGill

A study of pronunciation verification in a speech therapy application

Author: Eduardo Lleida
Oscar Saz
Richard Rose
Shou-chun Yin
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009

Techniques are presented for detecting phoneme level mispronunciations in utterances obtained from a population of impaired children speakers. The intended application of these approaches is to use the resulting confidence measures to provide feedback to patients concerning the quality of pronunciations in utterances arising within interactive speech therapy sessions. The pronunciation verification scenario involves presenting utterances of known words to a phonetic decoder and generating confusion networks from the resulting phone lattices. Confidence measures are derived from the posterior probabilities obtained from the confusion networks. Phoneme level mispronunciation detection performance was significantly improved with respect to a baseline system by optimizing acoustic models and pronunciation models in the phonetic decoder and applying a non-linear mapping to the confusion network posteriors. Index Terms: confidence measure, speech therapy

CiteSeerX

Crossref

Concomitant upregulation of nuclear factor-kB activity, proinflammatory cytokines and ICAM-1 in the injured brain after cortical contusion trauma in a rat model

Author: Chun Hua Hang
Hong Xia Yin
Ji-Xin Shi
Jie-Shou Li
Wei Wu
Publication venue: Medknow Publications on behalf of the Neurological Society of India
Publication date: 31/12/2005

Background: Nuclear factor kappa B (NF-kB), proinflammatory cytokines and intercellular adhesion molecule 1 (ICAM-1) are frequently upregulated in the injured brain after traumatic brain injury (TBI). However, the temporal pattern of upregulation is not well defined. Aims: The current study was undertaken to investigate the temporal profile of the expression of NF-kB, proinflammatory cytokines and ICAM-1 in the injured brain after cortical contusion trauma of the rat brain. Settings and Design: A rat model of cortical contusion was produced by a free-falling weight on the exposed dura of the right parietal lobe. The rats were randomly divided into a control group and TBI groups at hours 3, 12, 24 and 72, and on day 7. Material and Methods: NF-kB binding activity in the brain tissue surrounding the injured area was studied by electrophoretic mobility shift assay (EMSA). The levels of TNF-α and IL-6 were detected using ELISA, and ICAM-1 expression was studied by immunohistochemistry. Statistical analysis: The data were analyzed by one-way ANOVA followed by the Student-Newman-Keuls post hoc test. Relation between variables was analyzed using bivariate correlation with a two-tailed test. Results: Compared with that of the control group, NF-kB binding activity in the injured brain was significantly increased through 12 h and 7 days postinjury, with the maximum at 72 h. The concentrations of TNF-α and IL-6 in the injured brain were significantly increased from 3 h to 7 days and maximal at 24 h postinjury. The number of ICAM-1 immunostained microvessels was significantly increased in the injured brain from 24 h to 7 days postinjury, with its peak at 72 h. Concomitant upregulation of TNF-α, IL-6, ICAM-1 and the cytokine mediators NF-kB was observed in the injured brain after cortical contusion, and there was a highly positive relation among these variables. Conclusions: Cortical contusion trauma could induce a concomitant and persistent upregulation of NF-kB binding activity, TNF-α, IL-6 and ICAM-1 in the injured rat brain, which might play a central role in the injury-induced immune response of the brain

University of Toronto Research Repository