End-to-end speech recognition modeling from de-identified data
De-identification of data used for automatic speech recognition modeling is a
critical component in protecting privacy, especially in the medical domain.
However, simply removing all personally identifiable information (PII) from
end-to-end model training data leads to a significant performance degradation
in particular for the recognition of names, dates, locations, and words from
similar categories. We propose and evaluate a two-step method for partially
recovering this loss. First, PII is identified, and each occurrence is replaced
with a random word sequence of the same category. Then, corresponding audio is
produced via text-to-speech or by splicing together matching audio fragments
extracted from the corpus. These artificial audio/label pairs, together with
speaker turns from the original data without PII, are used to train models. We
evaluate the performance of this method on in-house data of medical
conversations and observe a recovery of almost the entire performance
degradation in the general word error rate while still maintaining a strong
diarization performance. Our main focus is the improvement of recall and
precision in the recognition of PII-related words. Depending on the PII
category, between of the performance degradation can be recovered
using our proposed method. Comment: Accepted to INTERSPEECH 202
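The first step of the method above (replacing each PII occurrence with a random word sequence of the same category) can be illustrated with a minimal Python sketch. The category names, the tiny replacement lexicon, and the `replace_pii` helper are all hypothetical and chosen only for illustration; the abstract does not describe the authors' tagger or word lists, and the second step (producing matching audio via text-to-speech or splicing) is not shown.

```python
import random

# Hypothetical replacement lexicon; a real system would use far larger lists.
REPLACEMENTS = {
    "NAME": ["Alex Morgan", "Jamie Lee", "Sam Patel"],
    "DATE": ["March third", "the tenth of June", "May first"],
    "LOCATION": ["Springfield", "Riverton", "Lakeside"],
}


def replace_pii(tokens, rng):
    """Replace each PII-tagged span with a random span of the same category.

    `tokens` is a list of (text, category) pairs, where category is None for
    non-PII text. Returns the new transcript and the list of inserted spans.
    """
    out, inserted = [], []
    for text, category in tokens:
        if category is None:
            out.append(text)
        else:
            surrogate = rng.choice(REPLACEMENTS[category])
            out.append(surrogate)
            inserted.append((category, surrogate))
    return " ".join(out), inserted


# Illustrative speaker turn with two tagged PII spans (made-up example).
turn = [("I saw", None), ("John Smith", "NAME"), ("on", None), ("July fourth", "DATE")]
new_text, inserted = replace_pii(turn, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the substitution reproducible, which helps when the same surrogate text must later be paired with synthesized or spliced audio.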
Tuberculosis in Children and Adolescents, Taiwan, 1996–2003
Analysis of data from Taiwan’s National Tuberculosis (TB) Registry showed that incidence of TB in persons <20 years of age was 9.61/100,000 person-years, biphasic, and age-relevant, with a major peak in persons slightly >12 years. Aboriginal children were 8.1–17.4× more likely to have TB than non-Aboriginal children.
Detecting light leptophilic gauge boson at BESIII detector
The extra gauge boson named U-boson, has been proposed to
mediate the interaction among leptons and dark matter (DM), in order to account
for the observations by PAMELA and ATIC. In such kind of models, the extra U(1)
gauge group can be chosen as with the th generation
lepton number. This anomaly-free model provides appropriate dark matter relic
density and boost factor required by experiments. In this work the
observability of such kind of U-boson at BESIII detector is investigated
through the processes , followed by ,
and . In the invisible channel where
U-boson decays into neutrinos, BESIII can measure the coupling of the extra down to because of the low Standard Model
backgrounds. In the visible channel where U-boson decays into charged lepton
pair, BESIII can only measure the coupling down to due to the large irreducible QED backgrounds. Comment: 7 pages, 9 figures; V2: Sec. III corrected, discussions added
Efficient models of intrinsic variability in speech recognition and speech therapy
The objective of this thesis study is to develop statistical modeling techniques for characterizing phonetic variation in automatic speech recognition (ASR). One issue addressed in this domain is to reliably detect the phoneme level mispronunciations in speech utterances that arise from speech therapy applications. Another issue addressed in this work is to study the ability of ASR systems to model the phonetic variation that often exists in speaker-independent recognition tasks. Both issues will be treated as examples of the same basic problem in robustly modeling phonetic variability in ASR. The technical contributions of this thesis that address these issues are as follows. First, a phoneme level pronunciation verification (PV) scenario is investigated for detecting mispronunciations in speech utterances recorded from a population of impaired children with neuromuscular disorders. The well known continuous density hidden Markov model (CDHMM) is used as a phoneme decoder which generates a finite state network of phoneme string hypotheses for input speech utterances. Phoneme level confidence measures can be constructed from this network, and a PV decision can be made by comparing the confidence measures with a pre-selected threshold. Some well known state-of-the-art ASR techniques are incorporated in this PV scenario, and the experimental studies show how these techniques can impact the verification accuracy. Second, the subspace Gaussian mixture model (SGMM) formalism is investigated. This acoustic model is shown to provide an efficient model of phonetic variability in speech. In the experimental studies, it is shown that an 18.74% relative reduction in word error rate with respect to the well known CDHMM acoustic model can be achieved on a medium vocabulary ASR task. Furthermore, it is demonstrated that a 24.79% relative reduction in phone error rate with respect to the CDHMM can be achieved for an unimpaired children's speech corpus. Finally, the SGMM is incorporated into a new PV scenario. A new kind of pronunciation confidence measure used for making mispronunciation verification decisions is extracted directly from the state level model parameters. Both session level and utterance level PV scenarios based on the SGMM based confidence measures are proposed. In the session level PV task, the equal error rate can be reduced by 15.35% when combining the SGMM based confidence measures with the above phoneme decoder based confidence measures. In the utterance level PV task, the equal error rate can be reduced by 12.94%. This equal error rate reduction is believed to result from an efficient characterization of pronunciation variation for each phoneme by the SGMM.
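The equal error rate used to report the PV results above is the operating point at which the false acceptance rate (mispronunciations accepted) equals the false rejection rate (correct pronunciations rejected). The following Python sketch approximates it by sweeping the decision threshold over the observed confidence scores. The function name, the score convention (higher means more likely correct), and the example scores are assumptions made for illustration; they are not taken from the thesis.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping a threshold over the observed scores.

    scores: confidence that a phoneme is correctly pronounced (higher = better).
    labels: True for a correct pronunciation, False for a mispronunciation.
    A phoneme is accepted when its score is >= the threshold.
    Returns (approximate EER, threshold at which it was found).
    """
    n_correct = sum(labels)
    n_mispron = len(labels) - n_correct
    best = None
    for thr in sorted(set(scores)):
        # False rejection: a correct pronunciation scored below the threshold.
        fr = sum(1 for s, y in zip(scores, labels) if y and s < thr) / n_correct
        # False acceptance: a mispronunciation scored at or above the threshold.
        fa = sum(1 for s, y in zip(scores, labels) if not y and s >= thr) / n_mispron
        gap = abs(fa - fr)
        if best is None or gap < best[0]:
            best = (gap, (fa + fr) / 2, thr)
    return best[1], best[2]


# Made-up scores: two correct pronunciations and two mispronunciations.
eer, threshold = equal_error_rate([0.6, 0.4, 0.5, 0.3], [True, True, False, False])
```

With finitely many scores the two error rates rarely cross exactly, so the sketch reports the midpoint of the two rates at the threshold where they are closest.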
Speaker adaptation in joint factor analysis based text independent speaker verification
This thesis presents methods for supervised and unsupervised speaker adaptation of Gaussian mixture speaker models in text-independent speaker verification. The proposed methods are based on an approach that separates speaker and channel variability so that progressive updating of speaker models can be performed while minimizing the influence of the channel variability associated with the adaptation recordings. This approach relies on a joint factor analysis model of intrinsic speaker variability and session variability, where inter-session variation is assumed to result primarily from the effects of the transmission channel. These adaptation methods have been evaluated under the adaptation paradigm defined in the NIST 2005 speaker recognition evaluation plan, which is based on conversational telephone speech.
A study of pronunciation verification in a speech therapy application
Techniques are presented for detecting phoneme level mispronunciations in utterances obtained from a population of impaired child speakers. The intended application of these approaches is to use the resulting confidence measures to provide feedback to patients concerning the quality of pronunciations in utterances arising within interactive speech therapy sessions. The pronunciation verification scenario involves presenting utterances of known words to a phonetic decoder and generating confusion networks from the resulting phone lattices. Confidence measures are derived from the posterior probabilities obtained from the confusion networks. Phoneme level mispronunciation detection performance was significantly improved with respect to a baseline system by optimizing acoustic models and pronunciation models in the phonetic decoder and applying a non-linear mapping to the confusion network posteriors. Index Terms: confidence measure, speech therapy
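The verification scenario described above (look up the posterior of each expected phoneme in the confusion network, apply a non-linear mapping, and compare with a threshold) might be sketched as follows. The abstract does not state which mapping was used, so the sigmoid here, along with its slope and midpoint, the threshold, the data layout, and the example posteriors, are all illustrative assumptions rather than the authors' settings.

```python
import math


def mapped_confidence(posterior, slope=8.0, midpoint=0.5):
    """Sigmoid mapping of a raw posterior; slope and midpoint are illustrative."""
    return 1.0 / (1.0 + math.exp(-slope * (posterior - midpoint)))


def verify_phonemes(expected, confusion_network, threshold=0.5):
    """Return True (accepted) or False (flagged) for each expected phoneme.

    confusion_network: one dict per slot, mapping phoneme -> posterior probability.
    A phoneme missing from its slot is treated as having posterior 0.
    """
    decisions = []
    for phoneme, slot in zip(expected, confusion_network):
        posterior = slot.get(phoneme, 0.0)
        decisions.append(mapped_confidence(posterior) >= threshold)
    return decisions


# Made-up confusion network for the known word "cat" (phonemes k, ae, t).
network = [{"k": 0.9, "g": 0.1}, {"ae": 0.3, "eh": 0.7}, {"t": 0.8, "d": 0.2}]
decisions = verify_phonemes(["k", "ae", "t"], network)
```

In this example the middle phoneme is flagged because the decoder assigned most of the posterior mass in that slot to a competing phoneme.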
Concomitant upregulation of nuclear factor-kB activity, proinflammatory cytokines and ICAM-1 in the injured brain after cortical contusion trauma in a rat model
Background: Nuclear factor kappa B (NF-kB), proinflammatory cytokines
and intercellular adhesion molecule 1 (ICAM-1) are frequently
upregulated in the injured brain after traumatic brain injury (TBI).
However, the temporal pattern of upregulation is not well defined.
Aims: The current study was undertaken to investigate the temporal
profile of the expression of NF-kB, proinflammatory cytokines and
ICAM-1 in the injured brain after cortical contusion trauma of the rat
brain. Settings and Design: A rat model of cortical contusion was
produced by a free-falling weight on the exposed dura of the right parietal
lobe. The rats were randomly divided into a control group and TBI groups
at 3, 12, 24 and 72 h and on day 7. Material and Methods: NF-kB
binding activity in the brain tissue surrounding the injured area was studied
by electrophoretic mobility shift assay (EMSA). The levels of
TNF-α and IL-6 were measured using ELISA, and ICAM-1 expression was
studied by immunohistochemistry. Statistical analysis: The data were
analyzed by one-way ANOVA followed by Student-Newman-Keuls post hoc
test. Relation between variables was analyzed using bivariate
correlation with a two-tailed test. Results: Compared with that of the
control group, NF-kB binding activity in the injured brain was
significantly increased from 12 h to 7 days postinjury, with the
maximum at 72 h. The concentrations of TNF-α and IL-6 in the
injured brain were significantly increased from 3 h to 7 days and
maximal at 24 h postinjury. The number of ICAM-1 immunostained
microvessels was significantly increased in the injured brain from 24 h
to 7 days postinjury, with its peak at 72 h. Concomitant upregulation
of TNF-α, IL-6, ICAM-1 and the cytokine mediator NF-kB was observed
in the injured brain after cortical contusion, and there was a highly
positive relation among these variables. Conclusions: Cortical contusion
trauma could induce a concomitant and persistent upregulation of NF-kB
binding activity, TNF-α, IL-6 and ICAM-1 in the injured rat brain, which
might play a central role in the injury-induced immune response of the brain.