
    On adaptive decision rules and decision parameter adaptation for automatic speech recognition

    Recent advances in automatic speech recognition have been achieved by designing a plug-in maximum a posteriori (MAP) decision rule: the forms of the acoustic and language model distributions are specified, and the parameters of the assumed distributions are estimated from collections of speech and language training corpora. Maximum-likelihood point estimation is by far the most prevalent training method. However, because speech distributions are unknown, training data are sparse, speech shows high spectral and temporal variability, and training and testing conditions may be mismatched, a dynamic training strategy is needed. To cope with changing speakers and speaking conditions in real operational settings and achieve high-performance speech recognition, such a strategy incorporates a small amount of speaker- and environment-specific adaptation data into the training process. Bayesian adaptive learning is an optimal way to combine prior knowledge in an existing collection of general models with a new set of condition-specific adaptation data. In this paper, the mathematical framework for Bayesian adaptation of acoustic and language model parameters is first described. Maximum a posteriori point estimation is then developed for hidden Markov models and for a number of parameter densities commonly used in automatic speech recognition and natural language processing.
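To make MAP point estimation concrete for the simplest density, consider a Gaussian mean with known variance and a conjugate Gaussian prior centered on the speaker-independent mean. The MAP estimate then interpolates between the prior mean and the adaptation data. This is a minimal sketch under those assumptions, not the paper's full HMM treatment; the prior weight `tau` is a hypothetical hyperparameter name that acts like a count of pseudo-observations from the general model.

```python
import numpy as np

def map_adapt_mean(prior_mean, adaptation_frames, tau):
    """MAP estimate of a Gaussian mean under a conjugate Gaussian prior.

    Assumes a known variance sigma^2 and a prior mean ~ N(prior_mean, sigma^2 / tau),
    so tau behaves like the number of pseudo-observations behind the general model.
    """
    x = np.asarray(adaptation_frames, dtype=float)
    n = x.shape[0]
    # Posterior mode: weighted combination of the prior mean and the data sum.
    return (tau * np.asarray(prior_mean, dtype=float) + x.sum(axis=0)) / (tau + n)

# Speaker-independent mean and two speaker-specific frames (illustrative numbers).
prior = np.array([0.0, 0.0])
frames = np.array([[1.0, 2.0], [3.0, 2.0]])
print(map_adapt_mean(prior, frames, tau=2.0))  # [1. 1.]
print(map_adapt_mean(prior, frames, tau=0.0))  # reduces to the ML estimate: [2. 2.]
```

With no adaptation data the estimate stays at the prior mean, and as the amount of adaptation data grows it approaches the maximum-likelihood estimate, which is the behavior that makes MAP attractive when adaptation data are sparse.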

    Scalable learning for geostatistics and speaker recognition

    With improved data acquisition methods, the amount of data being collected has increased severalfold. One objective of data collection is to learn useful underlying patterns. To work with data at this scale, methods must not only be effective on the underlying data but also scale to larger data collections. This thesis develops scalable and effective methods for several domains, in particular geostatistics and speaker recognition. We first focus on kernel-based learning methods and develop a GPU-based parallel framework for this class of problems. We then propose an improved numerical algorithm that exploits GPU parallelization to further improve the computational performance of kernel regression. These methods are demonstrated on problems arising in geostatistics and speaker recognition. In geostatistics, data are often collected at scattered locations, and factors such as instrument malfunction lead to missing observations. Applications often require the ability to interpolate these scattered spatiotemporal data onto a regular grid continuously over time. This problem can be formulated as a regression problem, and kriging, one of the most popular geostatistical interpolation techniques, is analogous to a standard kernel method: Gaussian process regression. Kriging is computationally expensive and needs major modifications and accelerations to be practical. The GPU framework developed for kernel methods is extended to kriging, and the GPU's texture memory is further exploited for better computational performance. Speaker recognition is the task of verifying a person's identity from samples of his or her speech ("utterances"). This thesis focuses on the text-independent setting, for which three new recognition frameworks were developed. We propose a kernelized Renyi distance-based similarity score for speaker recognition.
While its performance is promising, it does not generalize well with limited training data and therefore does not compare well with state-of-the-art recognition systems. These systems compensate for variability in the speech data due to the message, the channel, noise, and reverberation. State-of-the-art systems model each speaker as a Gaussian mixture model (GMM) and compensate for this variability (termed "nuisance"). We propose a novel discriminative framework based on a latent variable technique, partial least squares (PLS), for improved recognition. The kernelized version of this algorithm yields a state-of-the-art speaker ID system whose results are competitive with the best systems reported in NIST's 2010 Speaker Recognition Evaluation.
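Because the abstract relates kriging to Gaussian process regression, the core computation can be sketched as factorizing one dense kernel matrix over the observed locations and reusing it for every grid point. This is a minimal CPU sketch with NumPy, not the thesis's GPU framework or its improved numerical algorithm; the squared-exponential kernel, `length_scale`, and the nugget term `noise` are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between point sets a (n x d) and b (m x d)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)

def gp_predict(x_train, y_train, x_test, length_scale=1.0, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP (a simple-kriging analogue)."""
    k_xx = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_sx = rbf_kernel(x_test, x_train, length_scale)
    # Cholesky factorization: the O(n^3) step that dominates cost for large n.
    chol = np.linalg.cholesky(k_xx)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_sx @ alpha
    v = np.linalg.solve(chol, k_sx.T)
    # The prior variance of this kernel is 1, so subtract the explained part.
    var = 1.0 - (v**2).sum(axis=0)
    return mean, var

# Interpolate scattered 2-D observations onto a few grid points (synthetic data).
rng = np.random.default_rng(0)
x_obs = rng.uniform(0.0, 5.0, size=(30, 2))
y_obs = np.sin(x_obs[:, 0]) + np.cos(x_obs[:, 1])
grid = np.array([[1.0, 1.0], [2.5, 2.5], [4.0, 4.0]])
mean, var = gp_predict(x_obs, y_obs, grid, length_scale=1.0, noise=1e-4)
print(mean, var)
```

The cubic cost of the factorization in the number of observations is what makes plain kriging expensive at scale and motivates the kind of acceleration the thesis describes.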

    Analysis of Clinical Pharmacology Data Using Hierarchical Likelihood

    Thesis (Ph.D.), Seoul National University Graduate School, College of Natural Sciences, Department of Statistics, February 2020. Advisor: Youngjo Lee (이영조). The h-likelihood approach proposed by Lee and Nelder (1996) is widely used to analyze various types of data.
In particular, repeatedly measured data within clusters can be analyzed with hierarchical generalized linear models (HGLMs). When multiple correlated endpoints are of interest, multivariate double hierarchical generalized linear models (multivariate double HGLMs) can be considered. In this thesis, we apply multivariate double HGLMs to bioequivalence testing, which assesses the similarity of the pharmacokinetic profiles of a test product and its reference product. If the 90% confidence interval for the geometric mean ratio (GMR) of the test to the reference product falls entirely within the bioequivalence margin (0.8, 1.25) for both AUC and Cmax, the test product is declared bioequivalent. Because the two co-primary endpoints AUC and Cmax are strongly correlated, we consider multivariate double HGLMs, which provide smaller standard errors for the estimated treatment effects and therefore narrower 90% confidence intervals for the GMR. To select the best-fitting model among different model classes, we define conditional Akaike information for double hierarchical generalized linear models (double HGLMs) and propose an asymptotically unbiased estimator of it, the conditional Akaike information criterion (cAIC), based on the effective degrees of freedom of double HGLMs.
We compare the accuracy and model selection performance of the proposed cAIC with those of the conventional cAIC, and apply it to real data to select the best model.

Contents:
1 Introduction
2 Multivariate double hierarchical generalized linear models for clinical pharmacology data
  2.1 Backgrounds
  2.2 Motivating examples
    2.2.1 Example 1: Tramadol data
    2.2.2 Example 2: Fimasartan data
  2.3 Multivariate double hierarchical generalized linear models and h-likelihood theory
    2.3.1 Multivariate double hierarchical generalized linear models
    2.3.2 H-likelihood estimation procedure
  2.4 Application for clinical pharmacology data
    2.4.1 Tramadol data
    2.4.2 Fimasartan data
3 Conditional Akaike information for double hierarchical generalized linear models
  3.1 Literature review
    3.1.1 Effective degree of freedom
    3.1.2 Conditional Akaike information criterion
    3.1.3 Other Akaike information criteria
  3.2 Conditional Akaike information for double hierarchical generalized linear model
    3.2.1 h-likelihood inference
    3.2.2 Conditional Akaike information criterion for double hierarchical generalized linear model
  3.3 Numerical studies and applications
    3.3.1 Simulations
    3.3.2 Application: tramadol data
    3.3.3 Application: Fimasartan data
4 Concluding remarks
Appendices
Abstract (in Korean)
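The bioequivalence decision rule stated in this abstract (declare bioequivalence when the 90% confidence interval for the test-to-reference geometric mean ratio lies inside (0.8, 1.25) for both AUC and Cmax) can be sketched for a simple paired design. This minimal sketch assumes per-subject log-scale values and a plain paired t-interval; it does not implement the multivariate double HGLM, which would narrow the interval by exploiting the correlation between AUC and Cmax. The function names and the caller-supplied `t_crit` are hypothetical, and the data in the example are made up.

```python
import math
import statistics

def gmr_ci(log_test, log_ref, t_crit):
    """90% CI for the test/reference geometric mean ratio in a paired design.

    log_test, log_ref: per-subject log-transformed AUC or Cmax values.
    t_crit: 95th percentile of Student's t with n - 1 degrees of freedom,
            supplied by the caller (for example, from a t table).
    """
    diffs = [t - r for t, r in zip(log_test, log_ref)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    # The interval is built on the log scale and back-transformed to a ratio.
    return math.exp(mean_d - t_crit * se), math.exp(mean_d + t_crit * se)

def is_bioequivalent(intervals, margin=(0.8, 1.25)):
    """True only if every endpoint's interval lies entirely inside the margin."""
    low, high = margin
    return all(low <= lo and hi <= high for lo, hi in intervals)

# Made-up log values for four subjects (3 df, so t_0.95 is about 2.353).
auc_ci = gmr_ci([4.10, 4.00, 4.20, 4.05], [4.00, 4.05, 4.10, 4.00], t_crit=2.353)
cmax_ci = gmr_ci([2.30, 2.25, 2.40, 2.35], [2.28, 2.30, 2.33, 2.31], t_crit=2.353)
print(auc_ci, cmax_ci, is_bioequivalent([auc_ci, cmax_ci]))
```

Because both endpoints must pass, any reduction in the standard error of either log-ratio directly improves the chance of a correct bioequivalence decision, which is the motivation the abstract gives for modeling AUC and Cmax jointly.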

    Semiparametric Estimation and Inference with Mis-measured, Correlated or Mixed Observations, and the Application in Ecology, Medicine and Neurology

    The dissertation considers semiparametric regression models inspired by statistical problems in ecological, medical, and neurological studies. In these models, interest usually lies in estimating a finite set of parameters, with the difficulty of handling unknown distribution functions or other unknown structures. Developing novel semiparametric treatments and deriving classes of consistent and efficient estimators not only yields better inference but also provides a general framework for such studies. In capture-recapture models for closed populations, the goal is to estimate population abundance. When multiple error-prone measurements of a covariate are available, we find that no suitable complete and sufficient statistic exists because the number of captures coincides with the number of measurements. Hence existing treatments that rely on such a statistic no longer apply. Our investigation indicates that the familiar generalized method of moments resolves the issue only when capture probabilities are high. A further complication is the loss of the surrogacy assumption, which is commonly made in most measurement error problems. We devise a novel semiparametric treatment to overcome these difficulties. Simulation studies and real data analysis show that our method performs well. In HIV research, we study errors-in-variables problems in which the response is binary and instrumental variables are available. We construct consistent estimators by exploiting the prediction relation between the unobservable variables and the instruments. The asymptotic properties of the new estimator are established and illustrated through simulation studies. We also show that the method readily generalizes to generalized linear models and beyond. The usefulness of the method is illustrated with a real data example.
Lastly, we nonparametrically estimate distribution functions for multiple populations in kin-cohort studies. The data are mixed: each observation is known to belong to a specific population only with certain probabilities. Some observations may also be correlated, and they are subject to censoring. We estimate the distributions optimally by using optimal base estimators and then combining them optimally as well. Optimality here means both consistency and minimum estimation variability. An obvious advantage is that our estimator assumes no parametric form for the distributions and does not require knowing or modeling the potential correlation structure. An analysis of Huntington's disease data illustrates the effectiveness of the method.
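The kin-cohort problem rests on a mixture identity: if observation i belongs to population k with known probability p_ik, then P(Y_i <= t) = sum over k of p_ik F_k(t). A minimal sketch of the simplest consistent estimator built on that identity regresses the indicators 1{Y_i <= t} on the probability matrix at each t by ordinary least squares. It ignores censoring and correlation and is not the optimal weighted estimator the dissertation proposes; the function and argument names are hypothetical.

```python
import numpy as np

def mixture_cdf_ls(y, probs, t_grid):
    """Least-squares estimates of component CDFs F_k(t) from mixed data.

    y: (n,) observations; probs: (n, K) known membership probabilities;
    t_grid: points at which to estimate each F_k. Returns (len(t_grid), K).
    """
    y = np.asarray(y, dtype=float)
    probs = np.asarray(probs, dtype=float)
    # Indicator matrix: column j holds 1{y_i <= t_j}.
    indicators = (y[:, None] <= np.asarray(t_grid, dtype=float)[None, :]).astype(float)
    # Solve probs @ F ~= indicators for every t at once.
    f_hat, *_ = np.linalg.lstsq(probs, indicators, rcond=None)
    # Raw estimates need not lie in [0, 1]; clipping is only a crude repair.
    return np.clip(f_hat.T, 0.0, 1.0)

# Synthetic mixture: population 1 ~ N(0, 1), population 2 ~ N(3, 1).
rng = np.random.default_rng(1)
n = 20_000
p1 = np.where(np.arange(n) % 2 == 0, 0.2, 0.8)
in_pop1 = rng.random(n) < p1
y = np.where(in_pop1, rng.normal(0.0, 1.0, n), rng.normal(3.0, 1.0, n))
print(mixture_cdf_ls(y, np.column_stack([p1, 1.0 - p1]), [0.0, 1.5, 3.0]))
```

The estimator needs the membership probabilities to vary across observations; if every row of `probs` were identical, the components could not be separated.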

    True Spatio-Temporal Detection and Estimation for Functional Magnetic Resonance Imaging

    The development of fast magnetic resonance imaging (MRI) makes it possible for researchers in various fields to investigate functional activity of the human brain with a unique combination of high spatial and temporal resolution. A significant task in functional MRI (FMRI) data analysis is to develop a detection statistic for activation that reveals a subject's localized brain responses to pre-specified stimuli. With rare exceptions in FMRI, these detection statistics have been derived from a measurement model under two main assumptions: spatial independence and space-time separability of the background noise. One of the main goals of this thesis is to remove these widely used assumptions. This thesis makes three main contributions: (1) a detection statistic based on a spatiotemporally correlated noise model without space-time separability; (2) signal and noise modeling to implement the proposed detection statistic; and (3) Rician activation detection, a detection statistic that is robust to the signal-to-noise ratio (SNR). For the first time in FMRI, we develop a properly formulated spatiotemporal detection statistic for activation based on a spatiotemporally correlated noise model without space-time separability. Implementing this statistic requires joint signal and noise modeling in three or four dimensions, which is a nontrivial statistical model estimation problem. We complete the implementation with the parametric cepstrum, which dramatically reduces the computation needed for model fitting. Both are entirely new contributions to FMRI data analysis. As byproducts, a novel test procedure for space-time separability is proposed and its asymptotic power is analyzed. The developed detection statistic is compared with conventional statistics that involve spatial smoothing by a Gaussian kernel, using a model comparison technique and asymptotic relative efficiency.
Most methods in FMRI data analysis are based on magnitude voxel time courses and approximate them by a Gaussian distribution. Since magnitude images in fact follow a Rician distribution, and the Gaussian approximation is valid only under a high-SNR assumption, Gaussian modeling may perform poorly when SNR is low. In this thesis, we develop a detection statistic from a Rician distributed model, allowing robust activation detection regardless of SNR.
Ph.D., Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/57634/2/nohjoonk_1.pd
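The low-SNR argument for a Rician model can be checked numerically. Assuming independent zero-mean Gaussian noise of standard deviation sigma in the real and imaginary channels, a magnitude value |A + noise| is Rician distributed, and its mean sits above the true signal A when SNR = A / sigma is small. This Monte Carlo sketch only illustrates why a Gaussian model centered at A is inaccurate at low SNR; it is not the thesis's detection statistic, and the function names are hypothetical.

```python
import numpy as np

def rician_pdf(m, a, sigma):
    """Rician density of magnitude m given true signal a and noise std sigma."""
    m = np.asarray(m, dtype=float)
    z = m * a / sigma**2
    # np.i0 is the modified Bessel function of the first kind, order 0.
    return (m / sigma**2) * np.exp(-(m**2 + a**2) / (2 * sigma**2)) * np.i0(z)

def magnitude_bias(snr, sigma=1.0, n=200_000, seed=0):
    """Monte Carlo estimate of E[|a + complex Gaussian noise|] - a, with a = snr * sigma."""
    rng = np.random.default_rng(seed)
    a = snr * sigma
    real = a + rng.normal(0.0, sigma, n)
    imag = rng.normal(0.0, sigma, n)
    return np.hypot(real, imag).mean() - a

for snr in (0.5, 2.0, 10.0):
    print(f"SNR={snr}: bias of magnitude mean = {magnitude_bias(snr):.3f}")
```

At high SNR the bias shrinks roughly like sigma^2 / (2A), which is why the Gaussian approximation becomes acceptable there, while at low SNR the upward bias is a substantial fraction of the signal itself.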

    Temporally Varying Weight Regression for Speech Recognition

    Ph.D. (Doctor of Philosophy)