257 research outputs found

    Discriminative training for continuous speech recognition

    Discriminative training techniques for Hidden Markov Models (HMMs) were recently proposed and successfully applied to automatic speech recognition. This paper discusses the Minimum Classification Error and the Maximum Mutual Information objectives. An extended reestimation formula is used for the HMM parameter update for both objective functions. Both discriminative training methods were applied in speaker-independent phoneme recognition experiments, and both improved the phoneme recognition rates.
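    As a rough illustration of the Maximum Mutual Information objective named in this abstract, the sketch below scores one utterance as the log posterior of the correct model given per-model log-likelihoods. It assumes the textbook form of the criterion, not necessarily the exact one used in the paper; the function name `mmi_objective` and the toy numbers are hypothetical.

```python
import math

def mmi_objective(log_likelihoods, log_priors, correct):
    """Log posterior of the correct model (assumed textbook MMI form).

    log_likelihoods[j] stands for log p(O | model j), log_priors[j] for log P(j).
    Training would maximise the sum of this quantity over all utterances.
    """
    joint = [ll + lp for ll, lp in zip(log_likelihoods, log_priors)]
    # Log-sum-exp over all models, shifted by the maximum for numerical stability.
    m = max(joint)
    log_evidence = m + math.log(sum(math.exp(j - m) for j in joint))
    return joint[correct] - log_evidence

# Hypothetical example: three phoneme models with uniform priors.
priors = [math.log(1 / 3)] * 3
good = mmi_objective([-10.0, -14.0, -15.0], priors, correct=0)
bad = mmi_objective([-10.0, -14.0, -15.0], priors, correct=2)
```

    The value is always at most zero and moves toward zero as the correct model dominates its competitors, which is what makes the criterion discriminative rather than purely likelihood-based.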

    A hybrid RBF-HMM system for continuous speech recognition

    A hybrid system for continuous speech recognition, consisting of a neural network with Radial Basis Functions and Hidden Markov Models, is described in this paper together with discriminant training techniques. Initially the neural net is trained to approximate a-posteriori probabilities of single HMM states. These probabilities are used by the Viterbi algorithm to calculate the total scores for the individual hybrid phoneme models. The final training of the hybrid system is based on the "Minimum Classification Error" objective function, which approximates the misclassification rate of the hybrid classifier, and the "Generalized Probabilistic Descent" algorithm. The hybrid system was used in continuous speech recognition experiments with phoneme units and achieves a phoneme recognition rate of about 63.8% in a speaker-independent task.

    A new model-discriminant training algorithm for hybrid NN-HMM systems

    This paper describes a hybrid system for continuous speech recognition consisting of a neural network (NN) and a hidden Markov model (HMM). The system is based on a multilayer perceptron, which approximates the a-posteriori probability of a sequence of states, derived from semi-continuous hidden Markov models. The classification is based on a total score for each hybrid model, obtained from a Viterbi search on the state probabilities. Due to the unintended discrimination between the states in each model, a new training algorithm for the hybrid neural networks is presented. The error function used approximates the misclassification rate of the hybrid system. The discrimination between the correct and the incorrect models is optimized during training by the "Generalized Probabilistic Descent" algorithm, resulting in a minimum classification error. No explicit target values for the neural net output nodes are used, unlike the usual backpropagation algorithm with a quadratic error function. In basic experiments, recognition rates of up to 56% were achieved on a vowel classification task and up to 69% on a consonant cluster classification task.
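    The error function described above, one that approximates the misclassification rate without explicit output targets, can be sketched as follows. This assumes a commonly used Minimum Classification Error formulation (a soft maximum over competing model scores passed through a sigmoid); the abstract does not give the exact form, and the names `misclassification_measure`, `mce_loss`, `eta` and `gamma` are illustrative.

```python
import math

def misclassification_measure(scores, correct, eta=2.0):
    """Positive when competing models outscore the correct one (assumed form)."""
    rivals = [s for i, s in enumerate(scores) if i != correct]
    # Soft maximum of the rival scores: (1/eta) * log(mean(exp(eta * s))),
    # computed relative to the largest rival to avoid overflow.
    m = max(rivals)
    mean_exp = sum(math.exp(eta * (s - m)) for s in rivals) / len(rivals)
    soft_max = m + math.log(mean_exp) / eta
    return soft_max - scores[correct]

def mce_loss(scores, correct, gamma=1.0, eta=2.0):
    """Sigmoid of the measure: a smooth, differentiable stand-in for the 0/1 error."""
    d = misclassification_measure(scores, correct, eta)
    return 1.0 / (1.0 + math.exp(-gamma * d))

# Hypothetical total Viterbi scores (log domain) for three hybrid models.
scores = [-10.0, -12.0, -15.0]
loss_when_right = mce_loss(scores, correct=0)
loss_when_wrong = mce_loss(scores, correct=2)
```

    Because the loss is differentiable in the model scores, a gradient-based scheme such as Generalized Probabilistic Descent can adjust the network weights directly from the model-level decision, which is why no per-node target values are needed.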

    Neural networks for nonlinear discriminant analysis in continuous speech recognition

    In this paper neural networks for Nonlinear Discriminant Analysis in continuous speech recognition are presented. Multilayer Perceptrons are used to estimate a-posteriori probabilities for Hidden Markov Model states, which are the optimal discriminant features for the separation of the HMM states. The a-posteriori probabilities are transformed by a principal component analysis to calculate the new features for semicontinuous HMMs, which are trained by standard Maximum-Likelihood training. The nonlinear discriminant transformation is used in speaker-independent phoneme recognition experiments and compared to the standard Linear Discriminant Analysis technique.

    Evaluation of machine learning algorithms for classification of primary biological aerosol using a new UV-LIF spectrometer

    Atmos. Meas. Tech., 10, 695-708, 2017 http://www.atmos-meas-tech.net/10/695/2017/ doi:10.5194/amt-10-695-2017 © Author(s) 2017. This work is distributed under the Creative Commons Attribution 3.0 License. Characterisation of bioaerosols has important implications within environment and public health sectors. Recent developments in ultraviolet light-induced fluorescence (UV-LIF) detectors such as the Wideband Integrated Bioaerosol Spectrometer (WIBS) and the newly introduced Multiparameter Bioaerosol Spectrometer (MBS) have allowed for the real-time collection of fluorescence, size and morphology measurements for the purpose of discriminating between bacteria, fungal spores and pollen. This new generation of instruments has enabled ever larger data sets to be compiled with the aim of studying more complex environments. In real world data sets, particularly those from an urban environment, the population may be dominated by non-biological fluorescent interferents, bringing into question the accuracy of measurements of quantities such as concentrations. It is therefore imperative that we validate the performance of different algorithms which can be used for the task of classification. For unsupervised learning we tested hierarchical agglomerative clustering with various different linkages. For supervised learning, 11 methods were tested, including decision trees, ensemble methods (random forests, gradient boosting and AdaBoost), two implementations for support vector machines (libsvm and liblinear) and Gaussian methods (Gaussian naïve Bayesian, quadratic and linear discriminant analysis, the k-nearest neighbours algorithm and artificial neural networks). The methods were applied to two different data sets produced using the new MBS, which provides multichannel UV-LIF fluorescence signatures for single airborne biological particles. The first data set contained mixed PSLs and the second contained a variety of laboratory-generated aerosol.
Clustering in general performs slightly worse than the supervised learning methods, correctly classifying, at best, only 67.6 and 91.1 % for the two data sets respectively. For supervised learning the gradient boosting algorithm was found to be the most effective, on average correctly classifying 82.8 and 98.27 % of the testing data, respectively, across the two data sets. A possible alternative to gradient boosting is neural networks. We do however note that this method requires much more user input than the other methods, and we suggest that further research should be conducted using this method, especially using parallelised hardware such as the GPU, which would allow for larger networks to be trained, which could possibly yield better results. We also saw that some methods, such as clustering, failed to utilise the additional shape information provided by the instrument, whilst for others, such as the decision trees, ensemble methods and neural networks, improved performance could be attained with the inclusion of such information. Peer reviewed

    Evaluation of Machine Learning Algorithms for Classification of Primary Biological Aerosol using a new UV-LIF spectrometer

    © Author(s) 2016. This work is distributed under the Creative Commons Attribution 3.0 License. Characterisation of bio-aerosols has important implications within Environment and Public Health sectors. Recent developments in Ultra-Violet Light Induced Fluorescence (UV-LIF) detectors such as the Wideband Integrated bio-aerosol Spectrometer (WIBS) and the newly introduced Multiparameter bio-aerosol Spectrometer (MBS) have allowed for the real time collection of fluorescence, size and morphology measurements for the purpose of discriminating between bacteria, fungal spores and pollen. This new generation of instruments has enabled ever larger data sets to be compiled with the aim of studying more complex environments. In real world data sets, particularly those from an urban environment, the population may be dominated by non-biological fluorescent interferents, bringing into question the accuracy of measurements of quantities such as concentrations. It is therefore imperative that we validate the performance of different algorithms which can be used for the task of classification. For unsupervised learning we test Hierarchical Agglomerative Clustering with various different linkages. For supervised learning, ten methods were tested, including decision trees; ensemble methods: Random Forests, Gradient Boosting and AdaBoost; two implementations for support vector machines: libsvm and liblinear; Gaussian methods: Gaussian naïve Bayesian, quadratic and linear discriminant analysis; and finally the k-nearest neighbours algorithm. The methods were applied to two different data sets measured using a new Multiparameter bio-aerosol Spectrometer which provides multichannel UV-LIF fluorescence signatures for single airborne biological particles. Clustering in general performs slightly worse than the supervised learning methods, correctly classifying, at best, only 72.7 and 91.1 percent for the two data sets respectively.
For supervised learning the gradient boosting algorithm was found to be the most effective, on average correctly classifying 88.1 and 97.8 percent of the testing data respectively across the two data sets. Peer reviewed

    Machine learning for improved data analysis of biological aerosol using the WIBS

    Primary biological aerosol including bacteria, fungal spores and pollen have important implications for public health and the environment. Such particles may have different concentrations of chemical fluorophores and will provide different responses in the presence of ultraviolet light, which potentially could be used to discriminate between different types of biological aerosol. Development of ultraviolet light induced fluorescence (UV-LIF) instruments such as the Wideband Integrated Bioaerosol Sensor (WIBS) has made it possible to collect size, morphology and fluorescence measurements in real-time. However, without studying responses from the instrument in the laboratory, it is unclear to what extent we can discriminate between different types of particles. Collection of laboratory data is vital to validate any approach used to analyse the data and to ensure that the data available is utilised as effectively as possible. In this manuscript we test a variety of methodologies on traditional reference particles and a range of laboratory generated aerosols. Hierarchical Agglomerative Clustering (HAC) has been previously applied to UV-LIF data in a number of studies and is tested alongside other algorithms that could be used to solve the classification problem: Density-Based Spatial Clustering of Applications with Noise (DBSCAN), k-means and gradient boosting. Whilst HAC was able to effectively discriminate between the reference particles, yielding a classification error of only 1.8 %, similar results were not obtained when testing on laboratory generated aerosol, where the classification error was found to be between 11.5 % and 24.2 %. Furthermore, there is a worryingly large uncertainty in this approach in terms of the data preparation and the cluster index used, and we were unable to attain consistent results across the different sets of laboratory generated aerosol tested.
The best results were obtained using gradient boosting, where the misclassification rate was between 4.38 % and 5.42 %. The largest contribution to this error was the pollen samples, where 28.5 % of the samples were misclassified as fungal spores. The technique was also robust to changes in data preparation provided a fluorescent threshold was applied to the data. Where laboratory training data is unavailable, DBSCAN was found to be a potential alternative to HAC. In the case of one of the data sets, where 22.9 % of the data was left unclassified, we were able to produce three distinct clusters, obtaining a classification error of only 1.42 % on the classified data. These results could not be replicated, however, for the other data set, where 26.8 % of the data was not classified and a classification error of 13.8 % was obtained. This method, like HAC, also appeared to be heavily dependent on data preparation, requiring different selection of parameters dependent on the preparation used. Further analysis will also be required to confirm our selection of parameters when using this method on ambient data. There is a clear need for the collection of additional laboratory generated aerosol to improve interpretation of current databases and to aid in the analysis of data collected from an ambient environment. New instruments with a greater resolution are likely to improve on current discrimination between pollen, bacteria and fungal spores and even between their different types; however, the need for extensive laboratory training data sets will grow as a result.
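The reason DBSCAN can leave part of a data set unclassified, as reported above, is that it labels low-density points as noise instead of forcing them into a cluster. The following minimal sketch shows that behaviour on made-up 2-D points; `eps` and `min_pts` are the usual DBSCAN parameters, but the values and the data are assumptions and have nothing to do with the WIBS measurements.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    UNSEEN, NOISE = None, -1
    labels = [UNSEEN] * len(points)

    def neighbours(i):
        # Brute-force neighbourhood query; the point itself is included.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # core point: keep growing the cluster
                queue.extend(reachable)
    return labels

# Toy data: two tight groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
unclassified_fraction = labels.count(-1) / len(labels)
```

Both the number of clusters and the unclassified fraction depend on `eps` and `min_pts`, which is consistent with the abstract's remark that the parameter selection had to change with the data preparation.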

    Chemical interaction at the buried silicon/zinc oxide thin-film solar cell interface as revealed by hard x-ray photoelectron spectroscopy

    Hard X-ray photoelectron spectroscopy (HAXPES) is used to identify chemical interactions (such as elemental redistribution) at the buried silicon/aluminum-doped zinc oxide thin-film solar cell interface. Expanding our study of the interfacial oxidation of silicon upon its solid-phase crystallization (SPC), in which we found zinc oxide to be the source of oxygen, in this investigation we address chemical interaction processes involving zinc and aluminum. In particular, we observe an increase of zinc- and aluminum-related HAXPES signals after SPC of the deposited amorphous silicon thin films. Quantitative analysis suggests an elemental redistribution in the proximity of the silicon/aluminum-doped zinc oxide interface – more pronounced for aluminum than for zinc – as the explanation. Based on these insights the complex chemical interface structure is discussed.

    Measurement of inclusive D*+- and associated dijet cross sections in photoproduction at HERA

    Inclusive photoproduction of D*+- mesons has been measured for photon-proton centre-of-mass energies in the range 130 < W < 280 GeV and a photon virtuality Q^2 < 1 GeV^2. The data sample used corresponds to an integrated luminosity of 37 pb^-1. Total and differential cross sections as functions of the D* transverse momentum and pseudorapidity are presented in restricted kinematical regions and the data are compared with next-to-leading order (NLO) perturbative QCD calculations using the "massive charm" and "massless charm" schemes. The measured cross sections are generally above the NLO calculations, in particular in the forward (proton) direction. The large data sample also allows the study of dijet production associated with charm. A significant resolved as well as a direct photon component contribute to the cross section. Leading order QCD Monte Carlo calculations indicate that the resolved contribution arises from a significant charm component in the photon. A massive charm NLO parton level calculation yields lower cross sections compared to the measured results in a kinematic region where the resolved photon contribution is significant. Comment: 32 pages including 6 figures

    Measurement of Jet Shapes in Photoproduction at HERA

    The shape of jets produced in quasi-real photon-proton collisions at centre-of-mass energies in the range $134-277$ GeV has been measured using the hadronic energy flow. The measurement was done with the ZEUS detector at HERA. Jets are identified using a cone algorithm in the $\eta - \phi$ plane with a cone radius of one unit. Measured jet shapes both in inclusive jet and dijet production with transverse energies $E^{jet}_T>14$ GeV are presented. The jet shape broadens as the jet pseudorapidity ($\eta^{jet}$) increases and narrows as $E^{jet}_T$ increases. In dijet photoproduction, the jet shapes have been measured separately for samples dominated by resolved and by direct processes. Leading-logarithm parton-shower Monte Carlo calculations of resolved and direct processes describe well the measured jet shapes except for the inclusive production of jets with high $\eta^{jet}$ and low $E^{jet}_T$. The observed broadening of the jet shape as $\eta^{jet}$ increases is consistent with the predicted increase in the fraction of final state gluon jets. Comment: 29 pages including 9 figures
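    One common way to quantify a jet shape from the energy flow is the integrated jet shape: the fraction of the jet's transverse energy that falls within a distance r of the jet axis in the eta-phi plane. The sketch below assumes that definition, which the abstract itself does not spell out; the function names and the energy deposits are invented for illustration.

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Distance in the eta-phi plane, with the azimuthal difference wrapped to [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)

def integrated_jet_shape(jet_axis, deposits, r, cone_radius=1.0):
    """Fraction of the jet's transverse energy within distance r of the axis.

    jet_axis is (eta, phi); deposits is a list of (eta, phi, E_T) tuples.
    Deposits outside the jet cone are ignored, and r is capped at the cone radius.
    """
    eta_j, phi_j = jet_axis
    r = min(r, cone_radius)
    distances = [(delta_r(eta, phi, eta_j, phi_j), et) for eta, phi, et in deposits]
    total = sum(et for d, et in distances if d <= cone_radius)
    inner = sum(et for d, et in distances if d <= r)
    return inner / total

# Hypothetical deposits (eta, phi, E_T in GeV); the last lies outside the unit cone.
deposits = [(0.0, 0.0, 10.0), (0.3, 0.0, 6.0), (0.0, 0.8, 4.0), (2.0, 0.0, 5.0)]
psi_half = integrated_jet_shape((0.0, 0.0), deposits, r=0.5)
psi_full = integrated_jet_shape((0.0, 0.0), deposits, r=1.0)
```

    With this measure a broader jet has a smaller value at fixed r, so the broadening with increasing $\eta^{jet}$ described above would show up as a lower fraction at small r; in an analysis the fraction would be averaged over all selected jets.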