257 research outputs found
Discriminative training for continuous speech recognition
Discriminative training techniques for Hidden Markov Models (HMMs) were recently proposed and successfully applied to automatic speech recognition. This paper discusses the Minimum Classification Error and Maximum Mutual Information objectives. An extended reestimation formula is used for the HMM parameter update for both objective functions. Both discriminative training methods were applied in speaker-independent phoneme recognition experiments, and both improved the phoneme recognition rates.
A hybrid RBF-HMM system for continuous speech recognition
A hybrid system for continuous speech recognition, consisting of a neural network with Radial Basis Functions and Hidden Markov Models, is described in this paper together with discriminant training techniques. Initially the neural net is trained to approximate a-posteriori probabilities of single HMM states. These probabilities are used by the Viterbi algorithm to calculate the total scores for the individual hybrid phoneme models. The final training of the hybrid system is based on the "Minimum Classification Error" objective function, which approximates the misclassification rate of the hybrid classifier, and the "Generalized Probabilistic Descent" algorithm. The hybrid system was used in continuous speech recognition experiments with phoneme units and achieves a phoneme recognition rate of about 63.8% in a speaker-independent task.
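To make the idea of an objective that "approximates the misclassification rate" concrete, the following is a minimal sketch of a smoothed 0/1 loss in the form commonly used for Minimum Classification Error training. The abstract does not give the formula, so the exact misclassification measure, the function name `mce_loss` and the smoothing parameters `eta` and `gamma` are assumptions made for illustration only.

```python
import math


def mce_loss(scores, correct, eta=2.0, gamma=1.0):
    """Smoothed 0/1 loss for a single training token.

    scores:  dict mapping model name -> discriminant score
             (for example a Viterbi log score; higher is better).
    correct: name of the model that should have won.
    eta:     sharpness of the soft maximum over the rival models (assumed).
    gamma:   slope of the sigmoid that smooths the 0/1 decision (assumed).
    """
    g_correct = scores[correct]
    rivals = [g for name, g in scores.items() if name != correct]

    # Soft maximum of the rival scores, computed stably by factoring out
    # the largest rival; as eta grows this approaches max(rivals).
    top = max(rivals)
    mean_exp = sum(math.exp(eta * (g - top)) for g in rivals) / len(rivals)
    soft_max = top + math.log(mean_exp) / eta

    # Misclassification measure: positive when the rivals beat the
    # correct model, negative when the token is classified correctly.
    d = soft_max - g_correct

    # The sigmoid turns the hard error count into a differentiable loss,
    # which is what makes gradient-based descent applicable.
    return 1.0 / (1.0 + math.exp(-gamma * d))


scores = {"a": -10.0, "b": -12.0, "c": -15.0}
print(mce_loss(scores, "a"))  # below 0.5: token counted as mostly correct
print(mce_loss(scores, "c"))  # above 0.5: token counted as mostly wrong
```

Averaging this loss over the training tokens gives a differentiable stand-in for the error rate; a gradient-based procedure such as Generalized Probabilistic Descent can then adjust the model parameters that produce the scores.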
A new model-discriminant training algorithm for hybrid NN-HMM systems
This paper describes a hybrid system for continuous speech recognition consisting of a neural network (NN) and a hidden Markov model (HMM). The system is based on a multilayer perceptron, which approximates the a-posteriori probability of a sequence of states, derived from semi-continuous hidden Markov models. The classification is based on a total score for each hybrid model, obtained from a Viterbi search on the state probabilities. Due to the unintended discrimination between the states in each model, a new training algorithm for the hybrid neural networks is presented. The error function used approximates the misclassification rate of the hybrid system. The discrimination between the correct and the incorrect models is optimized during training by the "Generalized Probabilistic Descent" algorithm, resulting in a minimum classification error. No explicit target values for the neural net output nodes are used, unlike in the usual backpropagation algorithm with a quadratic error function. In basic experiments, recognition rates of up to 56% were achieved on a vowel classification task and up to 69% on a consonant cluster classification task.
Neural networks for nonlinear discriminant analysis in continuous speech recognition
In this paper neural networks for Nonlinear Discriminant Analysis in continuous speech recognition are presented. Multilayer Perceptrons are used to estimate a-posteriori probabilities for Hidden Markov Model states, which are the optimal discriminant features for the separation of the HMM states. The a-posteriori probabilities are transformed by a principal component analysis to calculate the new features for semicontinuous HMMs, which are trained by standard Maximum-Likelihood training. The nonlinear discriminant transformation is used in speaker-independent phoneme recognition experiments and compared to the standard Linear Discriminant Analysis technique.
Evaluation of machine learning algorithms for classification of primary biological aerosol using a new UV-LIF spectrometer
Atmos. Meas. Tech., 10, 695-708, 2017 http://www.atmos-meas-tech.net/10/695/2017/ doi:10.5194/amt-10-695-2017 © Author(s) 2017. This work is distributed under the Creative Commons Attribution 3.0 License. Characterisation of bioaerosols has important implications within environment and public health sectors. Recent developments in ultraviolet light-induced fluorescence (UV-LIF) detectors such as the Wideband Integrated Bioaerosol Spectrometer (WIBS) and the newly introduced Multiparameter Bioaerosol Spectrometer (MBS) have allowed for the real-time collection of fluorescence, size and morphology measurements for the purpose of discriminating between bacteria, fungal spores and pollen. This new generation of instruments has enabled ever larger data sets to be compiled with the aim of studying more complex environments. In real world data sets, particularly those from an urban environment, the population may be dominated by non-biological fluorescent interferents, bringing into question the accuracy of measurements of quantities such as concentrations. It is therefore imperative that we validate the performance of different algorithms which can be used for the task of classification. For unsupervised learning we tested hierarchical agglomerative clustering with various different linkages. For supervised learning, 11 methods were tested, including decision trees, ensemble methods (random forests, gradient boosting and AdaBoost), two implementations for support vector machines (libsvm and liblinear), Gaussian methods (Gaussian naïve Bayesian, quadratic and linear discriminant analysis), the k-nearest neighbours algorithm and artificial neural networks. The methods were applied to two different data sets produced using the new MBS, which provides multichannel UV-LIF fluorescence signatures for single airborne biological particles. The first data set contained mixed PSLs and the second contained a variety of laboratory-generated aerosol.
Clustering in general performs slightly worse than the supervised learning methods, correctly classifying, at best, only 67.6 and 91.1 % for the two data sets respectively. For supervised learning the gradient boosting algorithm was found to be the most effective, on average correctly classifying 82.8 and 98.27 % of the testing data, respectively, across the two data sets. A possible alternative to gradient boosting is neural networks. We do however note that this method requires much more user input than the other methods, and we suggest that further research should be conducted using this method, especially using parallelised hardware such as the GPU, which would allow for larger networks to be trained and could possibly yield better results. We also saw that some methods, such as clustering, failed to utilise the additional shape information provided by the instrument, whilst for others, such as the decision trees, ensemble methods and neural networks, improved performance could be attained with the inclusion of such information. Peer reviewed.
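The phrase "hierarchical agglomerative clustering with various different linkages" can be illustrated with a short pure-Python sketch. It is not the authors' implementation: the function name `agglomerative`, the Euclidean distance and the three linkage rules shown (single, complete, average) are assumptions chosen for illustration, and the abstract does not say which linkages were actually tested.

```python
import math


def agglomerative(points, n_clusters, linkage="average"):
    """Merge the two closest clusters until n_clusters remain.

    points:     sequence of equal-length numeric tuples.
    n_clusters: number of clusters to stop at (must be >= 1).
    linkage:    "single", "complete" or "average"; decides how the
                distance between two clusters is derived from the
                pairwise distances of their members.
    Returns a list of clusters, each a sorted list of point indices.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")

    # Start with every point in its own cluster.
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        dists = [math.dist(points[i], points[j]) for i in a for j in b]
        if linkage == "single":      # closest pair of members
            return min(dists)
        if linkage == "complete":    # farthest pair of members
            return max(dists)
        if linkage == "average":     # mean over all member pairs
            return sum(dists) / len(dists)
        raise ValueError(f"unknown linkage: {linkage}")

    while len(clusters) > n_clusters:
        # Exhaustive search for the closest pair of clusters; fine for a
        # sketch, far too slow for real instrument data.
        _, i, j = min(
            (cluster_distance(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]

    return clusters


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(agglomerative(points, 2, linkage="single"))  # [[0, 1], [2, 3]]
```

On well-separated data all three linkages agree; they differ when clusters are elongated or overlap, which is one reason results can depend on the linkage chosen.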
Evaluation of Machine Learning Algorithms for Classification of Primary Biological Aerosol using a new UV-LIF spectrometer
© Author(s) 2016. This work is distributed under the Creative Commons Attribution 3.0 License. Characterisation of bio-aerosols has important implications within Environment and Public Health sectors. Recent developments in Ultra-Violet Light Induced Fluorescence (UV-LIF) detectors such as the Wideband Integrated bio-aerosol Spectrometer (WIBS) and the newly introduced Multiparameter bio-aerosol Spectrometer (MBS) have allowed for the real-time collection of fluorescence, size and morphology measurements for the purpose of discriminating between bacteria, fungal spores and pollen. This new generation of instruments has enabled ever larger data sets to be compiled with the aim of studying more complex environments. In real world data sets, particularly those from an urban environment, the population may be dominated by non-biological fluorescent interferents, bringing into question the accuracy of measurements of quantities such as concentrations. It is therefore imperative that we validate the performance of different algorithms which can be used for the task of classification. For unsupervised learning we test Hierarchical Agglomerative Clustering with various different linkages. For supervised learning, ten methods were tested, including decision trees; ensemble methods (Random Forests, Gradient Boosting and AdaBoost); two implementations for support vector machines (libsvm and liblinear); Gaussian methods (Gaussian naïve Bayesian, quadratic and linear discriminant analysis); and finally the k-nearest neighbours algorithm. The methods were applied to two different data sets measured using a new Multiparameter bio-aerosol Spectrometer which provides multichannel UV-LIF fluorescence signatures for single airborne biological particles. Clustering in general performs slightly worse than the supervised learning methods, correctly classifying, at best, only 72.7 and 91.1 percent for the two data sets respectively.
For supervised learning the gradient boosting algorithm was found to be the most effective, on average correctly classifying 88.1 and 97.8 percent of the testing data respectively across the two data sets. Peer reviewed.
Machine learning for improved data analysis of biological aerosol using the WIBS
Primary biological aerosols, including bacteria, fungal spores and pollen, have important implications for public health and the environment. Such particles may have different concentrations of chemical fluorophores and will provide different responses in the presence of ultraviolet light, which could potentially be used to discriminate between different types of biological aerosol. Development of ultraviolet light induced fluorescence (UV-LIF) instruments such as the Wideband Integrated Bioaerosol Sensor (WIBS) has made it possible to collect size, morphology and fluorescence measurements in real time. However, without studying responses from the instrument in the laboratory, it is unclear to what extent we can discriminate between different types of particles. Collection of laboratory data is vital to validate any approach used to analyse the data and to ensure that the data available is utilised as effectively as possible. In this manuscript we test a variety of methodologies on traditional reference particles and a range of laboratory generated aerosols. Hierarchical Agglomerative Clustering (HAC) has been previously applied to UV-LIF data in a number of studies and is tested alongside other algorithms that could be used to solve the classification problem: Density-Based Spatial Clustering of Applications with Noise (DBSCAN), k-means and gradient boosting. Whilst HAC was able to effectively discriminate between the reference particles, yielding a classification error of only 1.8 %, similar results were not obtained when testing on laboratory generated aerosol, where the classification error was found to be between 11.5 % and 24.2 %. Furthermore, there is a worryingly large uncertainty in this approach in terms of the data preparation and the cluster index used, and we were unable to attain consistent results across the different sets of laboratory generated aerosol tested.
The best results were obtained using gradient boosting, where the misclassification rate was between 4.38 % and 5.42 %. The largest contribution to this error was the pollen samples, where 28.5 % of the samples were misclassified as fungal spores. The technique was also robust to changes in data preparation provided a fluorescent threshold was applied to the data. Where laboratory training data is unavailable, DBSCAN was found to be a potential alternative to HAC. In the case of one of the data sets, where 22.9 % of the data was left unclassified, we were able to produce three distinct clusters, obtaining a classification error of only 1.42 % on the classified data. These results could not be replicated, however, for the other data set, where 26.8 % of the data was not classified and a classification error of 13.8 % was obtained. This method, like HAC, also appeared to be heavily dependent on data preparation, requiring different selection of parameters dependent on the preparation used. Further analysis will also be required to confirm our selection of parameters when using this method on ambient data. There is a clear need for the collection of additional laboratory generated aerosol to improve interpretation of current databases and to aid in the analysis of data collected from an ambient environment. New instruments with a greater resolution are likely to improve on current discrimination between pollen, bacteria and fungal spores and even between their different types; however, the need for extensive laboratory training data sets will grow as a result.
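The DBSCAN figures above pair two numbers: the share of data left unclassified and the classification error on the remainder. The sketch below shows one way such a pair could be computed from cluster assignments and known laboratory labels. It is an illustration under stated assumptions, not the authors' procedure: the function name `evaluate_clustering`, the use of `-1` for unclassified points (the convention scikit-learn's DBSCAN uses for noise) and the majority-vote mapping from clusters to particle types are all hypothetical choices.

```python
from collections import Counter

NOISE = -1  # assumed marker for points the clustering left unclassified


def evaluate_clustering(cluster_labels, true_labels):
    """Return (unclassified_fraction, error_on_classified).

    cluster_labels: cluster id per particle, NOISE for unclassified ones.
    true_labels:    known particle type per particle (laboratory data).
    Each cluster is assigned the particle type that is most common among
    its members (an assumed mapping); members of any other type count
    as classification errors.
    """
    classified = [
        (c, t) for c, t in zip(cluster_labels, true_labels) if c != NOISE
    ]
    unclassified_fraction = 1.0 - len(classified) / len(cluster_labels)

    # Count the true particle types inside each cluster.
    counts = {}
    for c, t in classified:
        counts.setdefault(c, Counter())[t] += 1
    majority = {c: cnt.most_common(1)[0][0] for c, cnt in counts.items()}

    errors = sum(1 for c, t in classified if majority[c] != t)
    # With nothing classified the error rate is undefined.
    error_rate = errors / len(classified) if classified else float("nan")
    return unclassified_fraction, error_rate


clusters = [0, 0, 0, 1, 1, -1, -1, 1]
truth = ["pollen", "pollen", "fungal", "fungal", "fungal",
         "pollen", "fungal", "fungal"]
unclassified, error = evaluate_clustering(clusters, truth)
print(f"{unclassified:.1%} unclassified, {error:.1%} error on the rest")
```

Reporting both numbers together matters: a low error on the classified data can be bought by leaving a larger share of the data unclassified.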
Chemical interaction at the buried silicon/zinc oxide thin-film solar cell interface as revealed by hard x-ray photoelectron spectroscopy
Hard X-ray photoelectron spectroscopy (HAXPES) is used to identify chemical
interactions (such as elemental redistribution) at the buried
silicon/aluminum-doped zinc oxide thin-film solar cell interface. Expanding our study
of the interfacial oxidation of silicon upon its solid-phase crystallization
(SPC), in which we found zinc oxide to be the source of oxygen, in this
investigation we address chemical interaction processes involving zinc and
aluminum. In particular, we observe an increase of zinc- and aluminum-related
HAXPES signals after SPC of the deposited amorphous silicon thin films.
Quantitative analysis suggests an elemental redistribution in the proximity of
the silicon/aluminum-doped zinc oxide interface – more pronounced for aluminum
than for zinc – as explanation. Based on these insights the complex chemical
interface structure is discussed.
Measurement of inclusive D*+- and associated dijet cross sections in photoproduction at HERA
Inclusive photoproduction of D*+- mesons has been measured for photon-proton
centre-of-mass energies in the range 130 < W < 280 GeV and a photon virtuality
Q^2 < 1 GeV^2. The data sample used corresponds to an integrated luminosity of
37 pb^-1. Total and differential cross sections as functions of the D*
transverse momentum and pseudorapidity are presented in restricted kinematical
regions and the data are compared with next-to-leading order (NLO) perturbative
QCD calculations using the "massive charm" and "massless charm" schemes. The
measured cross sections are generally above the NLO calculations, in particular
in the forward (proton) direction. The large data sample also allows the study
of dijet production associated with charm. A significant resolved as well as a
direct photon component contribute to the cross section. Leading order QCD
Monte Carlo calculations indicate that the resolved contribution arises from a
significant charm component in the photon. A massive charm NLO parton level
calculation yields lower cross sections compared to the measured results in a
kinematic region where the resolved photon contribution is significant. Comment: 32 pages including 6 figures
Measurement of Jet Shapes in Photoproduction at HERA
The shape of jets produced in quasi-real photon-proton collisions at
centre-of-mass energies in the range GeV has been measured using the
hadronic energy flow. The measurement was done with the ZEUS detector at HERA.
Jets are identified using a cone algorithm in the pseudorapidity-azimuth (eta-phi) plane with a
cone radius of one unit. Measured jet shapes both in inclusive jet and dijet
production with transverse energies GeV are presented. The jet
shape broadens as the jet pseudorapidity (eta^jet) increases and narrows
as the jet transverse energy (E_T^jet) increases. In dijet photoproduction, the jet shapes have been
measured separately for samples dominated by resolved and by direct processes.
Leading-logarithm parton-shower Monte Carlo calculations of resolved and direct
processes describe well the measured jet shapes except for the inclusive
production of jets with high pseudorapidity and low transverse energy. The observed
broadening of the jet shape as the jet pseudorapidity increases is consistent with the
predicted increase in the fraction of final state gluon jets. Comment: 29 pages including 9 figures