19 research outputs found

    Snapshot hyperspectral imaging using wide dilation networks

    Hyperspectral (HS) cameras record the spectrum at multiple wavelengths for each pixel in an image and are used, e.g., for quality control and agricultural remote sensing. We introduce a fast, cost-efficient and mobile method of taking HS images using a regular digital camera equipped with a passive diffraction grating filter, using machine learning to construct the HS image. The grating distorts the image by effectively mapping the spectral information into spatial dislocations, which we convert into an HS image with a convolutional neural network that uses novel wide dilation convolutions to accurately model the optical properties of diffraction. We demonstrate high-quality HS reconstruction using a model trained on only 271 pairs of diffraction grating and ground truth HS images. Peer reviewed.
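The abstract does not spell out the wide dilation layer, so the following is only a minimal Python sketch of a generic dilated convolution, assuming the intuition that the grating shifts spectral content by a roughly fixed pixel offset. With dilation d, a kernel of size k reads inputs d samples apart and covers (k - 1)·d + 1 pixels, so a two-tap kernel with d = 5 can combine a pixel with a copy displaced by five pixels. The function name `dilated_conv1d` and the toy signal are hypothetical, not the paper's layer.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Valid-mode 1D dilated convolution (cross-correlation, as in CNN layers).

    Output sample i combines signal[i], signal[i + dilation], ...,
    signal[i + (len(kernel) - 1) * dilation].
    """
    span = (len(kernel) - 1) * dilation  # receptive field is span + 1 samples
    return [
        sum(w * signal[i + k * dilation] for k, w in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


# Toy "diffraction" row (hypothetical): a bright pixel plus a weaker copy 5 pixels away.
row = [1.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0]
out = dilated_conv1d(row, kernel=[1.0, 1.0], dilation=5)
print(out)  # [1.5, 0.0, 0.0, 0.0, 0.0]
```

A plain 3×3 kernel would need many stacked layers to reach a pixel five positions away; dilation gets there in one layer without adding weights.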

    Computation of the one-dimensional unwrapped phase

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (p. 101-102). "Cepstrum bibliography" (p. 67-100). In this thesis, the computation of the unwrapped phase of the discrete-time Fourier transform (DTFT) of a one-dimensional finite-length signal is explored. The phase of the DTFT is not unique and may contain discontinuities of integer multiples of 2π. The unwrapped phase is the instance of the phase function chosen to ensure continuity. This thesis presents existing algorithms for computing the unwrapped phase, discussing their strengths and weaknesses. Two composite algorithms are then proposed that use the existing ones, combining their strengths while avoiding their weaknesses. The core of the proposed methods is based on recent advances in polynomial factoring. The proposed methods are implemented and compared to the existing ones. By Zahi Nadim Karam. S.M.
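As a reference point for the problem the thesis addresses, here is a minimal Python sketch of the simplest unwrapping approach: sample the DTFT phase on a frequency grid and add ±2π whenever consecutive wrapped samples jump by more than π (the same rule NumPy's `numpy.unwrap` applies). This is not one of the thesis's polynomial-factoring methods. It assumes the grid is fine enough that the true phase changes by less than π between samples, an assumption that zeros close to the unit circle can break. The helper names are hypothetical.

```python
import cmath
import math


def dtft(x, omegas):
    """DTFT X(e^{jw}) = sum_n x[n] e^{-jwn} of a finite sequence at each w."""
    return [sum(xn * cmath.exp(-1j * w * n) for n, xn in enumerate(x)) for w in omegas]


def unwrap(phases):
    """Remove 2*pi jumps from a sequence of principal-value phases in (-pi, pi]."""
    out = [phases[0]]
    offset = 0.0
    for prev, cur in zip(phases, phases[1:]):
        jump = cur - prev
        if jump > math.pi:  # wrapped upward: true phase kept decreasing
            offset -= 2 * math.pi
        elif jump < -math.pi:  # wrapped downward: true phase kept increasing
            offset += 2 * math.pi
        out.append(cur + offset)
    return out


# A pure 3-sample delay has DTFT e^{-j3w}, so its unwrapped phase is -3w.
x = [0.0, 0.0, 0.0, 1.0]
omegas = [math.pi * k / 64 for k in range(65)]  # grid over [0, pi]
wrapped = [cmath.phase(X) for X in dtft(x, omegas)]
unwrapped = unwrap(wrapped)
print(round(unwrapped[-1] / math.pi, 6))  # -3.0
```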

    Maximum Likelihood Pitch Estimation Using Sinusoidal Modeling

    The aim of the work presented in this thesis is to automatically extract the fundamental frequency of a periodic signal from noisy observations, a task commonly referred to as pitch estimation. An algorithm for optimal pitch estimation using a maximum likelihood formulation is presented. The speech waveform is modeled using sinusoidal basis functions that are harmonically tied together to explicitly capture the periodic structure of voiced speech. Pitch estimation is cast as a model selection problem, and the Akaike Information Criterion is used to estimate the pitch. The algorithm is compared with several existing pitch detection algorithms (PDAs) on a reference pitch database. The results indicate that the algorithm outperforms most of the PDAs. The application of parametric modeling to single-channel speech segregation and the use of mel-frequency cepstral coefficients for sequential grouping are analyzed on the Speech Separation Challenge database.
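The abstract does not give the estimator's equations, so the following is only a minimal Python sketch of the general idea under stated assumptions. For each candidate fundamental f0 and harmonic count L, it fits the harmonically tied model y[n] ≈ Σ_{l=1..L} (a_l cos(2πl·f0·n/fs) + b_l sin(2πl·f0·n/fs)) by least squares, which is maximum likelihood for the amplitudes under white Gaussian noise. It then scores each fit with AIC = N·ln(RSS/N) + 2k, where k = 2L + 1 counts the amplitudes plus f0. The penalty term is what discourages subharmonic candidates such as f0/2, which can fit the same harmonics only by spending more parameters. The candidate grid, the synthetic signal and all function names are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def harmonic_rss(y, fs, f0, n_harm):
    """Residual sum of squares after a least-squares fit of n_harm harmonics of f0."""
    cols = []
    for h in range(1, n_harm + 1):
        w = 2 * math.pi * h * f0 / fs
        cols.append([math.cos(w * i) for i in range(len(y))])
        cols.append([math.sin(w * i) for i in range(len(y))])
    gram = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    coef = solve(gram, rhs)  # normal equations
    fit = [sum(c * col[i] for c, col in zip(coef, cols)) for i in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fit))


def estimate_pitch(y, fs, candidates, max_harm):
    """Return the (f0, n_harm) pair with the lowest AIC."""
    n = len(y)
    best = None
    for f0 in candidates:
        for n_harm in range(1, max_harm + 1):
            if n_harm * f0 >= fs / 2:  # keep every harmonic below Nyquist
                break
            rss = harmonic_rss(y, fs, f0, n_harm)
            aic = n * math.log(rss / n) + 2 * (2 * n_harm + 1)
            if best is None or aic < best[0]:
                best = (aic, f0, n_harm)
    return best[1], best[2]


# Synthetic voiced frame: 200 Hz with three harmonics plus white noise.
fs, n = 8000, 400
rng = random.Random(0)
y = [
    sum(a * math.cos(2 * math.pi * h * 200 * i / fs) for h, a in [(1, 1.0), (2, 0.5), (3, 0.3)])
    + rng.gauss(0, 0.1)
    for i in range(n)
]
f0_hat, n_harm_hat = estimate_pitch(y, fs, candidates=[150, 200, 250, 300, 400], max_harm=6)
print(f0_hat, n_harm_hat)
```

The candidate 400 Hz illustrates the octave-error case: it cannot represent the 200 Hz and 600 Hz components, so its residual energy, and hence its AIC, is far higher.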

    Precise Estimation of Vocal Tract and Voice Source Characteristics

    This thesis addresses the problem of quality degradation in speech produced by parameter-based speech synthesis, within the framework of an articulatory-acoustic forward mapping. I first investigate current problems in speech parameterisation and show that conventional parameterisation extracts the vocal tract response inaccurately because of interference from the harmonic structure of voiced speech. To overcome this problem, I introduce a method for estimating filter responses more precisely from periodic signals. The method achieves this in the frequency domain by approximating all the harmonics observed in several frames with a least squares criterion. It is shown that the proposed method estimates the response more accurately than widely used frame-by-frame parameterisation, both in simulations using synthetic speech and in an articulatory-acoustic mapping using actual speech. I also deal with the source-filter separation problem and independent control of the voice source characteristic during speech synthesis. I propose a statistical approach to separating the vocal-tract filter response from the voice source characteristic using a large articulatory database. The approach achieves this separation for voiced speech using an iterative approximation procedure, under the assumption that the speech production process is a linear system composed of a voice source and a vocal-tract filter, each controlled independently by a different set of factors. Experimental results show that controlling the source characteristic greatly improves the accuracy of the articulatory-acoustic mapping, and that the spectral variation of the source characteristic is evidently influenced by the fundamental frequency or the power of speech.
The thesis provides a more accurate acoustical approximation of the vocal tract response, which will be beneficial in a wide range of speech technologies, and lays the groundwork in speech science for a new type of corpus-based statistical solution to the source-filter separation problem.

    Phase Space Analysis and Classification of Sonar Echoes in Shallow-Water Channels

    A primary objective of active sonar systems is to detect, locate, and classify objects, such as mines, ships, and biologics, based on their sonar backscatter. A shallow-water ocean channel is a challenging environment in which to classify sonar echoes because interactions of the sonar signal with the ocean surface and bottom induce frequency-dependent changes (especially dispersion and damping) in the signal as it propagates, the effects of which typically grow with range. Accordingly, the observed signal depends not only on the initial target backscatter, but also on the propagation channel and how far the signal has propagated. These propagation effects can increase the variability of observed target echoes and degrade classification performance. Furthermore, uncertainty about the exact propagation channel and random variations within a channel cause classification features extracted from the received sonar echo to behave as random variables. With the goal of improving sonar signal classification in shallow-water environments, this work develops a phase space framework for studying sound propagation in channels with dispersion and damping. This approach leads to new moment features for classification that are invariant to dispersion and damping, the utility of which is demonstrated via simulation. In addition, the accuracy of a previously developed phase space approximation method for range-independent pulse propagation is analyzed and shown to be greater than the accuracy of the standard stationary phase approximation for both large and small times/distances. The phase space approximation is also extended to range-dependent propagation. Finally, the phase space approximation is used to investigate the random nature of moment features for classification by calculating the moments of the moment features under uncertain and random channel assumptions.
These moments of the moment features are used to estimate probability distribution functions for the moment features, and we explore several ways in which this information may be used to improve sonar classification performance.

    Exploiting Robust Multivariate Statistics and Data Driven Techniques for Prognosis and Health Management

    This thesis explores state-of-the-art robust multivariate statistical methods and data-driven techniques to holistically perform prognostics and health management (PHM). This provides a means to enable the early detection, diagnosis and prognosis of future asset failures. In this thesis, the developed PHM methodology is applied to wind turbine drive train components, specifically focussed on planetary gearbox bearings and gears. A novel methodology for identifying relevant time-domain statistical features based upon robust statistical process control charts is presented for high-frequency bearing accelerometer data. In total, 28 time-domain statistical features were evaluated for their capabilities as leading indicators of degradation. From the results of this analysis, the extensible multivariate “Moments’ model” is presented for encapsulating bearing operational behaviour, enabling early detection of degradation, predictive diagnostics and estimation of remaining useful life (RUL). Following this, an extended physics of failure model based upon low-frequency SCADA data for quantifying wind turbine gearbox condition is described. This extends the state of the art, whilst defining robust performance charts for quantifying component condition. Normalisation against turbine loading and transient states, based upon empirical data, is performed in the bivariate domain, with extensibility into the multivariate domain if necessary. Prognosis of asset condition is found to be possible with the assistance of artificial neural networks, providing business intelligence for the planning and scheduling of effective maintenance actions. These multivariate condition models are explored with multivariate distance and similarity metrics to exploit traditional data mining techniques for tacit knowledge extraction, ensemble diagnosis and prognosis.
Estimation of bearing remaining useful life is found to be possible, with the derived technique correlating strongly with bearing life (r = .96).
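The abstract names neither the 28 features nor the exact chart construction, so the following is a minimal Python sketch under assumptions. It computes four widely used time-domain indicators (RMS, peak, crest factor, kurtosis) per vibration window and sets robust control limits from a healthy baseline as median ± 3 × 1.4826 × MAD, a common outlier-resistant substitute for mean ± 3σ. The simulated data, the 20-unit impacts and the function names are hypothetical, not the thesis's data or method.

```python
import math
import random
import statistics


def time_features(window):
    """A few common time-domain condition indicators for one vibration window."""
    n = len(window)
    mean = sum(window) / n
    rms = math.sqrt(sum(x * x for x in window) / n)
    peak = max(abs(x) for x in window)
    var = sum((x - mean) ** 2 for x in window) / n
    kurtosis = sum((x - mean) ** 4 for x in window) / (n * var ** 2)
    return {"rms": rms, "peak": peak, "crest": peak / rms, "kurtosis": kurtosis}


def robust_limits(values, k=3.0):
    """Limits median +/- k * 1.4826 * MAD (robust stand-in for mean +/- k*sigma)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    sigma = 1.4826 * mad  # MAD -> sigma for Gaussian data
    return med - k * sigma, med + k * sigma


rng = random.Random(1)
# Healthy baseline: 30 windows of Gaussian vibration.
baseline = [time_features([rng.gauss(0, 1) for _ in range(1000)]) for _ in range(30)]
limits = {name: robust_limits([f[name] for f in baseline]) for name in baseline[0]}

# Simulated bearing defect: periodic impacts (every 100 samples) on top of the noise.
faulty_window = [rng.gauss(0, 1) + (20.0 if i % 100 == 0 else 0.0) for i in range(1000)]
faulty = time_features(faulty_window)
flagged = sorted(name for name, (lo, hi) in limits.items() if not lo <= faulty[name] <= hi)
print(flagged)
```

Impulsive defects inflate kurtosis and crest factor far more than RMS, which is one reason such shape-based features are popular as early indicators.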

    On MMSE Estimation: A Linear Model Under Gaussian Mixture Statistics

    Automatic Speech Emotion Recognition- Feature Space Dimensionality and Classification Challenges

    In the last decade, research in Speech Emotion Recognition (SER) has become a major endeavour in Human Computer Interaction (HCI) and speech processing. Accurate SER is essential for many applications, such as assessing customer satisfaction with quality of service and detecting or assessing the emotional state of children in care. The large number of studies published on SER reflects the demand for it. The main concern of this thesis is the investigation of SER from pattern recognition and machine learning points of view. In particular, we aim to identify appropriate mathematical models of SER and examine the process of designing automatic emotion recognition schemes. Automatic SER faces major challenges, including ambiguity about the list and definition of emotions, the lack of agreement on a manageable set of uncorrelated speech-based emotion-relevant features, and the difficulty of collecting emotion-related datasets under natural circumstances. We begin by identifying appropriate sets of emotion-related features/attributes extractable from speech signals, as considered from psychological and computational points of view. We investigate the use of pattern-recognition approaches to remove redundancies and achieve a compact digital representation of the extracted data with minimal loss of information. The thesis includes the design of new SER schemes or complements to existing ones, and conducts large sets of experiments to empirically test their performance on different databases and to identify the advantages and shortcomings of using speech alone for emotion recognition. Existing SER studies seem to deal with the ambiguity of, and disagreement on, a “limited” number of emotion-related features by expanding the list from the same speech signal sources/sites and applying various feature selection procedures as a means of reducing redundancies. Attempts are made to discover more features relevant to emotion from speech.
One of our investigations focuses on proposing a new set of features for SER, extracted from Linear Predictive (LP) residual speech. We demonstrate the usefulness of this relatively small set of features by testing the performance of an SER scheme that fuses it with the existing set of thousands of features, using the common machine learning schemes of Support Vector Machine (SVM) and Artificial Neural Network (ANN). The challenge of the growing dimensionality of the SER feature space, and its impact on increased model complexity, is another major focus of our research project. By studying the pros and cons of the commonly used feature selection approaches, we argued in favour of meta-feature selection and developed various methods in this direction, not only to reduce dimension but also to adapt and de-correlate emotional feature spaces for improved SER recognition accuracy. We used Principal Component Analysis (PCA) and proposed Data Independent PCA (DIPCA), trained on independent emotional and non-emotional datasets. The DIPCA projections, especially when extracted from speech data coloured with different emotions or from neutral speech data, had SER performance comparable to that of PCA. Another approach adopted in this thesis for dimension reduction is Random Projection (RP) matrices, which are independent of the training data. We have shown that some versions of RP with an SVM classifier can offer an adaptation space for speaker-independent SER that avoids over-fitting and hence improves recognition accuracy. Using PCA trained on one set of data while testing on emotional data features has significant implications for machine learning in general. The thesis's other major contribution focuses on the classification aspects of SER. We investigate the drawbacks of the well-known SVM classifier when applied to data preprocessed by PCA and RP.
We demonstrate the advantages of using the Linear Discriminant Classifier (LDC) instead, especially for PCA de-correlated metafeatures. We introduced a variety of LDC-based ensemble classifiers and tested their performance using a new form of bagging over different subsets of the metafeatures extracted by PCA, with encouraging results. The experiments were conducted on two benchmark datasets (Emo-Berlin and FAU-Aibo) and an in-house dataset in the Kurdish language. The recognition accuracies achieved are significantly higher than state-of-the-art results on all datasets. The results, however, revealed a difficult challenge in the form of a persistent wide gap in accuracy across datasets, which cannot be explained entirely by differences in the nature of the datasets. We conducted various pilot studies, based on visualizations of the confusion matrices for the “difficult” databases, to build multi-level SER schemes. These studies provide initial evidence of the presence of more than one “emotion” in the same portion of speech. A possible solution may be to present recognition accuracy as a score-based measurement, such as a spider chart. Such an approach may also reveal the presence of Doddington zoo phenomena in SER.
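The abstract states only that the RP matrices are independent of the training data. A minimal Python sketch, assuming the common Gaussian variant with entries drawn from N(0, 1/k): because the matrix is generated from a seed rather than fitted, it exists before any emotional speech is seen, and by the Johnson-Lindenstrauss lemma it approximately preserves Euclidean distances between feature vectors. Dimensions, seeds and function names are hypothetical; the thesis's RP variants may differ (for example, sparse ±1 matrices).

```python
import math
import random


def random_projection_matrix(d, k, seed=0):
    """k x d Gaussian matrix with N(0, 1/k) entries; drawn without looking at any data."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, x):
    """Map a d-dimensional feature vector to k dimensions."""
    return [sum(r * xi for r, xi in zip(row, x)) for row in matrix]


def dist(u, v):
    """Euclidean distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


d, k = 500, 200  # hypothetical: a large feature vector reduced to 200 metafeatures
R = random_projection_matrix(d, k, seed=42)
rng = random.Random(7)
x1 = [rng.gauss(0, 1) for _ in range(d)]
x2 = [rng.gauss(0, 1) for _ in range(d)]
ratio = dist(project(R, x1), project(R, x2)) / dist(x1, x2)
print(len(project(R, x1)), round(ratio, 3))  # ratio is close to 1
```

Unlike PCA, nothing here is estimated from the data, which is what makes the projection usable when training and test speakers or corpora differ.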

    Hidden Markov Models

    Hidden Markov Models (HMMs), although known for decades, have become widely used and are still under development. This book presents theoretical issues and a variety of HMM applications in speech recognition and synthesis, medicine, neurosciences, computational biology, bioinformatics, seismology, environment protection and engineering. I hope that readers will find this book useful and helpful for their own research.
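Many of the applications listed rest on evaluating how likely an observation sequence is under an HMM. A minimal Python sketch of the standard forward algorithm for a discrete-output HMM follows; the two-state model and its probabilities are hypothetical, chosen so the result can be checked by hand (0.6·0.5 + 0.4·0.1 = 0.34 for a single observation of symbol 0).

```python
def forward(pi, A, B, obs):
    """P(obs | HMM) via the forward algorithm.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][o] : probability of emitting symbol o in state i
    """
    n_states = len(pi)
    # alpha[i] = P(observations so far, current state = i)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n_states)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][o]
            for j in range(n_states)
        ]
    return sum(alpha)


pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]  # symbols 0 and 1
print(forward(pi, A, B, [0]))     # ≈ 0.34
print(forward(pi, A, B, [0, 1]))  # ≈ 0.2156
```

The recursion costs O(T·N²) for T observations and N states, instead of the O(N^T) cost of summing over every hidden state path.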

    NA

    http://archive.org/details/geometricdesignt00zumrNAN