42 research outputs found

    Extreme deconvolution: Inferring complete distribution functions from noisy, heterogeneous and incomplete observations

    We generalize the well-known mixtures of Gaussians approach to density estimation and the accompanying Expectation-Maximization technique for finding the maximum likelihood parameters of the mixture to the case where each data point carries an individual d-dimensional uncertainty covariance and has unique missing data properties. This algorithm reconstructs the error-deconvolved or "underlying" distribution function common to all samples, even when the individual data points are samples from different distributions, obtained by convolving the underlying distribution with the heteroskedastic uncertainty distribution of the data point and projecting out the missing data directions. We show how this basic algorithm can be extended with conjugate priors on all of the model parameters and a "split-and-merge" procedure designed to avoid local maxima of the likelihood. We demonstrate the full method by applying it to the problem of inferring the three-dimensional velocity distribution of stars near the Sun from noisy two-dimensional, transverse velocity measurements from the Hipparcos satellite. Comment: Published at http://dx.doi.org/10.1214/10-AOAS439 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
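
    The abstract above describes Expectation-Maximization for a Gaussian mixture in which every data point has its own noise covariance. A minimal one-dimensional sketch of that idea follows, assuming no missing data (the projection is the identity) and no priors or split-and-merge step. The function name `xd_fit_1d` and the simulated data are illustrative assumptions, not taken from the paper or any library.

```python
import math
import random


def xd_fit_1d(w, s, means, variances, weights, n_iter=200):
    """EM for a 1-D Gaussian mixture where point w[i] has known noise variance s[i].

    Illustrative sketch only: scalar data, identity projection, no priors.
    """
    means, variances, weights = list(means), list(variances), list(weights)
    n_comp = len(means)
    for _ in range(n_iter):
        stats = []
        # E-step: responsibilities plus the conditional mean and variance of
        # each noise-free value, given the observation and the component.
        for wi, si in zip(w, s):
            row = []
            for j in range(n_comp):
                t = variances[j] + si  # model variance convolved with the noise
                like = (weights[j] * math.exp(-0.5 * (wi - means[j]) ** 2 / t)
                        / math.sqrt(2.0 * math.pi * t))
                b = means[j] + variances[j] / t * (wi - means[j])
                big_b = variances[j] - variances[j] ** 2 / t
                row.append((like, b, big_b))
            total = sum(item[0] for item in row)
            stats.append([(like / total, b, big_b) for like, b, big_b in row])
        # M-step: update the parameters of the underlying (deconvolved) mixture.
        for j in range(n_comp):
            r_sum = sum(row[j][0] for row in stats)
            new_mean = sum(row[j][0] * row[j][1] for row in stats) / r_sum
            new_var = sum(row[j][0] * ((new_mean - row[j][1]) ** 2 + row[j][2])
                          for row in stats) / r_sum
            weights[j] = r_sum / len(w)
            means[j] = new_mean
            variances[j] = new_var
    return means, variances, weights


# Simulated example (assumed values): unit-variance truth, heteroskedastic noise.
random.seed(0)
truth = [random.gauss(0.0, 1.0) for _ in range(2000)]
noise_var = [random.uniform(0.5, 2.0) for _ in truth]
observed = [v + random.gauss(0.0, math.sqrt(sv)) for v, sv in zip(truth, noise_var)]
means, variances, weights = xd_fit_1d(observed, noise_var, [0.5], [4.0], [1.0])
```

    In this toy setup the raw sample variance of `observed` includes the noise, while the fitted `variances[0]` estimates the variance of the noise-free values. The sketch does not guard against components whose responsibilities underflow to zero.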

    Cleaning the USNO-B Catalog through automatic detection of optical artifacts

    The USNO-B Catalog contains spurious entries that are caused by diffraction spikes and circular reflection halos around bright stars in the original imaging data. These spurious entries appear in the Catalog as if they were real stars; they are confusing for some scientific tasks. The spurious entries can be identified by simple computer vision techniques because they produce repeatable patterns on the sky. Some techniques employed here are variants of the Hough transform, one of which is sensitive to (two-dimensional) overdensities of faint stars in thin right-angle cross patterns centered on bright (<13 mag) stars, and one of which is sensitive to thin annular overdensities centered on very bright (<7 mag) stars. After enforcing conservative statistical requirements on spurious-entry identifications, we find that of the 1,042,618,261 entries in the USNO-B Catalog, 24,148,382 (2.3%) are identified as spurious by diffraction-spike criteria and 196,133 (0.02%) are identified as spurious by reflection-halo criteria. The spurious entries are often detected in more than 2 bands and are not overwhelmingly outliers in any photometric properties; they therefore cannot be rejected easily on other grounds, i.e., without the use of computer vision techniques. We demonstrate our method, and return to the community in electronic form a table of spurious entries in the Catalog. Comment: published in A
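
    To illustrate the kind of Hough-style voting the abstract mentions, here is a simplified sketch for right-angle cross patterns. It is not the authors' procedure: it assumes faint-star offsets relative to one bright star are already available, folds their position angles modulo 90 degrees so that all four arms of a cross vote for the same bin, and flags a peak that exceeds a Poisson-like threshold. The names `cross_angle_votes` and `detect_cross`, the bin count and the threshold are hypothetical choices.

```python
import math
import random


def cross_angle_votes(offsets, n_bins=90):
    """Vote histogram of position angles folded modulo 90 degrees."""
    votes = [0] * n_bins
    for dx, dy in offsets:
        angle = math.degrees(math.atan2(dy, dx)) % 90.0
        # The extra modulo protects against a folded angle that rounds to 90.0.
        votes[int(angle / 90.0 * n_bins) % n_bins] += 1
    return votes


def detect_cross(offsets, n_bins=90, n_sigma=5.0):
    """Return (orientation in degrees, True if the peak bin is an overdensity)."""
    votes = cross_angle_votes(offsets, n_bins)
    expected = len(offsets) / n_bins  # mean count per bin if angles were uniform
    peak = max(range(n_bins), key=votes.__getitem__)
    significant = votes[peak] > expected + n_sigma * math.sqrt(expected)
    return (peak + 0.5) * 90.0 / n_bins, significant


# Simulated field (assumed values): uniform background in a disk plus a cross
# of spurious detections at an orientation of 20.5 degrees.
random.seed(1)
field = []
for _ in range(300):
    radius, theta = math.sqrt(random.random()), random.uniform(0.0, 2.0 * math.pi)
    field.append((radius * math.cos(theta), radius * math.sin(theta)))
for k in range(60):
    theta = math.radians(20.5 + 90.0 * (k % 4) + random.gauss(0.0, 0.1))
    radius = random.uniform(0.1, 1.0)
    field.append((radius * math.cos(theta), radius * math.sin(theta)))

orientation, is_cross = detect_cross(field)
```

    Voting on angle alone treats each arm as a thin wedge rather than a strip of fixed width, so a real catalog cleaner would need a width-aware accumulator and the conservative statistical requirements the abstract refers to.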

    The velocity distribution of nearby stars from Hipparcos data I. The significance of the moving groups

    We present a three-dimensional reconstruction of the velocity distribution of nearby stars (≲100 pc) using a maximum likelihood density estimation technique applied to the two-dimensional tangential velocities of stars. The underlying distribution is modeled as a mixture of Gaussian components. The algorithm reconstructs the error-deconvolved distribution function, even when the individual stars have unique error and missing-data properties. We apply this technique to the tangential velocity measurements from a kinematically unbiased sample of 11,865 main sequence stars observed by the Hipparcos satellite. We explore various methods for validating the complexity of the resulting velocity distribution function, including criteria based on Bayesian model selection and how accurately our reconstruction predicts the radial velocities of a sample of stars from the Geneva-Copenhagen survey (GCS). Using this very conservative external validation test based on the GCS, we find that there is little evidence for structure in the distribution function beyond the moving groups established prior to the Hipparcos mission. This is in sharp contrast with internal tests performed here and in previous analyses, which point consistently to maximal structure in the velocity distribution. We quantify the information content of the radial velocity measurements and find that the mean amount of new information gained from a radial velocity measurement of a single star is significant. This argues for complementary radial velocity surveys to upcoming astrometric surveys.

    Urinary MicroRNA Profiling in the Nephropathy of Type 1 Diabetes

    Background: Patients with Type 1 Diabetes (T1D) are particularly vulnerable to development of Diabetic nephropathy (DN) leading to End Stage Renal Disease. Hence a better understanding of the factors affecting kidney disease progression in T1D is urgently needed. In recent years microRNAs have emerged as important post-transcriptional regulators of gene expression in many different health conditions. We hypothesized that the urinary microRNA profile of patients will differ in the different stages of diabetic renal disease. Methods and Findings: We studied urine microRNA profiles with qPCR in 40 T1D patients with >20 years of follow-up: 10 who never developed renal disease (N) matched against 10 patients who went on to develop overt nephropathy (DN), and 10 patients with intermittent microalbuminuria (IMA) matched against 10 patients with persistent microalbuminuria (PMA). A Bayesian procedure was used to normalize and convert raw signals to expression ratios. We applied formal statistical techniques to translate fold changes to profiles of microRNA targets which were then used to make inferences about biological pathways in the Gene Ontology and REACTOME structured vocabularies. A total of 27 microRNAs were found to be present at significantly different levels in different stages of untreated nephropathy. These microRNAs mapped to overlapping pathways pertaining to growth factor signaling and renal fibrosis known to be targeted in diabetic kidney disease. Conclusions: Urinary microRNA profiles differ across the different stages of diabetic nephropathy. Previous work using experimental, clinical chemistry or biopsy samples has demonstrated differential expression of many of these microRNAs in a variety of chronic renal conditions and diabetes. Combining expression ratios of microRNAs with formal inferences about their predicted mRNA targets and associated biological pathways may yield useful markers for early diagnosis and risk stratification of DN in T1D by inferring the alteration of renal molecular processes. © 2013 Argyropoulos et al.

    Data driven production models for speech processing

    When difficult computations are to be performed on sensory data, it is often advantageous to employ a model of the underlying process which produced the observations. Because such generative models capture information about the set of possible observations, they can help to explain complex variability naturally present in the data and are useful in separating signal from noise. In the case of neural and artificial sensory processing systems, generative models are learned directly from environmental input, although they are often rooted in the underlying physics of the modality involved. One effective use of learned models is made by performing model inversion or state inference on incoming observation sequences to discover the underlying state or control parameter trajectories which could have produced them. These inferred states can then be used as inputs to a pattern recognition or pattern completion module. In the case of human speech perception and production, the models in question are called articulatory models and relate the movements of a talker's mouth to the sequence of sounds produced. Linguistic theories and substantial psychophysical evidence argue strongly that articulatory model inversion plays an important role in speech perception and recognition in the brain. Unfortunately, despite potential engineering advantages and evidence for being part of the human strategy, such inversion of speech production models is absent in almost all artificial speech processing systems. This dissertation presents a series of experiments which investigate articulatory speech processing using real speech production data from a database containing simultaneous audio and mouth movement recordings. I show that it is possible to learn simple low-dimensionality models which accurately capture the structure observed in such real production data. I discuss how these models can be used to learn a forward synthesis system which generates spectral sequences from articulatory movements. I also describe an inversion algorithm which estimates movements from an acoustic signal. Finally, I demonstrate the use of articulatory movements, both true and recovered, in a simple speech recognition task, showing the possibility of doing true articulatory speech recognition in artificial systems.

    Chapter 8 AUTOMATIC SPEECH PROCESSING BY INFERENCE IN GENERATIVE MODELS

    Normally, algorithms which process speech signals to estimate quantities of interest (e.g. pitch) or perform various complex operations (e.g. denoising) are designed directly, by experts, to compute the final output from the input representation using a series of processing steps. Another approach is to build a probabilistic generative model of the input (waveform or short-time spectra) in which the quantities of eventual interest are represented as hidden or latent variables. Estimation then takes the form of statistical inference in these models, for which well-known algorithms exist. The model parameters themselves can be learned from example inputs. Often, the results of such inference can be extremely informative even when the trained model does not capture all of the complexity in the original input data. In this chapter, we will give several examples of this paradigm, showing how inference in very simple generative models can be used to perform surprisingly complex speech processing tasks including denoising, source separation, pitch tracking, timescale modification and estimation of articulatory movements from audio.