Extreme deconvolution: Inferring complete distribution functions from noisy, heterogeneous and incomplete observations
We generalize the well-known mixtures of Gaussians approach to density
estimation and the accompanying Expectation--Maximization technique for finding
the maximum likelihood parameters of the mixture to the case where each data
point carries an individual d-dimensional uncertainty covariance and has
unique missing data properties. This algorithm reconstructs the
error-deconvolved or "underlying" distribution function common to all samples,
even when the individual data points are samples from different distributions,
obtained by convolving the underlying distribution with the heteroskedastic
uncertainty distribution of the data point and projecting out the missing data
directions. We show how this basic algorithm can be extended with conjugate
priors on all of the model parameters and a "split-and-merge" procedure
designed to avoid local maxima of the likelihood. We demonstrate the full
method by applying it to the problem of inferring the three-dimensional
velocity distribution of stars near the Sun from noisy two-dimensional,
transverse velocity measurements from the Hipparcos satellite.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS439 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
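As a rough illustration of the idea in the abstract above, the following is a minimal one-dimensional sketch of a single Expectation-Maximization update for a Gaussian mixture in which every data point has its own noise variance. It assumes no missing data (so no projection step) and omits the priors and the split-and-merge procedure. The function names `normal_pdf`, `log_likelihood`, and `em_step` are hypothetical and are not taken from the paper or any library.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(xs, noise_vars, weights, means, variances):
    """Log-likelihood of noisy points under the noise-convolved mixture.

    Each point x_i is modeled as drawn from sum_j a_j N(m_j, v_j + s_i),
    where s_i is that point's own noise variance.
    """
    total = 0.0
    for x, s in zip(xs, noise_vars):
        total += math.log(
            sum(a * normal_pdf(x, m, v + s)
                for a, m, v in zip(weights, means, variances))
        )
    return total


def em_step(xs, noise_vars, weights, means, variances):
    """One EM update of the underlying (deconvolved) mixture parameters."""
    k = len(weights)
    resp_sum = [0.0] * k
    mean_num = [0.0] * k
    stats = []  # per point: list of (responsibility, b, B) for each component
    for x, s in zip(xs, noise_vars):
        dens = [a * normal_pdf(x, m, v + s)
                for a, m, v in zip(weights, means, variances)]
        norm = sum(dens)
        row = []
        for j in range(k):
            r = dens[j] / norm                # posterior component probability
            t = variances[j] + s              # convolved variance for this point
            b = means[j] + variances[j] / t * (x - means[j])  # expected true value
            big_b = variances[j] - variances[j] ** 2 / t      # its residual variance
            row.append((r, b, big_b))
            resp_sum[j] += r
            mean_num[j] += r * b
        stats.append(row)
    new_weights = [rs / len(xs) for rs in resp_sum]
    new_means = [mean_num[j] / resp_sum[j] for j in range(k)]
    new_vars = [
        sum(row[j][0] * ((new_means[j] - row[j][1]) ** 2 + row[j][2])
            for row in stats) / resp_sum[j]
        for j in range(k)
    ]
    return new_weights, new_means, new_vars


if __name__ == "__main__":
    xs = [-2.1, -1.9, -2.4, 1.8, 2.2, 2.0, 2.5, -1.6]       # made-up observations
    noise = [0.1, 0.3, 0.2, 0.1, 0.4, 0.2, 0.3, 0.1]        # made-up noise variances
    params = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
    for _ in range(20):
        params = em_step(xs, noise, *params)
    print(params, log_likelihood(xs, noise, *params))
```

The fitted variances describe the underlying distribution rather than the observed scatter, because each point's noise variance is added only when comparing the model with that point.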
Cleaning the USNO-B Catalog through automatic detection of optical artifacts
The USNO-B Catalog contains spurious entries that are caused by diffraction
spikes and circular reflection halos around bright stars in the original
imaging data. These spurious entries appear in the Catalog as if they were real
stars; they are confusing for some scientific tasks. The spurious entries can
be identified by simple computer vision techniques because they produce
repeatable patterns on the sky. Some techniques employed here are variants of
the Hough transform, one of which is sensitive to (two-dimensional)
overdensities of faint stars in thin right-angle cross patterns centered on
bright (<13 mag) stars, and one of which is sensitive to thin annular
overdensities centered on very bright (<7 mag) stars. After enforcing
conservative statistical requirements on spurious-entry identifications, we
find that of the 1,042,618,261 entries in the USNO-B Catalog, 24,148,382 of
them (2.3 percent) are identified as spurious by diffraction-spike criteria
and 196,133 (0.02 percent) are identified as spurious by reflection-halo
criteria. The spurious entries are often detected in more than 2 bands and are
not overwhelmingly outliers in any photometric properties; they therefore
cannot be rejected easily on other grounds, i.e., without the use of computer
vision techniques. We demonstrate our method, and return to the community in
electronic form a table of spurious entries in the Catalog.
Comment: published in A
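The cross-pattern search described above can be sketched as a simple voting scheme in the spirit of a Hough transform: for one bright star, each candidate orientation of a thin right-angle cross collects a vote from every faint source that falls on one of its arms. This is only an illustrative sketch on a flat plane with made-up units, not the authors' implementation. The function `best_spike_angle` and its parameters `arm_length`, `half_width`, and `n_angles` are hypothetical.

```python
import math


def best_spike_angle(bright, faint, arm_length, half_width, n_angles=90):
    """Vote over cross orientations in [0, 90 deg) around one bright star.

    bright: (x, y) of the bright star; faint: list of (x, y) faint sources.
    Returns (angle in radians, number of faint sources on the cross).
    A right-angle cross repeats every 90 degrees, so that range suffices.
    """
    bx, by = bright
    offsets = [(x - bx, y - by) for x, y in faint]
    best_angle, best_votes = 0.0, -1
    for i in range(n_angles):
        theta = i * (math.pi / 2) / n_angles
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        votes = 0
        for dx, dy in offsets:
            # Coordinates of the faint source in the frame rotated by theta.
            u = dx * cos_t + dy * sin_t
            v = -dx * sin_t + dy * cos_t
            on_first_arm = abs(v) <= half_width and abs(u) <= arm_length
            on_second_arm = abs(u) <= half_width and abs(v) <= arm_length
            if on_first_arm or on_second_arm:
                votes += 1
        if votes > best_votes:
            best_angle, best_votes = theta, votes
    return best_angle, best_votes
```

A real pipeline would also need a statistical threshold on the vote count relative to the local background density before flagging entries as spurious, as the abstract's "conservative statistical requirements" suggest.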
The velocity distribution of nearby stars from Hipparcos data I. The significance of the moving groups
We present a three-dimensional reconstruction of the velocity distribution of
nearby stars (<~ 100 pc) using a maximum likelihood density estimation
technique applied to the two-dimensional tangential velocities of stars. The
underlying distribution is modeled as a mixture of Gaussian components. The
algorithm reconstructs the error-deconvolved distribution function, even when
the individual stars have unique error and missing-data properties. We apply
this technique to the tangential velocity measurements from a kinematically
unbiased sample of 11,865 main sequence stars observed by the Hipparcos
satellite. We explore various methods for validating the complexity of the
resulting velocity distribution function, including criteria based on Bayesian
model selection and how accurately our reconstruction predicts the radial
velocities of a sample of stars from the Geneva-Copenhagen survey (GCS). Using
this very conservative external validation test based on the GCS, we find that
there is little evidence for structure in the distribution function beyond the
moving groups established prior to the Hipparcos mission. This is in sharp
contrast with internal tests performed here and in previous analyses, which
point consistently to maximal structure in the velocity distribution. We
quantify the information content of the radial velocity measurements and find
that the mean amount of new information gained from a radial velocity
measurement of a single star is significant. This argues for complementary
radial velocity surveys to upcoming astrometric surveys.
Urinary MicroRNA Profiling in the Nephropathy of Type 1 Diabetes
Background: Patients with Type 1 Diabetes (T1D) are particularly vulnerable to development of diabetic nephropathy (DN) leading to End Stage Renal Disease. Hence a better understanding of the factors affecting kidney disease progression in T1D is urgently needed. In recent years microRNAs have emerged as important post-transcriptional regulators of gene expression in many different health conditions. We hypothesized that the urinary microRNA profile of patients will differ across the different stages of diabetic renal disease. Methods and Findings: We studied urine microRNA profiles with qPCR in 40 T1D patients with >20 years of follow-up: 10 who never developed renal disease (N) matched against 10 patients who went on to develop overt nephropathy (DN), and 10 patients with intermittent microalbuminuria (IMA) matched against 10 patients with persistent microalbuminuria (PMA). A Bayesian procedure was used to normalize and convert raw signals to expression ratios. We applied formal statistical techniques to translate fold changes to profiles of microRNA targets, which were then used to make inferences about biological pathways in the Gene Ontology and REACTOME structured vocabularies. A total of 27 microRNAs were found to be present at significantly different levels in different stages of untreated nephropathy. These microRNAs mapped to overlapping pathways pertaining to growth factor signaling and renal fibrosis known to be targeted in diabetic kidney disease. Conclusions: Urinary microRNA profiles differ across the different stages of diabetic nephropathy. Previous work using experimental, clinical chemistry or biopsy samples has demonstrated differential expression of many of these microRNAs in a variety of chronic renal conditions and diabetes.
Combining expression ratios of microRNAs with formal inferences about their predicted mRNA targets and associated biological pathways may yield useful markers for early diagnosis and risk stratification of DN in T1D by inferring the alteration of renal molecular processes. © 2013 Argyropoulos et al
Data driven production models for speech processing
When difficult computations are to be performed on sensory data it is often advantageous to employ a model of the underlying process which produced the observations. Because such generative models capture information about the set of possible observations, they can help to explain complex variability naturally present in the data and are useful in separating signal from noise. In the case of neural and artificial sensory processing systems generative models are learned directly from environmental input although they are often rooted in the underlying physics of the modality involved. One effective use of learned models is made by performing model inversion or state inference on incoming observation sequences to discover the underlying state or control parameter trajectories which could have produced them. These inferred states can then be used as inputs to a pattern recognition or pattern completion module.
In the case of human speech perception and production, the models in question are called articulatory models and relate the movements of a talker's mouth to the sequence of sounds produced. Linguistic theories and substantial psychophysical evidence argue strongly that articulatory model inversion plays an important role in speech perception and recognition in the brain. Unfortunately, despite potential engineering advantages and evidence for being part of the human strategy, such inversion of speech production models is absent in almost all artificial speech processing systems.
This dissertation presents a series of experiments which investigate articulatory speech processing using real speech production data from a database containing simultaneous audio and mouth movement recordings. I show that it is possible to learn simple low-dimensional models which accurately capture the structure observed in such real production data. I discuss how these models can be used to learn a forward synthesis system which generates spectral sequences from articulatory movements. I also describe an inversion algorithm which estimates movements from an acoustic signal. Finally, I demonstrate the use of articulatory movements, both true and recovered, in a simple speech recognition task, showing the possibility of doing true articulatory speech recognition in artificial systems.
Chapter 8 AUTOMATIC SPEECH PROCESSING BY INFERENCE IN GENERATIVE MODELS
Abstract: Normally, algorithms which process speech signals to estimate quantities of interest (e.g. pitch) or perform various complex operations (e.g. denoising) are designed directly, by experts, to compute the final output from the input representation using a series of processing steps. Another approach is to build a probabilistic generative model of the input (waveform or short time spectra) in which the quantities of eventual interest are represented as hidden or latent variables. Estimation then takes the form of statistical inference in these models, for which well known algorithms exist. The model parameters themselves can be learned from example inputs. Often, the results of such inference can be extremely informative even when the trained model does not capture all of the complexity in the original input data. In this chapter, we will give several examples of this paradigm, showing how inference in very simple generative models can be used to perform surprisingly complex speech processing tasks including denoising, source separation, pitch tracking, timescale modification and estimation of articulatory movements from audio.
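To make "estimation as inference over hidden variables" concrete, here is a minimal sketch using a scalar linear-Gaussian state-space model, for which exact inference is the standard Kalman filter. This is a generic textbook example chosen for illustration and is not claimed to be one of the models used in the chapter. The function name `kalman_filter` and the parameter names (`a`, `c`, `q`, `r`, `mean0`, `var0`) are assumptions.

```python
def kalman_filter(observations, a, c, q, r, mean0, var0):
    """Filter a scalar linear-Gaussian state-space model.

    Assumed generative model:
        hidden state:  x_t = a * x_{t-1} + process noise with variance q
        observation:   y_t = c * x_t     + sensor noise with variance r
    mean0 and var0 describe the Gaussian belief about the state before
    the first transition. Returns the posterior means and variances of
    the hidden state after each observation.
    """
    means, variances = [], []
    mean, var = mean0, var0
    for y in observations:
        # Predict: push the current belief through the state dynamics.
        mean_pred = a * mean
        var_pred = a * a * var + q
        # Update: correct the prediction using the new observation.
        gain = var_pred * c / (c * c * var_pred + r)
        mean = mean_pred + gain * (y - c * mean_pred)
        var = (1.0 - gain * c) * var_pred
        means.append(mean)
        variances.append(var)
    return means, variances
```

The quantity of interest (here the hidden state) is never computed by a hand-designed pipeline; it falls out of the posterior given the observations, which is the design choice the abstract argues for.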