1,477 research outputs found

    A Unifying Review of Linear Gaussian Models

    Factor analysis, principal component analysis, mixtures of Gaussian clusters, vector quantization, Kalman filter models, and hidden Markov models can all be unified as variations of unsupervised learning under a single basic generative model. This is achieved by collecting together disparate observations and derivations made by many previous authors and introducing a new way of linking discrete and continuous state models using a simple nonlinearity. Through the use of other nonlinearities, we show how independent component analysis is also a variation of the same basic generative model. We show that factor analysis and mixtures of Gaussians can be implemented in autoencoder neural networks and learned using squared error plus the same regularization term. We introduce a new model for static data, known as sensible principal component analysis, as well as a novel concept of spatially adaptive observation noise. We also review some of the literature involving global and local mixtures of the basic models and provide pseudocode for inference and learning for all the basic models.
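
    As a concrete illustration of the unification, every model listed above can be written as the linear-Gaussian state-space model x_{t+1} = A x_t + w, y_t = C x_t + v, under particular restrictions on the parameters. The numpy sketch below is my own illustration of that point, not the paper's pseudocode, and all variable names are mine; it samples from the shared model and shows the restrictions that yield the Kalman-filter, factor-analysis, and sensible-PCA cases.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_lds(A, C, Q, R, x0, T):
    """Draw a length-T trajectory from x_{t+1} = A x_t + w, y_t = C x_t + v."""
    k, p = A.shape[0], C.shape[0]
    xs, ys = np.zeros((T, k)), np.zeros((T, p))
    x = x0
    for t in range(T):
        xs[t] = x
        ys[t] = C @ x + rng.multivariate_normal(np.zeros(p), R)
        x = A @ x + rng.multivariate_normal(np.zeros(k), Q)
    return xs, ys

k, p = 2, 5
C = rng.normal(size=(p, k))
x0 = rng.multivariate_normal(np.zeros(k), np.eye(k))

# Kalman-filter model: non-trivial dynamics A and general noise covariances.
A = 0.9 * np.eye(k)
xs, ys = sample_lds(A, C, Q=0.1 * np.eye(k), R=0.5 * np.eye(p), x0=x0, T=100)

# Factor analysis: no dynamics (A = 0), unit state noise, *diagonal* observation noise.
R_fa = np.diag(rng.uniform(0.1, 1.0, size=p))
_, y_fa = sample_lds(np.zeros((k, k)), C, np.eye(k), R_fa, x0, 500)

# Sensible PCA: as factor analysis but with isotropic noise R = eps * I;
# standard PCA is recovered in the zero-noise limit eps -> 0.
_, y_spca = sample_lds(np.zeros((k, k)), C, np.eye(k), 0.1 * np.eye(p), x0, 500)
```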

    Astrometry.net: Blind astrometric calibration of arbitrary astronomical images

    We have built a reliable and robust system that takes as input an astronomical image, and returns as output the pointing, scale, and orientation of that image (the astrometric calibration or WCS information). The system requires no first guess, and works with the information in the image pixels alone; that is, the problem is a generalization of the "lost in space" problem in which nothing--not even the image scale--is known. After robust source detection is performed in the input image, asterisms (sets of four or five stars) are geometrically hashed and compared to pre-indexed hashes to generate hypotheses about the astrometric calibration. A hypothesis is only accepted as true if it passes a Bayesian decision theory test against a background hypothesis. With indices built from the USNO-B Catalog and designed for uniformity of coverage and redundancy, the success rate is 99.9% for contemporary near-ultraviolet and visual imaging survey data, with no false positives. The failure rate is consistent with the incompleteness of the USNO-B Catalog; augmentation with indices built from the 2MASS Catalog brings the completeness to 100% with no false positives. We are using this system to generate consistent and standards-compliant meta-data for digital and digitized imaging from plate repositories, automated observatories, individual scientific investigators, and hobbyists. This is the first step in a program of making it possible to trust calibration meta-data for astronomical data of arbitrary provenance. Comment: submitted to A
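
    The heart of the method is the similarity-invariant hash computed for each asterism. Below is a rough sketch of a four-star version: the two most widely separated stars define a frame that sends them to (0,0) and (1,1), and the positions of the remaining two stars in that frame form the hash code. This is a simplified reading of the published description; symmetry-breaking conventions and indexing details are omitted.

```python
import itertools
import numpy as np

def quad_hash(stars):
    """stars: (4, 2) array of pixel positions; returns a 4-vector hash code."""
    stars = np.asarray(stars, dtype=float)
    # Pick the most widely separated pair as the frame-defining stars A, B.
    i, j = max(itertools.combinations(range(4), 2),
               key=lambda ij: np.linalg.norm(stars[ij[0]] - stars[ij[1]]))
    A, B = stars[i], stars[j]
    C, D = stars[[k for k in range(4) if k not in (i, j)]]
    # Similarity transform sending A -> (0,0) and B -> (1,1): treat points as
    # complex numbers and map z -> (z - A) / (B - A) * (1 + 1j).
    to_c = lambda p: complex(p[0], p[1])
    scale = (1 + 1j) / (to_c(B) - to_c(A))
    c = (to_c(C) - to_c(A)) * scale
    d = (to_c(D) - to_c(A)) * scale
    return np.array([c.real, c.imag, d.real, d.imag])

# The same asterism seen at another scale/rotation/offset hashes to (nearly)
# the same code, which is what makes lookup in a pre-built index possible.
quad = np.array([[10.0, 12.0], [55.0, 70.0], [30.0, 25.0], [40.0, 60.0]])
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
rotated = 2.5 * quad @ R.T + np.array([100.0, -40.0])
print(quad_hash(quad), quad_hash(rotated))  # codes agree up to numerical noise
```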

    On the Reduction of Errors in DNA Computation

    In this paper, we discuss techniques for reducing errors in DNA computation. We investigate several methods for achieving acceptable overall error rates for a computation using basic operations that are error-prone. We analyze a single essential biotechnology, sequence-specific separation, and show that separation errors can theoretically be reduced to tolerable levels by invoking a tradeoff between time, space, and error rates at the level of algorithm design. These tradeoffs do not depend upon improvement of the underlying biotechnology which implements the separation step. We outline several specific ways in which error reduction can be done and present numerical calculations of their performance.
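
    The quantitative point is that repeating an error-prone separation and combining the results drives contaminant leakage down geometrically, at a cost in time and material. The short calculation below uses hypothetical per-operation retention rates to illustrate this tradeoff; it is not one of the paper's specific schemes, and the numbers are not taken from the paper.

```python
# Toy error model: a single sequence-specific separation keeps a target strand
# with probability retain_good and leaks a non-target with probability retain_bad.
retain_good = 0.95   # hypothetical: target strand survives one separation
retain_bad  = 0.05   # hypothetical: non-target strand leaks through once

for k in (1, 2, 3, 5):                 # k separations applied in series
    good = retain_good ** k            # fraction of targets still present
    bad  = retain_bad ** k             # fraction of contaminants still present
    print(f"k={k}: targets kept {good:.3f}, contaminants kept {bad:.2e}")

# Contaminant leakage falls geometrically with k, while target loss accumulates
# slowly and can be compensated by amplification or by running more parallel
# copies -- the space part of the time/space/error-rate tradeoff.
```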

    K-corrections and filter transformations in the ultraviolet, optical, and near infrared

    Template fits to observed galaxy fluxes allow calculation of K-corrections and conversions among observations of galaxies at various wavelengths. We present a method for creating model-based template sets given a set of heterogeneous photometric and spectroscopic galaxy data. Our technique, non-negative matrix factorization, is akin to principal component analysis (PCA), except that it is constrained to produce non-negative templates, it can use a basis set of models (rather than the delta function basis of PCA), and it naturally handles uncertainties, missing data, and heterogeneous data (including broad-band fluxes at various redshifts). The particular implementation we present here is suitable for ultraviolet, optical, and near-infrared observations in the redshift range 0 < z < 1.5. Since we base our templates on stellar population synthesis models, the results are interpretable in terms of approximate stellar masses and star-formation histories. We present templates fit with this method to data from GALEX, Sloan Digital Sky Survey spectroscopy and photometry, the Two-Micron All Sky Survey, the Deep Extragalactic Evolutionary Probe, and the Great Observatories Origins Deep Survey. In addition, we present software for using such data to estimate K-corrections and stellar masses. Comment: 43 pages, 20 figures, submitted to AJ, software and full-resolution figures available at http://cosmo.nyu.edu/blanton/kcorrec
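
    For readers unfamiliar with the technique, the sketch below shows the core of non-negative matrix factorization via the standard multiplicative updates (Lee and Seung style). The paper's actual implementation additionally works in a basis of stellar population synthesis models and propagates flux uncertainties and redshifts; none of that is reproduced here, and all names are illustrative.

```python
import numpy as np

def nmf(F, n_templates, n_iter=500, eps=1e-12, seed=0):
    """Factor a non-negative flux matrix F (n_galaxies x n_bands) as W @ H,
    with coefficients W and templates H constrained to be non-negative."""
    rng = np.random.default_rng(seed)
    n, m = F.shape
    W = rng.uniform(size=(n, n_templates))
    H = rng.uniform(size=(n_templates, m))
    for _ in range(n_iter):
        H *= (W.T @ F) / (W.T @ W @ H + eps)   # multiplicative update for templates
        W *= (F @ H.T) / (W @ H @ H.T + eps)   # multiplicative update for coefficients
    return W, H

# Fake non-negative "fluxes" built from two hidden templates, then recovered.
rng = np.random.default_rng(1)
true_H = rng.uniform(size=(2, 7))
true_W = rng.uniform(size=(100, 2))
F = true_W @ true_H + 0.01 * rng.uniform(size=(100, 7))
W, H = nmf(F, n_templates=2)
print(np.abs(F - W @ H).max())   # small reconstruction residual
```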

    Bi-stochastic kernels via asymmetric affinity functions

    In this short letter we present the construction of a bi-stochastic kernel p for an arbitrary data set X that is derived from an asymmetric affinity function α. The affinity function α measures the similarity between points in X and some reference set Y. Unlike other methods that construct bi-stochastic kernels via some convergent iteration process or through solving an optimization problem, the construction presented here is quite simple. Furthermore, it can be viewed through the lens of out-of-sample extensions, making it useful for massive data sets. Comment: 5 pages. v2: Expanded upon the first paragraph of subsection 2.1. v3: Minor changes and edits. v4: Edited comments and added DO
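
    One simple, non-iterative construction along these lines (my reading of the idea, not necessarily the paper's exact formula): row-normalize the affinity matrix between X and the reference set Y, then combine it with its transpose, weighted by the reference-point masses. The result is symmetric and doubly stochastic by construction.

```python
import numpy as np

def bistochastic_kernel(A):
    """A: (n, m) non-negative affinities between n points of X and m points of Y.
    Returns an (n, n) symmetric, doubly-stochastic kernel on X."""
    P = A / A.sum(axis=1, keepdims=True)   # row-stochastic: each X point spreads mass over Y
    omega = P.sum(axis=0)                  # total mass landing on each reference point
    K = (P / omega) @ P.T                  # K = P diag(1/omega) P^T; rows and columns sum to 1
    return K

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = X[rng.choice(200, size=30, replace=False)]   # a small reference set
A = np.exp(-np.square(X[:, None, :] - Y[None, :, :]).sum(-1) / 0.5)
K = bistochastic_kernel(A)
print(K.sum(axis=0).round(6)[:5], K.sum(axis=1).round(6)[:5])  # all sums equal 1
# Out-of-sample flavor: a new point only needs its affinity row to Y, not to all of X.
```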

    Cleaning the USNO-B Catalog through automatic detection of optical artifacts

    The USNO-B Catalog contains spurious entries that are caused by diffraction spikes and circular reflection halos around bright stars in the original imaging data. These spurious entries appear in the Catalog as if they were real stars; they are confusing for some scientific tasks. The spurious entries can be identified by simple computer vision techniques because they produce repeatable patterns on the sky. Some techniques employed here are variants of the Hough transform, one of which is sensitive to (two-dimensional) overdensities of faint stars in thin right-angle cross patterns centered on bright (<13 mag) stars, and one of which is sensitive to thin annular overdensities centered on very bright (<7 mag) stars. After enforcing conservative statistical requirements on spurious-entry identifications, we find that of the 1,042,618,261 entries in the USNO-B Catalog, 24,148,382 of them (2.3 percent) are identified as spurious by diffraction-spike criteria and 196,133 (0.02 percent) are identified as spurious by reflection-halo criteria. The spurious entries are often detected in more than 2 bands and are not overwhelmingly outliers in any photometric properties; they therefore cannot be rejected easily on other grounds, i.e., without the use of computer vision techniques. We demonstrate our method, and return to the community in electronic form a table of spurious entries in the Catalog. Comment: published in A
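
    As an illustration of the reflection-halo test, the toy sketch below flags a bright star when the count of faint entries in a thin annulus around it is a significant Poisson overdensity relative to the local background. The radii, annulus width, and threshold are illustrative only, not the values or exact statistic used for the Catalog.

```python
import numpy as np
from scipy.stats import poisson

def flag_halo(bright_xy, faint_xy, r_in=60.0, r_out=75.0, r_bg=300.0, p_cut=1e-4):
    """True if the annulus [r_in, r_out) around bright_xy (arcsec offsets)
    holds significantly more faint entries than the local density predicts."""
    r = np.linalg.norm(faint_xy - bright_xy, axis=1)
    n_annulus = np.count_nonzero((r >= r_in) & (r < r_out))
    n_bg = np.count_nonzero(r < r_bg)
    area_annulus = np.pi * (r_out**2 - r_in**2)
    area_bg = np.pi * r_bg**2
    expected = n_bg * area_annulus / area_bg
    # Poisson tail probability of seeing n_annulus or more entries by chance.
    p = poisson.sf(n_annulus - 1, expected)
    return p < p_cut

rng = np.random.default_rng(2)
background = rng.uniform(-300, 300, size=(400, 2))           # uniform faint entries
theta = rng.uniform(0, 2 * np.pi, 80)
halo = 70.0 * np.column_stack([np.cos(theta), np.sin(theta)])  # spurious ring at r = 70
print(flag_halo(np.zeros(2), np.vstack([background, halo])))   # True: ring detected
print(flag_halo(np.zeros(2), background))                      # False: no ring
```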

    Inference with Constrained Hidden Markov Models in PRISM

    A Hidden Markov Model (HMM) is a common statistical model which is widely used for analysis of biological sequence data and other sequential phenomena. In the present paper we show how HMMs can be extended with side-constraints and present constraint solving techniques for efficient inference. Defining HMMs with side-constraints in Constraint Logic Programming has advantages in terms of more compact expression and pruning opportunities during inference. We present a PRISM-based framework for extending HMMs with side-constraints and show how well-known constraints such as cardinality and all-different are integrated. We experimentally validate our approach on the biologically motivated problem of global pairwise alignment.
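
    The flavor of constrained inference can be shown outside PRISM as well. The sketch below does Viterbi decoding under a cardinality side-constraint ("use state 1 at most K times") by augmenting the hidden state with a counter. This plain dynamic-programming stand-in only illustrates the kind of pruning that side-constraints allow; it is not the paper's CLP framework, and all names are mine.

```python
import numpy as np

def constrained_viterbi(obs, pi, A, B, constrained_state=1, max_count=2):
    """obs: observation indices; pi, A, B: HMM start/transition/emission probs.
    Most likely state path using `constrained_state` at most `max_count` times,
    via log-space dynamic programming over (state, count) pairs."""
    S, T = A.shape[0], len(obs)
    NEG = -np.inf
    V = np.full((T, S, max_count + 1), NEG)               # best log-probability
    back = np.zeros((T, S, max_count + 1, 2), dtype=int)  # back-pointers
    for s in range(S):
        c = 1 if s == constrained_state else 0
        V[0, s, c] = np.log(pi[s]) + np.log(B[s, obs[0]])
    for t in range(1, T):
        for s in range(S):
            add = 1 if s == constrained_state else 0
            for c in range(add, max_count + 1):
                prev = V[t - 1, :, c - add] + np.log(A[:, s])
                best = int(np.argmax(prev))
                if prev[best] > NEG:
                    V[t, s, c] = prev[best] + np.log(B[s, obs[t]])
                    back[t, s, c] = (best, c - add)
    s, c = np.unravel_index(np.argmax(V[-1]), V[-1].shape)  # best admissible end cell
    path = [int(s)]
    for t in range(T - 1, 0, -1):
        s, c = back[t, s, c]
        path.append(int(s))
    return path[::-1]

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(constrained_viterbi([1, 1, 1, 0, 1], pi, A, B))  # path uses state 1 at most twice
```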

    Extreme deconvolution: Inferring complete distribution functions from noisy, heterogeneous and incomplete observations

    We generalize the well-known mixtures of Gaussians approach to density estimation and the accompanying Expectation-Maximization technique for finding the maximum likelihood parameters of the mixture to the case where each data point carries an individual d-dimensional uncertainty covariance and has unique missing data properties. This algorithm reconstructs the error-deconvolved or "underlying" distribution function common to all samples, even when the individual data points are samples from different distributions, obtained by convolving the underlying distribution with the heteroskedastic uncertainty distribution of the data point and projecting out the missing data directions. We show how this basic algorithm can be extended with conjugate priors on all of the model parameters and a "split-and-merge" procedure designed to avoid local maxima of the likelihood. We demonstrate the full method by applying it to the problem of inferring the three-dimensional velocity distribution of stars near the Sun from noisy two-dimensional, transverse velocity measurements from the Hipparcos satellite. Comment: Published at http://dx.doi.org/10.1214/10-AOAS439 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
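
    In the simplest setting (no projection matrices, so every datum w_i is just the underlying value plus Gaussian noise with known covariance S_i), the E-M scheme described above reduces to the sketch below. Names and structure are mine; the full method additionally handles projection of missing directions, conjugate priors, and split-and-merge moves, none of which appear here.

```python
import numpy as np
from scipy.stats import multivariate_normal

def xd_em(w, S, K, n_iter=50, seed=0):
    """w: (N, d) noisy data; S: (N, d, d) per-point noise covariances.
    Returns amplitudes, means, and covariances of the deconvolved mixture."""
    rng = np.random.default_rng(seed)
    N, d = w.shape
    alpha = np.full(K, 1.0 / K)
    mu = w[rng.choice(N, K, replace=False)].copy()
    V = np.stack([np.atleast_2d(np.cov(w.T)) for _ in range(K)])
    for _ in range(n_iter):
        # E-step: responsibilities q and per-point posterior moments b, B.
        q = np.zeros((N, K))
        b = np.zeros((N, K, d))
        B = np.zeros((N, K, d, d))
        for j in range(K):
            for i in range(N):
                T = V[j] + S[i]                       # convolved covariance
                q[i, j] = alpha[j] * multivariate_normal.pdf(w[i], mu[j], T)
                gain = V[j] @ np.linalg.inv(T)
                b[i, j] = mu[j] + gain @ (w[i] - mu[j])
                B[i, j] = V[j] - gain @ V[j]
        q /= q.sum(axis=1, keepdims=True)
        # M-step: update amplitudes, means, and *deconvolved* covariances.
        for j in range(K):
            qj = q[:, j].sum()
            alpha[j] = qj / N
            mu[j] = (q[:, j, None] * b[:, j]).sum(axis=0) / qj
            diff = b[:, j] - mu[j]
            V[j] = ((q[:, j, None, None] * (diff[:, :, None] * diff[:, None, :] + B[:, j]))
                    .sum(axis=0) / qj)
    return alpha, mu, V

# Tiny demo: a 1-D two-component mixture blurred by per-point noise, then deconvolved.
rng = np.random.default_rng(1)
true = np.concatenate([rng.normal(-2, 0.3, 300), rng.normal(2, 0.3, 300)])
S = rng.uniform(0.2, 1.0, size=600)[:, None, None]          # known per-point variances
w = (true + rng.normal(0, np.sqrt(S[:, 0, 0])))[:, None]
alpha, mu, V = xd_em(w, S, K=2, n_iter=30)
print(mu.ravel())   # means recovered near -2 and +2
```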