A Unifying Review of Linear Gaussian Models
Factor analysis, principal component analysis, mixtures of Gaussian clusters, vector quantization, Kalman filter models, and hidden Markov models can all be unified as variations of unsupervised learning under a single basic generative model. This is achieved by collecting together disparate observations and derivations made by many previous authors and introducing a new way of linking discrete and continuous state models using a simple nonlinearity. Through the use of other nonlinearities, we show how independent component analysis is also a variation of the same basic generative model. We show that factor analysis and mixtures of Gaussians can be implemented in autoencoder neural networks and learned using squared error plus the same regularization term. We introduce a new model for static data, known as sensible principal component analysis, as well as a novel concept of spatially adaptive observation noise. We also review some of the literature involving global and local mixtures of the basic models and provide pseudocode for inference and learning for all the basic models.
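For orientation, the single basic generative model referred to above is the linear-Gaussian state-space model, which can be written (in one standard notation; the symbols here are illustrative rather than quoted from the review) as:

```latex
\begin{aligned}
\mathbf{x}_{t+1} &= A\,\mathbf{x}_t + \mathbf{w}_t, \qquad \mathbf{w}_t \sim \mathcal{N}(\mathbf{0}, Q),\\
\mathbf{y}_t     &= C\,\mathbf{x}_t + \mathbf{v}_t, \qquad \mathbf{v}_t \sim \mathcal{N}(\mathbf{0}, R).
\end{aligned}
```

Dropping the dynamics gives the static models (factor analysis with diagonal R, and PCA in the limit of vanishing isotropic noise), while passing the state through a winner-take-all nonlinearity yields the discrete-state models (vector quantization, hidden Markov models), which is the sense in which the review unifies them.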
Astrometry.net: Blind astrometric calibration of arbitrary astronomical images
We have built a reliable and robust system that takes as input an
astronomical image, and returns as output the pointing, scale, and orientation
of that image (the astrometric calibration or WCS information). The system
requires no first guess, and works with the information in the image pixels
alone; that is, the problem is a generalization of the "lost in space" problem
in which nothing--not even the image scale--is known. After robust source
detection is performed in the input image, asterisms (sets of four or five
stars) are geometrically hashed and compared to pre-indexed hashes to generate
hypotheses about the astrometric calibration. A hypothesis is only accepted as
true if it passes a Bayesian decision theory test against a background
hypothesis. With indices built from the USNO-B Catalog and designed for
uniformity of coverage and redundancy, the success rate is 99.9% for
contemporary near-ultraviolet and visual imaging survey data, with no false
positives. The failure rate is consistent with the incompleteness of the USNO-B
Catalog; augmentation with indices built from the 2MASS Catalog brings the
completeness to 100% with no false positives. We are using this system to
generate consistent and standards-compliant meta-data for digital and digitized
imaging from plate repositories, automated observatories, individual scientific
investigators, and hobbyists. This is the first step in a program of making it
possible to trust calibration meta-data for astronomical data of arbitrary
provenance.
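To make the hashing step concrete, here is a minimal Python sketch of a four-star geometric hash in the spirit described above (an illustration only, not code from the Astrometry.net system; the frame convention and the symmetry-breaking rule are simplifying assumptions):

```python
import numpy as np

def quad_hash(stars):
    """Illustrative geometric hash code for a four-star asterism.

    stars: array of shape (4, 2) with (x, y) positions.  The two most
    widely separated stars, A and B, define a local frame in which A
    sits at (0, 0) and B at (1, 1); the positions of the other two
    stars, C and D, in that frame give a four-dimensional code that is
    invariant to translation, rotation, and scale of the asterism.
    (A production system breaks further symmetries, e.g. the A/B swap;
    this sketch only orders C and D.)
    """
    stars = np.asarray(stars, dtype=float)
    # The most widely separated pair defines the frame.
    dists = np.linalg.norm(stars[:, None, :] - stars[None, :, :], axis=-1)
    i, j = np.unravel_index(np.argmax(dists), dists.shape)
    a, b = stars[i], stars[j]
    rest = [stars[k] for k in range(4) if k not in (i, j)]

    ab = b - a
    scale = ab @ ab

    def to_frame(p):
        v = p - a
        x = (v @ ab) / scale                       # component along AB
        y = (ab[0] * v[1] - ab[1] * v[0]) / scale  # perpendicular component
        return x - y, x + y                        # 45-degree rotation: B -> (1, 1)

    (xc, yc), (xd, yd) = to_frame(rest[0]), to_frame(rest[1])
    if xc > xd:                                    # order C and D for uniqueness
        (xc, yc), (xd, yd) = (xd, yd), (xc, yc)
    return (xc, yc, xd, yd)
```

Because the code depends only on relative positions, the same asterism yields (up to measurement noise) the same hash in an image and in the pre-built index, which is what allows lookups over such codes to propose calibration hypotheses.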
On the Reduction of Errors in DNA Computation
In this paper, we discuss techniques for reducing errors in DNA computation. We investigate several methods for achieving acceptable overall error rates for a computation using basic operations that are error-prone. We analyze a single essential biotechnology, sequence-specific separation, and show that separation errors can theoretically be reduced to tolerable levels by invoking a tradeoff between time, space, and error rates at the level of algorithm design. These tradeoffs do not depend upon improvement of the underlying biotechnology which implements the separation step. We outline several specific ways in which error reduction can be done and present numerical calculations of their performance.
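As a toy illustration of one such tradeoff (illustrative only; the numbers and the specific scheme are not taken from the paper): suppose a single sequence-specific separation wrongly retains an unwanted strand with probability ε and wrongly discards a wanted strand with probability δ. Chaining k independent separations and keeping only strands that pass all of them gives

```latex
P(\text{unwanted strand survives}) = \epsilon^{k}, \qquad
P(\text{wanted strand survives}) = (1-\delta)^{k} \;\ge\; 1 - k\delta,
```

so contamination falls exponentially in k while the loss of wanted strands grows only about linearly, and that loss can be offset by starting with more copies of each strand (space) or by repeating and pooling runs (time).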
K-corrections and filter transformations in the ultraviolet, optical, and near-infrared
Template fits to observed galaxy fluxes allow calculation of K-corrections
and conversions among observations of galaxies at various wavelengths. We
present a method for creating model-based template sets given a set of
heterogeneous photometric and spectroscopic galaxy data. Our technique,
non-negative matrix factorization, is akin to principal component analysis
(PCA), except that it is constrained to produce nonnegative templates, it can
use a basis set of models (rather than the delta function basis of PCA), and it
naturally handles uncertainties, missing data, and heterogeneous data
(including broad-band fluxes at various redshifts). The particular
implementation we present here is suitable for ultraviolet, optical, and
near-infrared observations in the redshift range 0 < z < 1.5. Since we base our
templates on stellar population synthesis models, the results are interpretable
in terms of approximate stellar masses and star-formation histories. We present
templates fit with this method to data from GALEX, Sloan Digital Sky Survey
spectroscopy and photometry, the Two-Micron All Sky Survey, the Deep
Extragalactic Evolutionary Probe and the Great Observatories Origins Deep
Survey. In addition, we present software for using such data to estimate
K-corrections and stellar masses. Comment: 43 pages, 20 figures, submitted to AJ; software and full-resolution figures available at http://cosmo.nyu.edu/blanton/kcorrec
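For readers unfamiliar with the factorization itself, the following is a minimal sketch of plain nonnegative matrix factorization with the standard Lee & Seung multiplicative updates for a squared-error objective (a deliberate simplification: the paper's implementation additionally handles per-datum uncertainties, missing data, redshifted filter projections, and a model basis set):

```python
import numpy as np

def nmf(F, n_templates, n_iter=500, eps=1e-12):
    """Factor a nonnegative flux matrix F (n_galaxies x n_bands) as
    F ~ W @ H with W, H >= 0, using multiplicative updates that
    monotonically decrease the squared-error objective."""
    rng = np.random.default_rng(0)
    n, m = F.shape
    W = rng.random((n, n_templates))   # per-galaxy template coefficients
    H = rng.random((n_templates, m))   # nonnegative templates
    for _ in range(n_iter):
        H *= (W.T @ F) / (W.T @ W @ H + eps)
        W *= (F @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Each row of H plays the role of a nonnegative template and each row of W gives a galaxy's nonnegative coefficients, which is what makes the components physically interpretable in a way PCA eigenvectors generally are not.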
Bi-stochastic kernels via asymmetric affinity functions
In this short letter we present the construction of a bi-stochastic kernel p
for an arbitrary data set X that is derived from an asymmetric affinity
function α. The affinity function α measures the similarity
between points in X and some reference set Y. Unlike other methods that
construct bi-stochastic kernels via some convergent iteration process or
through solving an optimization problem, the construction presented here is
quite simple. Furthermore, it can be viewed through the lens of out of sample
extensions, making it useful for massive data sets.
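A minimal NumPy sketch of a normalization of this kind is below; the specific formula is our reading of such a construction, and the letter's notation and details may differ:

```python
import numpy as np

def bistochastic_kernel(alpha):
    """Build a symmetric bi-stochastic kernel on X from an asymmetric,
    nonnegative affinity matrix alpha of shape (|X|, |Y|).

    d[x]     = sum_y alpha[x, y]                 (row sums)
    omega[y] = sum_x alpha[x, y] / d[x]          (column sums after row normalization)
    p[x1,x2] = sum_y alpha[x1,y] alpha[x2,y] / (d[x1] d[x2] omega[y])

    Each row of p sums to one and p is symmetric, hence bi-stochastic.
    """
    alpha = np.asarray(alpha, dtype=float)
    d = alpha.sum(axis=1)            # (|X|,)
    A = alpha / d[:, None]           # row-stochastic X -> Y matrix
    omega = A.sum(axis=0)            # (|Y|,)
    p = A @ (A / omega).T            # p[x1,x2] = sum_y A[x1,y] A[x2,y] / omega[y]
    return p
```

Because the kernel is built from affinities to the reference set Y, evaluating it for a new point only requires that point's affinities to Y, which is the out-of-sample-extension view mentioned above.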
Cleaning the USNO-B Catalog through automatic detection of optical artifacts
The USNO-B Catalog contains spurious entries that are caused by diffraction
spikes and circular reflection halos around bright stars in the original
imaging data. These spurious entries appear in the Catalog as if they were real
stars; they are confusing for some scientific tasks. The spurious entries can
be identified by simple computer vision techniques because they produce
repeatable patterns on the sky. Some techniques employed here are variants of
the Hough transform, one of which is sensitive to (two-dimensional)
overdensities of faint stars in thin right-angle cross patterns centered on
bright (< 13 mag) stars, and one of which is sensitive to thin annular overdensities centered on very bright (< 7 mag) stars. After enforcing
conservative statistical requirements on spurious-entry identifications, we
find that of the 1,042,618,261 entries in the USNO-B Catalog, 24,148,382 of
them (2.3 percent) are identified as spurious by diffraction-spike criteria and 196,133 (0.02 percent) are identified as spurious by reflection-halo
criteria. The spurious entries are often detected in more than 2 bands and are
not overwhelmingly outliers in any photometric properties; they therefore
cannot be rejected easily on other grounds, i.e., without the use of computer
vision techniques. We demonstrate our method, and return to the community in
electronic form a table of spurious entries in the Catalog.
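As a rough sketch of the diffraction-spike test (illustrative Python only, not the authors' pipeline; the cross orientation, window sizes, and threshold are placeholder assumptions), one can count faint entries falling in a thin right-angle cross centred on each bright star and flag statistically significant excesses over the local background:

```python
import numpy as np

def cross_overdensity(bright_xy, faint_xy, half_length, half_width):
    """Flag bright stars surrounded by a '+'-shaped overdensity of faint
    catalog entries, the signature of diffraction-spike artifacts.
    Coordinates are assumed aligned with the spike directions."""
    faint_xy = np.asarray(faint_xy, dtype=float)
    flags = []
    for bx, by in bright_xy:
        dx = faint_xy[:, 0] - bx
        dy = faint_xy[:, 1] - by
        in_horiz = (np.abs(dx) < half_length) & (np.abs(dy) < half_width)
        in_vert = (np.abs(dy) < half_length) & (np.abs(dx) < half_width)
        n_cross = np.count_nonzero(in_horiz | in_vert)

        # Expected cross count if faint sources were uniform over the
        # bounding box of the cross.
        in_box = (np.abs(dx) < half_length) & (np.abs(dy) < half_length)
        n_box = np.count_nonzero(in_box)
        cross_area = 8 * half_length * half_width - 4 * half_width ** 2
        box_area = (2 * half_length) ** 2
        expected = n_box * cross_area / box_area

        # Conservative Poisson-like excess test (illustrative threshold).
        flags.append(n_cross > expected + 5 * np.sqrt(expected + 1))
    return np.array(flags)
```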
Extreme deconvolution: Inferring complete distribution functions from noisy, heterogeneous and incomplete observations
We generalize the well-known mixtures of Gaussians approach to density
estimation and the accompanying Expectation-Maximization technique for finding
the maximum likelihood parameters of the mixture to the case where each data
point carries an individual d-dimensional uncertainty covariance and has
unique missing data properties. This algorithm reconstructs the
error-deconvolved or "underlying" distribution function common to all samples,
even when the individual data points are samples from different distributions,
obtained by convolving the underlying distribution with the heteroskedastic
uncertainty distribution of the data point and projecting out the missing data
directions. We show how this basic algorithm can be extended with conjugate
priors on all of the model parameters and a "split-and-merge" procedure
designed to avoid local maxima of the likelihood. We demonstrate the full
method by applying it to the problem of inferring the three-dimensional
velocity distribution of stars near the Sun from noisy two-dimensional,
transverse velocity measurements from the Hipparcos satellite. Comment: published at http://dx.doi.org/10.1214/10-AOAS439 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
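In symbols, the observational model being fit is (notation ours, following the description above): each observed datum is a noisy, possibly lower-dimensional projection of a latent value drawn from the underlying mixture,

```latex
\mathbf{v}_i \sim \sum_{j=1}^{K} q_j\,\mathcal{N}(\mathbf{m}_j, \mathbf{V}_j), \qquad
\mathbf{w}_i = \mathbf{R}_i\,\mathbf{v}_i + \boldsymbol{\epsilon}_i, \qquad
\boldsymbol{\epsilon}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{S}_i),
```

so that marginally each observation follows the mixture with components N(R_i m_j, R_i V_j R_i^T + S_i), and the EM iterations maximize the likelihood of the mixture parameters {q_j, m_j, V_j} given the per-point projections R_i and noise covariances S_i.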
Inference with Constrained Hidden Markov Models in PRISM
A Hidden Markov Model (HMM) is a common statistical model which is widely
used for analysis of biological sequence data and other sequential phenomena.
In the present paper we show how HMMs can be extended with side-constraints and
present constraint solving techniques for efficient inference. Defining HMMs
with side-constraints in Constraint Logic Programming has advantages in terms
of more compact expression and pruning opportunities during inference.
We present a PRISM-based framework for extending HMMs with side-constraints
and show how well-known constraints such as cardinality and all-different are
integrated. We experimentally validate our approach on the biologically
motivated problem of global pairwise alignment.
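To illustrate the kind of constrained inference involved, here is a small Python sketch (not the paper's PRISM code; the states, the choice of a cardinality constraint, and the dynamic-programming formulation are assumptions for the example) of a forward recursion that additionally tracks how often a designated state has been visited and never extends paths that would violate the bound:

```python
import numpy as np

def constrained_forward(init, trans, emit, obs, counted_state, max_count):
    """Forward algorithm for an HMM with a cardinality side-constraint:
    the hidden path may visit `counted_state` at most `max_count` times.

    init:  (S,) initial state probabilities
    trans: (S, S) transition probabilities, trans[i, j] = P(j | i)
    emit:  (S, V) emission probabilities, emit[s, o] = P(o | s)
    obs:   sequence of observation indices
    Returns the probability of the observations summed over all hidden
    paths that satisfy the constraint.
    """
    S = len(init)
    # alpha[s, c] = P(obs so far, current state s, counted_state used c times)
    alpha = np.zeros((S, max_count + 1))
    for s in range(S):
        c = 1 if s == counted_state else 0
        if c <= max_count:
            alpha[s, c] = init[s] * emit[s, obs[0]]
    for o in obs[1:]:
        new = np.zeros_like(alpha)
        for s in range(S):                      # next state
            inc = 1 if s == counted_state else 0
            for c in range(max_count + 1 - inc):
                # Violating paths are pruned simply by never being created.
                new[s, c + inc] = emit[s, o] * np.dot(alpha[:, c], trans[:, s])
        alpha = new
    return alpha.sum()
```

Augmenting the dynamic-programming state in this way is one generic route to the pruning opportunities mentioned above: constrained paths are never enumerated and then discarded, they are simply never extended.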