BMICA-independent component analysis based on B-spline mutual information estimator
The information-theoretic concept of mutual information provides a general framework for evaluating dependencies between variables. However, its estimation using B-splines has not previously been used as the basis of an approach to Independent Component Analysis. In this paper we present a B-spline estimator of mutual information for finding the independent components in mixed signals. Tested on electroencephalography (EEG) signals, the resulting BMICA (B-Spline Mutual Information Independent Component Analysis)
exhibits better performance than the standard Independent Component Analysis algorithms FastICA, JADE, SOBI and EFICA in similar simulations. BMICA was also found to be more reliable than the renowned FastICA.
Classification with Asymmetric Label Noise: Consistency and Maximal Denoising
In many real-world classification problems, the labels of training examples
are randomly corrupted. Most previous theoretical work on classification with
label noise assumes that the two classes are separable, that the label noise is
independent of the true class label, or that the noise proportions for each
class are known. In this work, we give conditions that are necessary and
sufficient for the true class-conditional distributions to be identifiable.
These conditions are weaker than those analyzed previously, and allow for the
classes to be nonseparable and the noise levels to be asymmetric and unknown.
The conditions essentially state that a majority of the observed labels are
correct and that the true class-conditional distributions are "mutually
irreducible," a concept we introduce that limits the similarity of the two
distributions. For any label noise problem, there is a unique pair of true
class-conditional distributions satisfying the proposed conditions, and we
argue that this pair corresponds in a certain sense to maximal denoising of the
observed distributions.
Our results are facilitated by a connection to "mixture proportion
estimation," which is the problem of estimating the maximal proportion of one
distribution that is present in another. We establish a novel rate of
convergence result for mixture proportion estimation, and apply this to obtain
consistency of a discrimination rule based on surrogate loss minimization.
Experimental results on benchmark data and a nuclear particle classification
problem demonstrate the efficacy of our approach.
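The "mixture proportion estimation" subproblem asks for the maximal proportion of one distribution H present in another distribution F, which equals the essential infimum of the density ratio f/h. A crude histogram plug-in illustrates the quantity being estimated; this is only a sketch for one-dimensional data with illustrative parameters, not the estimator analysed in the paper.

```python
import numpy as np

def mixture_proportion(sample_f, sample_h, n_bins=15, min_count=50):
    """Plug-in estimate of kappa = max {a : F = a*H + (1-a)*G for some G},
    i.e. the essential infimum of f/h, via a minimum of bin-wise density ratios."""
    lo = min(sample_f.min(), sample_h.min())
    hi = max(sample_f.max(), sample_h.max())
    edges = np.linspace(lo, hi, n_bins + 1)
    cf, _ = np.histogram(sample_f, edges)
    ch, _ = np.histogram(sample_h, edges)
    mask = ch >= min_count            # use only bins where h is well estimated
    pf = cf[mask] / cf.sum()
    ph = ch[mask] / ch.sum()
    return float(np.min(pf / ph))
```

The minimum over noisy bins biases this simple version downward, which is exactly why convergence rates for more careful estimators, as established in the paper, matter.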
The non-Gaussianity of the cosmic shear likelihood - or: How odd is the Chandra Deep Field South?
(abridged) We study the validity of the approximation of a Gaussian cosmic
shear likelihood. We estimate the true likelihood for a fiducial cosmological
model from a large set of ray-tracing simulations and investigate the impact of
non-Gaussianity on cosmological parameter estimation. We investigate how odd
the recently reported very low value of really is as derived from
the Chandra Deep Field South (CDFS) using cosmic shear by taking the
non-Gaussianity of the likelihood into account as well as the possibility of
biases coming from the way the CDFS was selected.
We find that the cosmic shear likelihood is significantly non-Gaussian. This
leads to both a shift of the maximum of the posterior distribution and a
significantly smaller credible region compared to the Gaussian case. We
re-analyse the CDFS cosmic shear data using the non-Gaussian likelihood.
Assuming that the CDFS is a random pointing, we find
for fixed . In a
WMAP5-like cosmology, a value equal to or lower than this would be expected in
of the times. Taking biases into account arising from the way the
CDFS was selected, which we model as being dependent on the number of haloes in
the CDFS, we obtain . Combining the CDFS data
with the parameter constraints from WMAP5 yields and for a flat
universe.
Comment: 18 pages, 16 figures, accepted for publication in A&A; new Bayesian treatment of field selection bias
Submillimeter Number Counts From Statistical Analysis of BLAST Maps
We describe the application of a statistical method to estimate submillimeter
galaxy number counts from confusion limited observations by the Balloon-borne
Large Aperture Submillimeter Telescope (BLAST). Our method is based on a
maximum likelihood fit to the pixel histogram, sometimes called 'P(D)', an
approach which has been used before to probe faint counts, the difference being
that here we advocate its use even for sources with relatively high
signal-to-noise ratios. This method has an advantage over standard techniques
of source extraction in providing an unbiased estimate of the counts from the
bright end down to flux densities well below the confusion limit. We
specifically analyse BLAST observations of a roughly 10 sq. deg. map centered
on the Great Observatories Origins Deep Survey South (GOODS-S) field. We
provide estimates of number counts at the three BLAST wavelengths, 250, 350,
and 500 microns; instead of counting sources in flux bins we estimate the
counts at several flux density nodes connected with power-laws. We observe a
generally very steep slope for the counts of about -3.7 at 250 microns and -4.5
at 350 and 500 microns, over the range ~0.02-0.5 Jy, breaking to a shallower
slope below about 0.015 Jy at all three wavelengths. We also describe how to
estimate the uncertainties and correlations in this method so that the results
can be used for model-fitting. This method should be well-suited for analysis
of data from the Herschel satellite.
Comment: Accepted for publication in the Astrophysical Journal; see associated data and other papers at http://blastexperiment.info
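The P(D) idea of fitting the pixel histogram directly, rather than extracting individual sources, can be illustrated with a Monte Carlo forward model: draw a Poisson number of sources per pixel from power-law differential counts, sum their fluxes, and add Gaussian instrument noise. Histogramming the result gives a model P(D) to compare with the data. All parameter values and function names below are illustrative assumptions; the BLAST analysis also convolves with the beam and uses a full maximum-likelihood fit with power-law nodes.

```python
import numpy as np

def sample_powerlaw(rng, n, alpha, s_min, s_max):
    """Inverse-CDF draws of flux S from dN/dS proportional to S**alpha (alpha != -1)."""
    a1 = alpha + 1.0
    u = rng.random(n)
    return (s_min**a1 + u * (s_max**a1 - s_min**a1)) ** (1.0 / a1)

def simulate_pd(rng, n_pix, amp, alpha, s_min, s_max, omega_pix, sigma_noise):
    """Monte Carlo P(D): each pixel of solid angle omega_pix sums a Poisson
    number of sources drawn from dN/dS = amp * S**alpha, plus Gaussian noise."""
    a1 = alpha + 1.0
    mean_sources = omega_pix * amp * (s_max**a1 - s_min**a1) / a1  # per pixel
    counts = rng.poisson(mean_sources, n_pix)
    fluxes = sample_powerlaw(rng, counts.sum(), alpha, s_min, s_max)
    pix = np.zeros(n_pix)
    np.add.at(pix, np.repeat(np.arange(n_pix), counts), fluxes)    # sum per pixel
    return pix + rng.normal(0.0, sigma_noise, n_pix)
```

Steepening alpha or raising amp reshapes the simulated histogram, so maximizing the likelihood of the observed histogram constrains the counts well below the confusion limit.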
Cosmological baryonic and matter densities from 600,000 SDSS Luminous Red Galaxies with photometric redshifts
We analyze MegaZ-LRG, a photometric-redshift catalogue of Luminous Red
Galaxies (LRGs) based on the imaging data of the Sloan Digital Sky Survey
(SDSS) 4th Data Release. MegaZ-LRG, presented in a companion paper, contains
10^6 photometric redshifts derived with ANNz, an Artificial Neural Network
method, constrained by a spectroscopic sub-sample of 13,000 galaxies obtained
by the 2dF-SDSS LRG and Quasar (2SLAQ) survey. The catalogue spans the redshift
range 0.4 < z < 0.7 with an r.m.s. redshift error ~ 0.03(1+z), covering 5,914
deg^2 to map out a total cosmic volume 2.5 h^-3 Gpc^3. In this study we use the
most reliable 600,000 photometric redshifts to present the first cosmological
parameter fits to galaxy angular power spectra from a photometric redshift
survey. Combining the redshift slices with appropriate covariances, we
determine best-fitting values for the matter and baryon densities of Omega_m h
= 0.195 +/- 0.023 and Omega_b/Omega_m = 0.16 +/- 0.036 (with the Hubble
parameter h = 0.75 and scalar index of primordial fluctuations n = 1 held
fixed). These results are in agreement with and independent of the latest
studies of the Cosmic Microwave Background radiation, and their precision is
comparable to analyses of contemporary spectroscopic-redshift surveys. We
perform an extensive series of tests which conclude that our power spectrum
measurements are robust against potential systematic photometric errors in the
catalogue. We conclude that photometric-redshift surveys are competitive with
spectroscopic surveys for measuring cosmological parameters in the simplest
vanilla models. Future deep imaging surveys have great potential for further
improvement, provided that systematic errors can be controlled.
Comment: 24 pages, 23 figures, MNRAS accepted
Self-consistent method for density estimation
The estimation of a density profile from experimental data points is a
challenging problem, usually tackled by plotting a histogram. Prior assumptions
on the nature of the density, from its smoothness to the specification of its
form, allow the design of more accurate estimation procedures, such as Maximum
Likelihood. Our aim is to construct a procedure that makes no explicit
assumptions yet still provides an accurate estimate of the density. We
introduce the self-consistent estimate: the power spectrum of a candidate
density is given, and an estimation procedure is constructed on the assumption,
to be released \emph{a posteriori}, that the candidate is correct. The
self-consistent estimate is defined as a prior candidate density that precisely
reproduces itself. Our main result is to derive the exact expression of the
self-consistent estimate for any given dataset, and to study its properties.
Applications of the method require neither priors on the form of the density
nor the subjective choice of parameters. A cutoff frequency, akin to a bin size
or a kernel bandwidth, emerges naturally from the derivation. We apply the
self-consistent estimate to artificial data generated from various
distributions and show that it reaches the theoretical limit for the scaling of
the square error with the dataset size.
Comment: 21 pages, 5 figures
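The self-consistent estimate has a closed form in Fourier space: the fixed point relates the estimated transform to the empirical characteristic function, and the frequencies where the relation has no real solution are dropped, which is the cutoff frequency that emerges from the derivation. The following is a sketch of that fixed point for one-dimensional data on a uniform grid, with simplifications (for instance, it keeps every admissible frequency rather than a single interval around zero), so treat it as an illustration rather than the authors' exact procedure.

```python
import numpy as np

def self_consistent_density(data, grid):
    """Sketch of the self-consistent density estimate in Fourier space.
    Fixed point: f_hat(t) = N*ecf(t)/(2(N-1)) * (1 + sqrt(1 - 4(N-1)/(N^2 |ecf(t)|^2))),
    kept only where the square root is real; grid must be uniform."""
    data = np.asarray(data, float)
    n, m = len(data), len(grid)
    dx = grid[1] - grid[0]
    t = 2.0 * np.pi * np.fft.fftfreq(m, d=dx)
    ecf = np.exp(1j * np.outer(t, data)).mean(axis=1)  # empirical characteristic fn
    thresh = 4.0 * (n - 1) / n**2
    mag2 = np.abs(ecf) ** 2
    keep = mag2 >= thresh                               # the emergent frequency cutoff
    fhat = np.zeros_like(ecf)
    fhat[keep] = (n * ecf[keep] / (2.0 * (n - 1))) * (1.0 + np.sqrt(1.0 - thresh / mag2[keep]))
    # inverse transform: f(x_j) = (1/(m*dx)) * sum_k fhat(t_k) * exp(-i t_k x_j)
    g = fhat * np.exp(-1j * t * grid[0])
    return np.real(np.fft.fft(g)) / (m * dx)
```

No bin size or bandwidth is chosen by hand: the admissible-frequency set plays that role, as the abstract describes.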