Power Spectrum Estimators For Large CMB Datasets
Forthcoming high-resolution observations of the Cosmic Microwave Background
(CMB) radiation will generate datasets many orders of magnitude larger than
have been obtained to date. The size and complexity of such datasets present a
very serious challenge to analysing them with existing or anticipated
computers. Here we present an investigation of the currently favored algorithm
for obtaining the power spectrum from a sky-temperature map --- the quadratic
estimator. We show that, whilst improving on direct evaluation of the
likelihood function, current implementations still inherently scale as the
equivalent of the cube of the number of pixels or worse, and demonstrate the
critical importance of choosing the right implementation for a particular
dataset.
Comment: 8 pages LaTeX, no figures, corrected misaligned columns in table
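Illustrative sketch (not taken from the paper): the cubic scaling mentioned in the abstract can be seen in a direct evaluation of the Gaussian likelihood of a pixel map, where factorizing the N x N pixel covariance matrix costs O(N^3) operations. The function name and inputs below are assumptions made for this example only.

```python
import numpy as np

def gaussian_log_likelihood(map_pixels, covariance):
    """Log-likelihood of a zero-mean Gaussian map with a dense pixel covariance.

    Hypothetical helper for illustration; `covariance` stands for the full
    signal-plus-noise matrix of size N x N.
    """
    n_pix = map_pixels.size
    # Cholesky factorization of a dense N x N matrix costs O(N^3):
    # this is the step that dominates for large maps.
    chol = np.linalg.cholesky(covariance)
    # Solve L y = d. np.linalg.solve does not exploit the triangular
    # structure; a dedicated triangular solver would be cheaper.
    whitened = np.linalg.solve(chol, map_pixels)
    chi_squared = whitened @ whitened
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (chi_squared + log_det + n_pix * np.log(2.0 * np.pi))
```

Doubling the number of pixels multiplies the cost of the factorization by roughly eight, which is why the choice of implementation matters for large maps.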
Stylistic Fingerprints, POS-tags and Inflected Languages: A Case Study in Polish
In stylometric investigations, frequencies of the most frequent words (MFWs)
and character n-grams outperform other style-markers, even if their performance
varies significantly across languages. In inflected languages, word endings
play a prominent role, and hence different word forms cannot be recognized
using generic text tokenization. Countless inflected word forms make
frequencies sparse, making most statistical procedures complicated. Presumably,
applying one of the NLP techniques, such as lemmatization and/or parsing, might
increase the performance of classification. The aim of this paper is to examine
the usefulness of grammatical features (as assessed via POS-tag n-grams) and
lemmatized forms in recognizing authorial profiles, in order to address the
underlying issue of the degree of freedom of choice within lexis and grammar.
Using a corpus of Polish novels, we performed a series of supervised authorship
attribution benchmarks, in order to compare the classification accuracy for
different types of lexical and syntactic style-markers. Even if the performance
of POS-tags as well as lemmatized forms was consistently worse than that of
lexical markers, the difference was not substantial and never exceeded ca. 15%.
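Illustrative sketch (not taken from the paper): one simple way to build the MFW style-markers mentioned in the abstract is to take the most frequent words of the whole corpus and record their relative frequency in each text. The function name, the whitespace tokenization and the toy inputs are assumptions made for this example; the paper's own pipeline is not described here.

```python
from collections import Counter

def mfw_profile(texts, n_mfw):
    """Relative frequencies of the corpus-wide most frequent words per text."""
    # Generic tokenization: lowercase and split on whitespace. Inflected
    # forms of the same lemma are counted as different words here.
    tokenized = [text.lower().split() for text in texts]
    corpus_counts = Counter(tok for toks in tokenized for tok in toks)
    mfw = [word for word, _ in corpus_counts.most_common(n_mfw)]
    profiles = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty texts
        profiles.append([counts[word] / total for word in mfw])
    return mfw, profiles
```

The same function could be applied to lemmatized tokens or to POS-tag n-grams instead of raw word forms, provided the texts have already been processed by a lemmatizer or tagger.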
Modeling the dynamics of language change: logistic regression, Piotrowski's law, and a handful of examples in Polish
The study discusses modeling diachronic processes by logistic regression. The
phenomenon of nonlinear changes in language was first observed by Raimund
Piotrowski (hence labelled as Piotrowski's law), even if actual linguistic
evidence usually speaks against using the notion of a "law" in this context. In
our study, we apply logistic regression models to 9 changes which occurred
between the 15th and 18th centuries in the Polish language. The attested course of
the majority of these changes closely follows the expected values, which indicates
that language change might indeed resemble a nonlinear phase change
scenario. We also extend Piotrowski's original approach by proposing
polynomial logistic regression for those cases which can hardly be described by
its standard version. Also, we propose to consider individual language change
cases jointly, in order to inspect their possible collinearity or, more likely,
their different dynamics as a function of time. Last but not least, we
evaluate our results by testing the influence of the subcorpus size on the
model's goodness-of-fit.
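Illustrative sketch (not taken from the paper): a logistic curve of the kind associated with Piotrowski's law can be fitted by regressing the logit of the observed proportion of the new form on time. The version below uses ordinary least squares on logits, which is a simplification of maximum-likelihood logistic regression, and it reads "polynomial logistic regression" as a polynomial in time inside the logit; both choices, the function names and the synthetic data are assumptions made for this example.

```python
import numpy as np

def fit_logit_polynomial(years, proportions, degree=1):
    """Least-squares fit of a polynomial in time to logit(proportion)."""
    years = np.asarray(years, dtype=float)
    # Clip to keep the logit finite when a proportion is exactly 0 or 1.
    p = np.clip(np.asarray(proportions, dtype=float), 1e-6, 1.0 - 1e-6)
    logit = np.log(p / (1.0 - p))
    center = years.mean()
    t = (years - center) / 100.0  # time in centuries from the midpoint
    coeffs = np.polyfit(t, logit, degree)
    return coeffs, center

def predict_proportion(years, coeffs, center):
    """Proportion of the new form predicted by the fitted curve."""
    t = (np.asarray(years, dtype=float) - center) / 100.0
    return 1.0 / (1.0 + np.exp(-np.polyval(coeffs, t)))

# Synthetic data (not from the study): a change with slope 2 per century.
years = np.arange(1400.0, 1800.0, 50.0)
true_t = (years - years.mean()) / 100.0
observed = 1.0 / (1.0 + np.exp(-(0.5 + 2.0 * true_t)))
coeffs, center = fit_logit_polynomial(years, observed, degree=1)
```

Setting `degree=2` adds a quadratic term in time, which allows curves that the standard S-shaped model cannot describe.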
Markov Chain Beam Randomization: a study of the impact of PLANCK beam measurement errors on cosmological parameter estimation
We introduce a new method to propagate uncertainties in the beam shapes used
to measure the cosmic microwave background to cosmological parameters
determined from those measurements. The method, which we call Markov Chain Beam
Randomization, MCBR, randomly samples from a set of templates or functions that
describe the beam uncertainties. The method is much faster than direct
numerical integration over systematic `nuisance' parameters, and is not
restricted to simple, idealized cases as is analytic marginalization. It does
not assume the data are normally distributed, and does not require Gaussian
priors on the specific systematic uncertainties. We show that MCBR properly
accounts for and provides the marginalized errors of the parameters. The method
can be generalized and used to propagate any systematic uncertainties for which
a set of templates is available. We apply the method to the Planck satellite,
and consider future experiments. Beam measurement errors should have a small
effect on cosmological parameters as long as the beam fitting is performed
after removal of 1/f noise.
Comment: 17 pages, 23 figures, revised version with improved explanation of
the MCBR and overall wording. Accepted for publication in Astronomy and
Astrophysics (to appear in the Planck pre-launch special issue)
Optimized Large-Scale CMB Likelihood And Quadratic Maximum Likelihood Power Spectrum Estimation
We revisit the problem of exact CMB likelihood and power spectrum estimation
with the goal of minimizing computational cost through linear compression. This
idea was originally proposed for CMB purposes by Tegmark et al. (1997), and
here we develop it into a fully working computational framework for large-scale
polarization analysis, adopting WMAP as a worked example. We compare five
different linear bases (pixel space, harmonic space, noise covariance
eigenvectors, signal-to-noise covariance eigenvectors and signal-plus-noise
covariance eigenvectors) in terms of compression efficiency, and find that the
computationally most efficient basis is the signal-to-noise eigenvector basis,
which is closely related to the Karhunen-Loeve and Principal Component
transforms, in agreement with previous suggestions. For this basis, the
information in 6836 unmasked WMAP sky map pixels can be compressed into a
smaller set of 3102 modes, with a maximum error increase of any single
multipole of 3.8% at , and a maximum shift in the mean values of a
joint distribution of an amplitude--tilt model of 0.006. This
compression reduces the computational cost of a single likelihood evaluation by
a factor of 5, from 38 to 7.5 CPU seconds, and it also results in a more robust
likelihood by implicitly regularizing nearly degenerate modes. Finally, we use
the same compression framework to formulate a numerically stable and
computationally efficient variation of the Quadratic Maximum Likelihood
implementation that requires less than 3 GB of memory and 2 CPU minutes per
iteration for , rendering low-ℓ QML CMB power spectrum
analysis fully tractable on a standard laptop.
Comment: 13 pages, 13 figures, accepted by ApJ
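Illustrative sketch (not taken from the paper): compression onto signal-to-noise eigenvectors can be written as noise whitening followed by an eigendecomposition of the whitened signal covariance, keeping only the modes with the largest eigenvalues. The function name and the dense-matrix inputs are assumptions made for this example; the paper's actual implementation may differ.

```python
import numpy as np

def signal_to_noise_basis(signal_cov, noise_cov, n_modes):
    """Projector onto the n_modes highest signal-to-noise eigenmodes.

    Returns (projector, snr), where projector has shape (n_modes, n_pix)
    and snr holds the corresponding signal-to-noise eigenvalues.
    """
    # Whiten with the noise: noise_cov = L L^T, so L^-1 N L^-T = I.
    chol = np.linalg.cholesky(noise_cov)
    chol_inv = np.linalg.inv(chol)
    whitened_signal = chol_inv @ signal_cov @ chol_inv.T
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(whitened_signal)
    order = np.argsort(eigvals)[::-1][:n_modes]
    projector = eigvecs[:, order].T @ chol_inv
    return projector, eigvals[order]
```

In the compressed basis the total covariance is diagonal: projecting signal plus noise gives the kept eigenvalues plus one on the diagonal, so a likelihood evaluation only involves `n_modes` numbers instead of a dense matrix over all pixels.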
2-Point Correlations in the COBE DMR 4-Year Anisotropy Maps
The 2-point temperature correlation function is evaluated from the 4-year
COBE DMR microwave anisotropy maps. We examine the 2-point function, which is
the Legendre transform of the angular power spectrum, and show that the data
are statistically consistent from channel to channel and frequency to
frequency. The most likely quadrupole normalization is computed for a
scale-invariant power-law spectrum of CMB anisotropy, using a variety of data
combinations. For a given data set, the normalization inferred from the 2-point
data is consistent with that inferred by other methods. The smallest and
largest normalization deduced from any data combination are 16.4 and 19.6 uK
respectively, with a value ~18 uK generally preferred.
Comment: Submitted to ApJ Letters
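Illustrative sketch (not taken from the paper): the relation between the 2-point function and the angular power spectrum mentioned in the abstract is the Legendre series C(theta) = sum over l of (2l+1)/(4 pi) C_l P_l(cos theta). The function below evaluates this sum with NumPy; the function name is an assumption, and beam and pixel window functions are deliberately left out.

```python
import numpy as np
from numpy.polynomial import legendre

def correlation_function(cl, theta):
    """2-point correlation C(theta) from a power spectrum C_l.

    `cl[l]` is the power at multipole l (starting at l = 0) and `theta`
    is the angular separation in radians.
    """
    cl = np.asarray(cl, dtype=float)
    ells = np.arange(cl.size)
    # legval evaluates sum_l coeffs[l] * P_l(x), so fold the
    # (2l + 1) / (4 pi) weights into the coefficients.
    coeffs = (2.0 * ells + 1.0) / (4.0 * np.pi) * cl
    return legendre.legval(np.cos(theta), coeffs)
```

For a pure quadrupole (only C_2 non-zero) the result is proportional to P_2(cos theta), which is positive at zero separation and negative at 90 degrees.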
Power Spectrum of Primordial Inhomogeneity Determined from the 4-Year COBE DMR Sky Maps
Fourier analysis and power spectrum estimation of the cosmic microwave
background anisotropy on an incompletely sampled sky developed by Gorski (1994)
has been applied to the high-latitude portion of the 4-year COBE DMR 31.5, 53
and 90 GHz sky maps. Likelihood analysis using newly constructed Galaxy cuts
(extended beyond |b| = 20deg to excise the known foreground emission) and
simultaneously correcting for the faint high latitude galactic foreground
emission is conducted on the DMR sky maps pixelized in both ecliptic and
galactic coordinates. The Bayesian power spectrum estimation from the
foreground corrected 4-year COBE DMR data renders n ~ 1.2 +/- 0.3, and
Q_{rms-PS} ~ 15.3^{+3.7}_{-2.8} microK (projections of the two-parameter
likelihood). These results are consistent with the Harrison-Zel'dovich n=1
model of amplitude Q_{rms-PS} ~ 18 microK detected with significance exceeding
14sigma (dQ/Q < 0.07). (A small power spectrum amplitude drop below the
published 2-year results is predominantly due to the application of the new,
extended Galaxy cuts.)
Comment: 9 pages of text in LaTeX, 1 postscript Table, 4 postscript figures (2
color plates), submitted to The Astrophysical Journal (Letters)
Probing non-Gaussianities in the CMB on an incomplete sky using surrogates
We demonstrate the feasibility of generating surrogates by Fourier-based
methods for an incomplete data set. This is performed for the case of a CMB
analysis, where astrophysical foreground emission, mainly present in the
Galactic plane, is a major challenge. The shuffling of the Fourier phases for
generating surrogates is now enabled by transforming the spherical harmonics
into a new set of basis functions that are orthonormal on the cut sky. The
results show that non-Gaussianities and hemispherical asymmetries in the CMB, as
identified in several earlier investigations, can still be detected even when
the complete Galactic plane (|b| < 30°) is removed. We conclude that the
Galactic plane cannot be the dominant source for these anomalies. The results
point towards a violation of statistical isotropy.
Comment: 9 pages, 13 figures, accepted by Physical Review