93 research outputs found
Short note on two output-dependent hidden Markov models
The purpose of this note is to study the assumption of "mutual information independence", which is used by Zhou (2005) for deriving an output-dependent hidden Markov model, the so-called discriminative HMM (D-HMM), in the context of determining a stochastic optimal sequence of hidden states. The assumption is extended to derive its generative counterpart, the G-HMM. In addition, state-dependent representations for two output-dependent HMMs, namely HMMSDO (Li, 2005) and D-HMM, are presented.
Do unbalanced data have a negative effect on LDA?
For two-class discrimination, Xie and Qiu [The effect of imbalanced data sets on LDA: a theoretical and empirical analysis, Pattern Recognition 40 (2) (2007) 557–562] claimed that, when the covariance matrices of the two classes are unequal, a (class) unbalanced data set has a negative effect on the performance of linear discriminant analysis (LDA). By re-balancing 10 real-world data sets, Xie and Qiu (2007) provided empirical evidence to support the claim, using AUC (Area Under the receiver operating characteristic Curve) as the performance metric. We suggest that such a claim is vague if not misleading, that Xie and Qiu (2007) present no solid theoretical analysis, and that AUC can lead to a quite different conclusion from that given by the misclassification error rate (ER) on the discrimination performance of LDA for unbalanced data sets. Our empirical and simulation studies suggest that, for LDA, the increase of the median of AUC (and thus the improvement of performance of LDA) from re-balancing is relatively small, while, in contrast, the increase of the median of ER (and thus the decline in performance of LDA) from re-balancing is relatively large. Therefore, from our study, there is no reliable empirical evidence to support the claim that a (class) unbalanced data set has a negative effect on the performance of LDA. In addition, re-balancing affects the performance of LDA for data sets with either equal or unequal covariance matrices, indicating that having unequal covariance matrices is not a key reason for the difference in performance between original and re-balanced data.
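The contrast between AUC and ER described in this abstract can be illustrated with a minimal sketch. This is not the authors' experimental setup: the scores and labels are hypothetical, and `auc` and `error_rate` are illustrative helper names. AUC depends only on how the scores rank positives against negatives, whereas ER also depends on the decision threshold, which in LDA is shifted by the class proportions.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def error_rate(scores, labels, threshold):
    """Misclassification rate when predicting class 1 for scores >= threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    wrong = sum(1 for p, y in zip(preds, labels) if p != y)
    return wrong / len(labels)


# Hypothetical unbalanced sample: 2 positives, 8 negatives.
scores = [0.9, 0.4, 0.6, 0.5, 0.3, 0.3, 0.2, 0.2, 0.1, 0.1]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

auc_value = auc(scores, labels)               # unaffected by the threshold
er_low = error_rate(scores, labels, 0.35)     # lower threshold: more predicted positives
er_high = error_rate(scores, labels, 0.65)    # higher threshold: fewer predicted positives

print(auc_value, er_low, er_high)
```

On this toy sample the AUC stays fixed while the error rate changes with the threshold, which is one way the two metrics can point to different conclusions.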
Learning Mixtures of Gaussians in High Dimensions
Efficiently learning mixtures of Gaussians is a fundamental problem in
statistics and learning theory. Given samples coming from a random one out of k
Gaussian distributions in R^n, the learning problem asks to estimate the means
and the covariance matrices of these Gaussians. This learning problem arises in
many areas ranging from the natural sciences to the social sciences, and has
also found many machine learning applications. Unfortunately, learning mixtures
of Gaussians is an information-theoretically hard problem: in order to learn
the parameters up to a reasonable accuracy, the number of samples required is
exponential in the number of Gaussian components in the worst case. In this
work, we show that provided we are in high enough dimensions, the class of
Gaussian mixtures is learnable in its most general form under a smoothed
analysis framework, where the parameters are randomly perturbed from an
adversarial starting point. In particular, given samples from a mixture of
Gaussians with randomly perturbed parameters, when n > Ω(k^2), we give
an algorithm that learns the parameters with polynomial running time and using
a polynomial number of samples. The central algorithmic ideas consist of new ways
to decompose the moment tensor of the Gaussian mixture by exploiting its
structural properties. The symmetries of this tensor are derived from the
combinatorial structure of higher order moments of Gaussian distributions
(sometimes referred to as Isserlis' theorem or Wick's theorem). We also develop
new tools for bounding smallest singular values of structured random matrices,
which could be useful in other smoothed analysis settings.
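The combinatorial structure referred to as Isserlis' (or Wick's) theorem can be sketched as follows. This illustrates only the identity itself, not the paper's tensor decomposition algorithm: for a zero-mean Gaussian vector, a product moment equals the sum, over all ways of pairing up the indices, of the products of the paired covariances. The function names and the covariance matrix are hypothetical.

```python
def pairings(items):
    """Yield every way to split the list `items` into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def gaussian_moment(cov, indices):
    """E[x_i1 * ... * x_im] for a zero-mean Gaussian with covariance matrix `cov`."""
    if len(indices) % 2 == 1:
        return 0.0  # odd moments of a zero-mean Gaussian vanish
    total = 0.0
    for pairing in pairings(list(indices)):
        term = 1.0
        for a, b in pairing:
            term *= cov[a][b]
        total += term
    return total


# Hypothetical 2x2 covariance matrix, used for illustration only.
cov = [[2.0, 0.5],
       [0.5, 1.0]]

fourth_same = gaussian_moment(cov, (0, 0, 0, 0))   # 3 * cov[0][0]**2
fourth_mixed = gaussian_moment(cov, (0, 0, 1, 1))  # cov00*cov11 + 2*cov01**2
n_pairings_6 = sum(1 for _ in pairings(list(range(6))))  # 5 * 3 * 1 pairings

print(fourth_same, fourth_mixed, n_pairings_6)
```

The number of pairings grows as (m-1)!! with the moment order m, which gives a sense of the symmetries a higher-order moment tensor carries.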
First results from the Very Small Array -- I. Observational methods
The Very Small Array (VSA) is a synthesis telescope designed to image faint
structures in the cosmic microwave background on degree and sub-degree angular
scales. The VSA has key differences from other CMB interferometers with the
result that different systematic errors are expected. We have tested the
operation of the VSA with a variety of blank-field and calibrator observations
and cross-checked its calibration scale against independent measurements. We
find that systematic effects can be suppressed below the thermal noise level in
long observations; the overall calibration accuracy of the flux density scale
is 3.5 percent and is limited by the external absolute calibration scale.
Comment: 9 pages, 10 figures, MNRAS in press (Minor revisions)
Microwave observations of spinning dust emission in NGC6946
We report new cm-wave measurements at five frequencies between 15 and 18 GHz
of the continuum emission from the reportedly anomalous "region 4" of the
nearby galaxy NGC6946. We find that the emission in this frequency range is
significantly in excess of that measured at 8.5 GHz, but has a spectrum from
15-18 GHz consistent with optically thin free-free emission from a compact HII
region. In combination with previously published data we fit four emission
models containing different continuum components using the Bayesian spectrum
analysis package radiospec. These fits show that, in combination with data at
other frequencies, a model with a spinning dust component is slightly preferred
to those that possess better-established emission mechanisms.
Comment: submitted MNRAS
Further Sunyaev-Zel'dovich observations of two Planck ERCSC clusters with the Arcminute Microkelvin Imager
We present follow-up observations of two galaxy clusters detected blindly via
the Sunyaev-Zel'dovich (SZ) effect and released in the Planck Early Release
Compact Source Catalogue. We use the Arcminute Microkelvin Imager, a dual-array
14-18 GHz radio interferometer. After radio source subtraction, we find an SZ
decrement of integrated flux density -1.08+/-0.10 mJy toward PLCKESZ
G121.11+57.01, and improve the position measurement of the cluster, finding the
centre to be RA 12 59 36.4, Dec +60 04 46.8, to an accuracy of 20 arcseconds.
The region of PLCKESZ G115.71+17.52 contains strong extended emission, so we
are unable to confirm the presence of this cluster via the SZ effect.
Comment: 4 tables, 3 figures, revised after referee's comments and resubmitted
to MNRAS
High resolution AMI Large Array imaging of spinning dust sources: spatially correlated 8 micron emission and evidence of a stellar wind in L675
We present 25 arcsecond resolution radio images of five Lynds Dark Nebulae
(L675, L944, L1103, L1111 & L1246) at 16 GHz made with the Arcminute
Microkelvin Imager (AMI) Large Array. These objects were previously observed
with the AMI Small Array to have an excess of emission at microwave frequencies
relative to lower frequency radio data. In L675 we find a flat spectrum compact
radio counterpart to the 850 micron emission seen with SCUBA and suggest that
it is cm-wave emission from a previously unknown deeply embedded young
protostar. In the case of L1246 the cm-wave emission is spatially correlated
with 8 micron emission seen with Spitzer. Since the MIR emission is present
only in Spitzer band 4 we suggest that it arises from a population of PAH
molecules, which also give rise to the cm-wave emission through spinning dust
emission.
Comment: accepted MNRAS
AMI observations of unmatched Planck ERCSC LFI sources at 15.75 GHz
The Planck Early Release Compact Source Catalogue includes 26 sources with no
obvious matches in other radio catalogues (of primarily extragalactic sources).
Here we present observations made with the Arcminute Microkelvin Imager Small
Array (AMI SA) at 15.75 GHz of the eight unmatched sources at
declination > +10 degrees. Of the eight, four are detected and are associated
with known objects. The other four are not detected with the AMI SA, and are
thought to be spurious.
Comment: 6 pages, 5 figures, 4 tables
Searching for non-Gaussianity in the VSA data
We have tested Very Small Array (VSA) observations of three regions of sky
for the presence of non-Gaussianity, using high-order cumulants, Minkowski
functionals, a wavelet-based test and a Bayesian joint power
spectrum/non-Gaussianity analysis. We find the data from two regions to be
consistent with Gaussianity. In the third region, we obtain a 96.7% detection
of non-Gaussianity using the wavelet test. We perform simulations to
characterise the tests, and conclude that this is consistent with expected
residual point source contamination. There is therefore no evidence that this
detection is of cosmological origin. Our simulations show that the tests would
be sensitive to any residual point sources above the data's source subtraction
level of 20 mJy. The tests are also sensitive to cosmic string networks at an
rms fluctuation level of (i.e. equivalent to the best-fit observed
value). They are not sensitive to string-induced fluctuations if an equal rms
of Gaussian CDM fluctuations is added, thereby reducing the fluctuations due to
the string network to rms . We especially highlight the usefulness
of non-Gaussianity testing in eliminating systematic effects from our data.
Comment: Minor corrections; accepted for publication to MNRAS