93 research outputs found

    Short note on two output-dependent hidden Markov models

    The purpose of this note is to study the assumption of "mutual information independence", which is used by Zhou (2005) for deriving an output-dependent hidden Markov model, the so-called discriminative HMM (D-HMM), in the context of determining a stochastic optimal sequence of hidden states. The assumption is extended to derive its generative counterpart, the G-HMM. In addition, state-dependent representations for two output-dependent HMMs, namely HMMSDO (Li, 2005) and D-HMM, are presented.

    Do unbalanced data have a negative effect on LDA?

    For two-class discrimination, Xie and Qiu [The effect of imbalanced data sets on LDA: a theoretical and empirical analysis, Pattern Recognition 40 (2) (2007) 557–562] claimed that, when the covariance matrices of the two classes are unequal, a (class) unbalanced data set has a negative effect on the performance of linear discriminant analysis (LDA). By re-balancing 10 real-world data sets, they provided empirical evidence to support the claim, using AUC (area under the receiver operating characteristic curve) as the performance metric. We suggest that the claim is vague, if not misleading; that Xie and Qiu present no solid theoretical analysis; and that AUC can lead to a quite different conclusion from misclassification error rate (ER) about the discrimination performance of LDA on unbalanced data sets. Our empirical and simulation studies suggest that, for LDA, the increase in the median AUC (and thus the improvement in performance) from re-balancing is relatively small, while the increase in the median ER (and thus the decline in performance) from re-balancing is relatively large. Therefore, from our study, there is no reliable empirical evidence to support the claim that a (class) unbalanced data set has a negative effect on the performance of LDA. In addition, re-balancing affects the performance of LDA for data sets with either equal or unequal covariance matrices, indicating that unequal covariance matrices are not a key reason for the difference in performance between original and re-balanced data.
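A toy sketch of why AUC and ER can disagree: AUC depends only on how the scores rank the two classes, whereas ER also depends on the decision threshold, which re-balancing tends to move. The data, the thresholds, and the function names `auc` and `error_rate` below are hypothetical and are not taken from the paper; this is not a reproduction of its experiments.

```python
# Toy illustration with made-up scores; not the authors' data or method.

def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def error_rate(scores, labels, threshold):
    """Fraction of cases misclassified when predicting class 1 for score > threshold."""
    preds = [1 if s > threshold else 0 for s in scores]
    return sum(p != y for p, y in zip(preds, labels)) / len(labels)


# Unbalanced sample: 8 negatives followed by 2 positives.
labels = [0] * 8 + [1] * 2
scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.50, 0.60, 0.55, 0.80]

print(auc(scores, labels))               # 0.9375, independent of any threshold
print(error_rate(scores, labels, 0.70))  # 0.1 with a high (hypothetical) threshold
print(error_rate(scores, labels, 0.42))  # 0.3 after lowering the threshold
```

Lowering the threshold leaves the ranking, and hence AUC, unchanged while tripling ER in this sample, so the two metrics need not tell the same story about the same classifier.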

    Learning Mixtures of Gaussians in High Dimensions

    Efficiently learning mixtures of Gaussians is a fundamental problem in statistics and learning theory. Given samples, each drawn from a random one of k Gaussian distributions in R^n, the learning problem asks to estimate the means and the covariance matrices of these Gaussians. This learning problem arises in many areas ranging from the natural sciences to the social sciences, and has also found many machine learning applications. Unfortunately, learning mixtures of Gaussians is an information-theoretically hard problem: in order to learn the parameters up to a reasonable accuracy, the number of samples required is exponential in the number of Gaussian components in the worst case. In this work, we show that provided we are in high enough dimensions, the class of Gaussian mixtures is learnable in its most general form under a smoothed analysis framework, where the parameters are randomly perturbed from an adversarial starting point. In particular, given samples from a mixture of Gaussians with randomly perturbed parameters, when n > Ω(k^2), we give an algorithm that learns the parameters with polynomial running time and using a polynomial number of samples. The central algorithmic ideas consist of new ways to decompose the moment tensor of the Gaussian mixture by exploiting its structural properties. The symmetries of this tensor are derived from the combinatorial structure of higher order moments of Gaussian distributions (sometimes referred to as Isserlis' theorem or Wick's theorem). We also develop new tools for bounding smallest singular values of structured random matrices, which could be useful in other smoothed analysis settings.
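For reference, the fourth-order case of Isserlis' theorem mentioned in the abstract can be written as follows for a zero-mean Gaussian vector. The symbols x and Σ are notation chosen here for illustration, not taken from the paper.

```latex
% Isserlis' (Wick's) theorem, fourth-order case, for x ~ N(0, \Sigma):
% the moment is a sum over the three ways of pairing the four indices.
\mathbb{E}[x_i x_j x_k x_l]
  = \Sigma_{ij}\Sigma_{kl} + \Sigma_{ik}\Sigma_{jl} + \Sigma_{il}\Sigma_{jk},
\qquad x \sim \mathcal{N}(0, \Sigma).
```

The right-hand side is unchanged under any permutation of the indices i, j, k, l, which is the kind of symmetry in the moment tensor that the abstract refers to.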

    First results from the Very Small Array -- I. Observational methods

    The Very Small Array (VSA) is a synthesis telescope designed to image faint structures in the cosmic microwave background on degree and sub-degree angular scales. The VSA has key differences from other CMB interferometers, with the result that different systematic errors are expected. We have tested the operation of the VSA with a variety of blank-field and calibrator observations and cross-checked its calibration scale against independent measurements. We find that systematic effects can be suppressed below the thermal noise level in long observations; the overall calibration accuracy of the flux density scale is 3.5 percent and is limited by the external absolute calibration scale. Comment: 9 pages, 10 figures, MNRAS in press (Minor revisions)

    Microwave observations of spinning dust emission in NGC6946

    We report new cm-wave measurements at five frequencies between 15 and 18 GHz of the continuum emission from the reportedly anomalous "region 4" of the nearby galaxy NGC6946. We find that the emission in this frequency range is significantly in excess of that measured at 8.5 GHz, but has a spectrum from 15-18 GHz consistent with optically thin free-free emission from a compact HII region. In combination with previously published data we fit four emission models containing different continuum components using the Bayesian spectrum analysis package radiospec. These fits show that, in combination with data at other frequencies, a model with a spinning dust component is slightly preferred to those that possess better-established emission mechanisms. Comment: submitted MNRAS

    Further Sunyaev-Zel'dovich observations of two Planck ERCSC clusters with the Arcminute Microkelvin Imager

    We present follow-up observations of two galaxy clusters detected blindly via the Sunyaev-Zel'dovich (SZ) effect and released in the Planck Early Release Compact Source Catalogue. We use the Arcminute Microkelvin Imager, a dual-array 14-18 GHz radio interferometer. After radio source subtraction, we find an SZ decrement of integrated flux density -1.08+/-0.10 mJy toward PLCKESZ G121.11+57.01, and improve the position measurement of the cluster, finding the centre to be RA 12 59 36.4, Dec +60 04 46.8, to an accuracy of 20 arcseconds. The region of PLCKESZ G115.71+17.52 contains strong extended emission, so we are unable to confirm the presence of this cluster via the SZ effect. Comment: 4 tables, 3 figures, revised after referee's comments and resubmitted to MNRAS

    High resolution AMI Large Array imaging of spinning dust sources: spatially correlated 8 micron emission and evidence of a stellar wind in L675

    We present 25 arcsecond resolution radio images of five Lynds Dark Nebulae (L675, L944, L1103, L1111 & L1246) at 16 GHz made with the Arcminute Microkelvin Imager (AMI) Large Array. These objects were previously observed with the AMI Small Array to have an excess of emission at microwave frequencies relative to lower frequency radio data. In L675 we find a flat-spectrum compact radio counterpart to the 850 micron emission seen with SCUBA and suggest that it is cm-wave emission from a previously unknown deeply embedded young protostar. In the case of L1246 the cm-wave emission is spatially correlated with 8 micron emission seen with Spitzer. Since the MIR emission is present only in Spitzer band 4 we suggest that it arises from a population of PAH molecules, which also give rise to the cm-wave emission through spinning dust emission. Comment: accepted MNRAS

    AMI observations of unmatched Planck ERCSC LFI sources at 15.75 GHz

    The Planck Early Release Compact Source Catalogue includes 26 sources with no obvious matches in other radio catalogues (of primarily extragalactic sources). Here we present observations made with the Arcminute Microkelvin Imager Small Array (AMI SA) at 15.75 GHz of the eight unmatched sources at declination > +10 degrees. Of the eight, four are detected and are associated with known objects. The other four are not detected with the AMI SA, and are thought to be spurious. Comment: 6 pages, 5 figures, 4 tables

    Searching for non-Gaussianity in the VSA data

    We have tested Very Small Array (VSA) observations of three regions of sky for the presence of non-Gaussianity, using high-order cumulants, Minkowski functionals, a wavelet-based test and a Bayesian joint power spectrum/non-Gaussianity analysis. We find the data from two regions to be consistent with Gaussianity. In the third region, we obtain a 96.7% detection of non-Gaussianity using the wavelet test. We perform simulations to characterise the tests, and conclude that this is consistent with expected residual point source contamination. There is therefore no evidence that this detection is of cosmological origin. Our simulations show that the tests would be sensitive to any residual point sources above the data's source subtraction level of 20 mJy. The tests are also sensitive to cosmic string networks at an rms fluctuation level of 105 μK (i.e. equivalent to the best-fit observed value). They are not sensitive to string-induced fluctuations if an equal rms of Gaussian CDM fluctuations is added, thereby reducing the fluctuations due to the string network to 74 μK rms. We especially highlight the usefulness of non-Gaussianity testing in eliminating systematic effects from our data. Comment: Minor corrections; accepted for publication in MNRAS
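The two rms values quoted above are consistent with a simple quadrature sum. The check below is a sketch that assumes the string and Gaussian CDM components are uncorrelated, have equal rms, and together keep the total fixed at the observed level; the symbols are notation chosen here, and the abstract does not spell out this calculation.

```latex
% Assumption: uncorrelated components add in quadrature, total held at 105 \mu K.
\sigma_{\mathrm{tot}}^{2} = \sigma_{\mathrm{str}}^{2} + \sigma_{\mathrm{CDM}}^{2},
\qquad \sigma_{\mathrm{str}} = \sigma_{\mathrm{CDM}}
\;\Longrightarrow\;
\sigma_{\mathrm{str}} = \frac{\sigma_{\mathrm{tot}}}{\sqrt{2}}
  = \frac{105\,\mu\mathrm{K}}{\sqrt{2}} \approx 74\,\mu\mathrm{K}.
```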