    Of `Cocktail Parties' and Exoplanets

    The characterisation of ever smaller and fainter extrasolar planets requires an intricate understanding of one's data and the analysis techniques used. Correcting the raw data at the 10^-4 level of accuracy in flux is one of the central challenges. This can be difficult for instruments that do not feature a calibration plan for such high-precision measurements. Here, it is not always obvious how to de-correlate the data using auxiliary information of the instrument, and it becomes paramount to know how well one can disentangle instrument systematics from one's data, given nothing but the data itself. We propose a non-parametric machine learning algorithm, based on the concept of independent component analysis, to de-convolve the systematic noise and all non-Gaussian signals from the desired astrophysical signal. Such a `blind' signal de-mixing is commonly known as the `Cocktail Party problem' in signal processing. Given multiple simultaneous observations of the same exoplanetary eclipse, as in the case of spectrophotometry, we show that we can often disentangle systematic noise from the original light curve signal without the use of any complementary information of the instrument. In this paper, we explore these signal extraction techniques using simulated data and two data sets observed with the Hubble-NICMOS instrument. Another important application is the de-correlation of the exoplanetary signal from time-correlated stellar variability. Using data obtained by the Kepler mission, we show that the desired signal can be de-convolved from the stellar noise using a single time series spanning several eclipse events. Such non-parametric techniques can provide important confirmations of the existing parametric corrections reported in the literature, and their associated results. Additionally, they can substantially improve the precision of exoplanetary light curve analysis in the future. Comment: ApJ accepte
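
    The blind de-mixing idea in this abstract can be illustrated with a toy example. The sketch below is not the authors' algorithm: it is a minimal two-channel independent component analysis, assuming a box-shaped transit and a sinusoidal systematic (both invented here), a hypothetical 2x2 mixing, and a simple kurtosis-based rotation search after whitening. All function and variable names are illustrative.

```python
import math

def standardize(x):
    """Shift to zero mean and scale to unit variance."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return [(v - mean) / std for v in x]

def correlation(x, y):
    xs, ys = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(xs, ys)) / len(xs)

def rotate(u, v, angle):
    """Rotate the 2-D series (u, v) by -angle."""
    c, s = math.cos(angle), math.sin(angle)
    return ([c * a + s * b for a, b in zip(u, v)],
            [-s * a + c * b for a, b in zip(u, v)])

def whiten(x1, x2):
    """Decorrelate two series (principal axes), then scale to unit variance."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    var1 = sum(a * a for a in c1) / n
    var2 = sum(b * b for b in c2) / n
    cov = sum(a * b for a, b in zip(c1, c2)) / n
    theta = 0.5 * math.atan2(2.0 * cov, var1 - var2)
    p1, p2 = rotate(c1, c2, theta)
    return standardize(p1), standardize(p2)

def excess_kurtosis(x):
    """Excess kurtosis of a zero-mean, unit-variance series."""
    return sum(v ** 4 for v in x) / len(x) - 3.0

def unmix(x1, x2, steps=180):
    """Pick the rotation of the whitened data that maximises non-Gaussianity."""
    z1, z2 = whiten(x1, x2)
    best_score, best = -1.0, None
    for k in range(steps):
        y1, y2 = rotate(z1, z2, 0.5 * math.pi * k / steps)
        score = excess_kurtosis(y1) ** 2 + excess_kurtosis(y2) ** 2
        if score > best_score:
            best_score, best = score, (y1, y2)
    return best

# Toy sources (assumptions): a 1% box transit and a sinusoidal systematic.
n = 2000
transit = [0.99 if 900 <= i < 1100 else 1.0 for i in range(n)]
systematic = [0.005 * math.sin(2.0 * math.pi * 7 * i / n) for i in range(n)]

# Two simultaneous "channels" observing different mixtures of the sources.
channel_a = [1.0 * t + 0.6 * s for t, s in zip(transit, systematic)]
channel_b = [0.8 * t + 1.0 * s for t, s in zip(transit, systematic)]

y1, y2 = unmix(channel_a, channel_b)
transit_match = max(abs(correlation(y, transit)) for y in (y1, y2))
systematic_match = max(abs(correlation(y, systematic)) for y in (y1, y2))
print(transit_match, systematic_match)
```

    As with any ICA, the recovered components are defined only up to sign, scale and ordering, which is why the comparison uses absolute correlations.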

    Online Tensor Methods for Learning Latent Variable Models

    We introduce an online tensor decomposition based approach for two latent variable modeling problems, namely (1) community detection, in which we learn the latent communities that the social actors in social networks belong to, and (2) topic modeling, in which we infer hidden topics of text articles. We consider decomposition of moment tensors using stochastic gradient descent (SGD). We optimize the multilinear operations in SGD and avoid directly forming the tensors, to save computational and storage costs. We present optimized algorithms on two platforms. Our GPU-based implementation exploits the parallelism of SIMD architectures to allow for maximum speed-up by a careful optimization of storage and data transfer, whereas our CPU-based implementation uses efficient sparse matrix computations and is suitable for large sparse datasets. For the community detection problem, we demonstrate accuracy and computational efficiency on Facebook, Yelp and DBLP datasets, and for the topic modeling problem, we also demonstrate good performance on the New York Times dataset. We compare our results to state-of-the-art algorithms such as the variational method, and report a gain in accuracy and a gain of several orders of magnitude in execution time. Comment: JMLR 201

    From one solution of a 3-satisfiability formula to a solution cluster: Frozen variables and entropy

    A solution to a 3-satisfiability (3-SAT) formula can be expanded into a cluster, all other solutions of which are reachable from this one through a sequence of single-spin flips. Some variables in the solution cluster are frozen to the same spin values by one of two different mechanisms: frozen-core formation and long-range frustrations. While frozen cores are identified by a local whitening algorithm, long-range frustrations are very difficult to trace, and they make an entropic belief-propagation (BP) algorithm fail to converge. For BP to reach a fixed point, the spin values of a tiny fraction of variables (chosen according to the whitening algorithm) are externally fixed during the iteration. From the calculated entropy values, we infer that, for a large random 3-SAT formula with constraint density close to the satisfiability threshold, the solutions obtained by the survey-propagation or the walksat algorithm belong neither to the most dominating clusters of the formula nor to the most abundant clusters. This work indicates that a single solution cluster of a random 3-SAT formula may have further community structures. Comment: 13 pages, 6 figures. Final version as published in PR

    Nonlinear Hebbian learning as a unifying principle in receptive field formation

    The development of sensory receptive fields has been modeled in the past by a variety of models, including normative models such as sparse coding or independent component analysis and bottom-up models such as spike-timing dependent plasticity or the Bienenstock-Cooper-Munro model of synaptic plasticity. Here we show that the above variety of approaches can all be unified into a single common principle, namely Nonlinear Hebbian Learning. When Nonlinear Hebbian Learning is applied to natural images, receptive field shapes are strongly constrained by the input statistics and preprocessing, but exhibit only modest variation across different choices of nonlinearities in neuron models or synaptic plasticity rules. Neither overcompleteness nor sparse network activity is necessary for the development of localized receptive fields. The analysis of alternative sensory modalities such as auditory models or V2 development leads to the same conclusions. In all examples, receptive fields can be predicted a priori by reformulating an abstract model as nonlinear Hebbian learning. Thus nonlinear Hebbian learning and natural statistics can account for many aspects of receptive field formation across models and sensory modalities.
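
    A nonlinear Hebbian rule changes a weight vector in proportion to the input times a nonlinear function of the neuron's response. The sketch below is a toy illustration under assumptions not taken from the abstract: two-dimensional, approximately whitened input containing one sparse (Laplace) and one Gaussian direction, a cubic nonlinearity, a batch-averaged update, and explicit weight normalization. The names `hebbian_step`, `sparse_axis` and the parameter values are hypothetical.

```python
import math
import random

rng = random.Random(0)

# Toy input (assumption): one sparse Laplace source and one Gaussian source,
# both with unit variance, rotated by 30 degrees so that neither input axis
# coincides with the sparse direction.
angle = math.radians(30)
sparse_axis = (math.cos(angle), math.sin(angle))
data = []
for _ in range(4000):
    s = rng.choice((-1.0, 1.0)) * rng.expovariate(math.sqrt(2.0))
    g = rng.gauss(0.0, 1.0)
    data.append((sparse_axis[0] * s - sparse_axis[1] * g,
                 sparse_axis[1] * s + sparse_axis[0] * g))

def hebbian_step(w, samples, f, learning_rate):
    """One batch update: w <- normalise(w + lr * mean(x * f(w . x)))."""
    g0 = g1 = 0.0
    for x0, x1 in samples:
        post = f(w[0] * x0 + w[1] * x1)  # nonlinear postsynaptic factor
        g0 += x0 * post
        g1 += x1 * post
    w0 = w[0] + learning_rate * g0 / len(samples)
    w1 = w[1] + learning_rate * g1 / len(samples)
    norm = math.hypot(w0, w1)  # keep the weight vector on the unit circle
    return (w0 / norm, w1 / norm)

def cubic(u):
    return u ** 3

w = (1.0, 0.0)
for _ in range(100):
    w = hebbian_step(w, data, cubic, learning_rate=0.1)

# Absolute cosine between the learned weights and the sparse direction.
alignment = abs(w[0] * sparse_axis[0] + w[1] * sparse_axis[1])
print(w, alignment)
```

    With whitened input, the linear part of the update only rescales the weights, so the nonlinearity decides the direction. Here it steers the neuron toward the heavy-tailed source, which is the sense in which input statistics constrain the learned receptive field.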

    Spatiotemporal Sparse Bayesian Learning with Applications to Compressed Sensing of Multichannel Physiological Signals

    Energy consumption is an important issue in continuous wireless telemonitoring of physiological signals. Compressed sensing (CS) is a promising framework to address it, due to its energy-efficient data compression procedure. However, most CS algorithms have difficulty in data recovery because many physiological signals are not sparse. Block sparse Bayesian learning (BSBL) is an effective approach to recover such signals with satisfactory recovery quality. However, it is time-consuming in recovering multichannel signals, since its computational load increases almost linearly with the number of channels. This work proposes a spatiotemporal sparse Bayesian learning algorithm to recover multichannel signals simultaneously. It exploits not only the temporal correlation within each channel signal, but also the inter-channel correlation among different channel signals. Furthermore, its computational load is not significantly affected by the number of channels. The proposed algorithm was applied to brain computer interface (BCI) and EEG-based driver's drowsiness estimation. Results showed that the algorithm had both better recovery performance and much higher speed than BSBL. In particular, the proposed algorithm ensured that the BCI classification and the drowsiness estimation had little degradation even when data were compressed by 80%, making it very suitable for continuous wireless telemonitoring of multichannel signals. Comment: Codes are available at: https://sites.google.com/site/researchbyzhang/stsb

    An ABORT-like detector with improved mismatched signals rejection capabilities

    In this paper, we present a GLRT-based adaptive detection algorithm for extended targets with improved rejection capabilities of mismatched signals. We assume that a set of secondary data is available and that noise returns in primary and secondary data share the same statistical characterization. To increase the selectivity of the detector, similarly to the ABORT formulation, we modify the hypothesis testing problem at hand by introducing fictitious signals under the null hypothesis. Such unwanted signals are supposed to be orthogonal to the nominal steering vector in the whitened observation space. The performance assessment, carried out by Monte Carlo simulation, shows that the proposed detector ensures better rejection capabilities of mismatched signals than existing ones, at the price of a certain loss in terms of detection of matched signals.

    Detection of Potential Transit Signals in Sixteen Quarters of Kepler Mission Data

    We present the results of a search for potential transit signals in four years of photometry data acquired by the Kepler Mission. The targets of the search include 111,800 stars which were observed for the entire interval and 85,522 stars which were observed for a subset of the interval. We found that 9,743 targets contained at least one signal consistent with the signature of a transiting or eclipsing object, where the criteria for detection are periodicity of the detected transits, adequate signal-to-noise ratio, and acceptance by a number of tests which reject false positive detections. When targets that had produced a signal were searched repeatedly, an additional 6,542 signals were detected on 3,223 target stars, for a total of 16,285 potential detections. Comparison of the set of detected signals with a set of known and vetted transit events in the Kepler field of view shows that the recovery rate for these signals is 96.9%. The ensemble properties of the detected signals are reviewed. Comment: Accepted by ApJ Supplemen
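
    The counts quoted in this abstract are mutually consistent, which a few lines of arithmetic confirm. The sketch below uses only the figures stated above and assumes that each of the 9,743 targets contributes one signal in the first pass; the variable names and the derived detection fraction are illustrative.

```python
# Figures as quoted in the abstract.
full_interval_targets = 111_800
partial_interval_targets = 85_522
targets_with_signal = 9_743
additional_signals = 6_542
multi_signal_targets = 3_223

# Derived quantities.
total_targets = full_interval_targets + partial_interval_targets
total_signals = targets_with_signal + additional_signals
detection_fraction = targets_with_signal / total_targets
extra_per_multi_target = additional_signals / multi_signal_targets

print(total_targets)                     # 197322 stars searched
print(total_signals)                     # 16285, the total quoted above
print(round(100 * detection_fraction, 1))
print(round(extra_per_multi_target, 2))
```

    Roughly one searched target in twenty yields at least one candidate signal, and targets that are searched again yield about two further signals each on average.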

    Robust Principal Component Analysis on Graphs

    Principal Component Analysis (PCA) is the most widely used tool for linear dimensionality reduction and clustering. Still, it is highly sensitive to outliers and does not scale well with respect to the number of data samples. Robust PCA solves the first issue with a sparse penalty term. The second issue can be handled with the matrix factorization model, which is, however, non-convex. Besides, PCA-based clustering can also be enhanced by using a graph of data similarity. In this article, we introduce a new model called "Robust PCA on Graphs", which incorporates spectral graph regularization into the Robust PCA framework. Our proposed model benefits from 1) the robustness of principal components to occlusions and missing values, 2) enhanced low-rank recovery, 3) improved clustering property due to the graph smoothness assumption on the low-rank matrix, and 4) convexity of the resulting optimization problem. Extensive experiments on 8 benchmark, 3 video and 2 artificial datasets with corruptions clearly reveal that our model outperforms 10 other state-of-the-art models in its clustering and low-rank recovery tasks.