Of `Cocktail Parties' and Exoplanets
The characterisation of ever smaller and fainter extrasolar planets requires
an intricate understanding of one's data and the analysis techniques used.
Correcting the raw data at the 10^-4 level of accuracy in flux is one of the
central challenges. This can be difficult for instruments that do not feature a
calibration plan for such high precision measurements. Here, it is not always
obvious how to de-correlate the data using auxiliary information of the
instrument and it becomes paramount to know how well one can disentangle
instrument systematics from one's data, given nothing but the data itself. We
propose a non-parametric machine learning algorithm, based on the concept of
independent component analysis, to de-convolve the systematic noise and all
non-Gaussian signals from the desired astrophysical signal. Such a `blind'
signal de-mixing is commonly known as the `Cocktail Party problem' in
signal-processing. Given multiple simultaneous observations of the same
exoplanetary eclipse, as in the case of spectrophotometry, we show that we can
often disentangle systematic noise from the original light curve signal without
the use of any complementary information of the instrument. In this paper, we
explore these signal extraction techniques using simulated data and two data
sets observed with the Hubble-NICMOS instrument. Another important application
is the de-correlation of the exoplanetary signal from time-correlated stellar
variability. Using data obtained by the Kepler mission, we show that the desired
signal can be de-convolved from the stellar noise using a single time series
spanning several eclipse events. Such non-parametric techniques can provide
important confirmations of the existing parametric corrections reported in the
literature, and their associated results. Additionally, they can substantially
improve the precision of exoplanetary light curve analyses in the future.
Comment: ApJ accepted
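The `blind' demixing described above can be illustrated on a toy Cocktail Party problem. The sketch below is not the paper's pipeline; it runs a plain FastICA-style fixed-point iteration (signals, mixing weights, and iteration counts are all illustrative) on two simulated simultaneous observations of the same mixed sources:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
S = np.vstack([np.sign(np.sin(3 * t)),   # square-wave stand-in for "systematics"
               np.sin(7 * t)])           # smooth stand-in for the astrophysical signal
A = np.array([[1.0, 0.6],
              [0.5, 1.0]])               # unknown mixing: two simultaneous observations
X = A @ S

# Center and whiten the observations
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(X @ X.T / X.shape[1])
Z = E @ np.diag(d ** -0.5) @ E.T @ X

# FastICA fixed-point iteration with g = tanh, extracting components by deflation
W = np.zeros((2, 2))
for i in range(2):
    w = rng.normal(size=2)
    w /= np.linalg.norm(w)
    for _ in range(200):
        y = w @ Z
        w_new = (Z * np.tanh(y)).mean(axis=1) - (1 - np.tanh(y) ** 2).mean() * w
        w_new -= W[:i].T @ (W[:i] @ w_new)   # decorrelate from earlier components
        w_new /= np.linalg.norm(w_new)
        if abs(w_new @ w) > 1 - 1e-10:       # converged (up to sign)
            w = w_new
            break
        w = w_new
    W[i] = w
S_hat = W @ Z                                # unmixed component estimates
```

Up to sign and ordering, each recovered component should closely track one of the original sources.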
Online Tensor Methods for Learning Latent Variable Models
We introduce an online tensor-decomposition-based approach for two latent
variable modeling problems, namely (1) community detection, in which we learn
the latent communities that the social actors in social networks belong to, and
(2) topic modeling, in which we infer hidden topics of text articles. We
consider decomposition of moment tensors using stochastic gradient descent. We
optimize the multilinear operations within SGD and avoid directly
forming the tensors, to save computational and storage costs. We present
optimized algorithms for two platforms. Our GPU-based implementation exploits the
parallelism of SIMD architectures to allow for maximum speed-up by a careful
optimization of storage and data transfer, whereas our CPU-based implementation
uses efficient sparse matrix computations and is suitable for large sparse
datasets. For the community detection problem, we demonstrate accuracy and
computational efficiency on Facebook, Yelp and DBLP datasets, and for the topic
modeling problem, we also demonstrate good performance on the New York Times
dataset. We compare our results to state-of-the-art algorithms such as the
variational method, and report a gain in accuracy and a reduction of several
orders of magnitude in execution time.
Comment: JMLR 201
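The point about optimizing multilinear operations without materializing the moment tensor can be seen in a few lines. This sketch (dimensions and data are illustrative, not from the paper) checks that the contraction T(v, v, v) of the empirical third-moment tensor equals a cheap per-sample computation, so an SGD step never needs the O(d^3) object:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 5, 4000
X = rng.normal(size=(n, d))   # n samples in d dimensions
v = rng.normal(size=d)

# Explicit route: form the d x d x d empirical moment tensor (O(d^3) memory)
T = np.einsum('ni,nj,nk->ijk', X, X, X) / n
explicit = np.einsum('ijk,i,j,k->', T, v, v, v)

# Implicit route: the same contraction as a multilinear op on raw samples
implicit = np.mean((X @ v) ** 3)

assert abs(explicit - implicit) < 1e-8   # identical up to float rounding
```

The implicit route costs O(n d) per contraction, which is what makes the stochastic updates cheap.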
From one solution of a 3-satisfiability formula to a solution cluster: Frozen variables and entropy
A solution to a 3-satisfiability (3-SAT) formula can be expanded into a
cluster, all other solutions of which are reachable from this one through a
sequence of single-spin flips. Some variables in the solution cluster are
frozen to the same spin values by one of two different mechanisms: frozen-core
formation and long-range frustrations. While frozen cores are identified by a
local whitening algorithm, long-range frustrations are very difficult to trace,
and they make an entropic belief-propagation (BP) algorithm fail to converge.
For BP to reach a fixed point, the spin values of a tiny fraction of variables
(chosen according to the whitening algorithm) are externally fixed during the
iteration. From the calculated entropy values, we infer that, for a large
random 3-SAT formula with constraint density close to the satisfiability
threshold, the solutions obtained by the survey-propagation or the walksat
algorithm belong neither to the most dominating clusters of the formula nor to
the most abundant clusters. This work indicates that a single solution cluster
of a random 3-SAT formula may have further community structures.
Comment: 13 pages, 6 figures. Final version as published in PR
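For concreteness, here is a minimal sketch of a local whitening procedure of the kind referred to above (a common textbook formulation, not necessarily the authors' exact algorithm): variables are iteratively marked as unfrozen jokers (*) until only the frozen core remains.

```python
def whiten(clauses, assignment):
    """Return the frozen core of a solution under iterative whitening.

    clauses: list of tuples of signed literals (DIMACS style, e.g. (1, -2, 3));
    assignment: dict mapping each variable to its Boolean value in the solution.
    """
    star = set()
    changed = True
    while changed:
        changed = False
        for v in assignment:
            if v in star:
                continue
            whitenable = True
            for cl in clauses:
                if not any(abs(l) == v for l in cl):
                    continue
                # Does v satisfy this clause in the given solution?
                if not any(abs(l) == v and (l > 0) == assignment[v] for l in cl):
                    continue
                # Is the clause also supported by another non-star variable?
                if any(abs(l) != v and abs(l) not in star
                       and (l > 0) == assignment[abs(l)] for l in cl):
                    continue
                # Does the clause already contain a star (joker) variable?
                if any(abs(l) != v and abs(l) in star for l in cl):
                    continue
                whitenable = False   # v is the unique rigid support of this clause
                break
            if whitenable:
                star.add(v)
                changed = True
    return set(assignment) - star

# All-True satisfies both toy formulas below
print(whiten([(1, 2, 3)], {1: True, 2: True, 3: True}))        # -> set(): fully white
print(whiten([(1, -2, -3), (-1, 2, -3), (-1, -2, 3)],
             {1: True, 2: True, 3: True}))                     # -> {1, 2, 3}: frozen core
```

In the second formula each variable is the unique satisfier of one clause, so whitening makes no progress and the whole solution is frozen.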
Nonlinear Hebbian learning as a unifying principle in receptive field formation
The development of sensory receptive fields has been modeled in the past by a
variety of models including normative models such as sparse coding or
independent component analysis and bottom-up models such as spike-timing
dependent plasticity or the Bienenstock-Cooper-Munro model of synaptic
plasticity. Here we show that the above variety of approaches can all be
unified into a single common principle, namely Nonlinear Hebbian Learning. When
Nonlinear Hebbian Learning is applied to natural images, receptive field shapes
are strongly constrained by the input statistics and preprocessing, but
exhibit only modest variation across different choices of nonlinearities in
neuron models or synaptic plasticity rules. Neither overcompleteness nor sparse
network activity is necessary for the development of localized receptive
fields. The analysis of alternative sensory modalities such as auditory models
or V2 development leads to the same conclusions. In all examples, receptive
fields can be predicted a priori by reformulating an abstract model as
nonlinear Hebbian learning. Thus, nonlinear Hebbian learning and natural
statistics can account for many aspects of receptive field formation across
models and sensory modalities.
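A nonlinear Hebbian rule of the kind unified here can be sketched in a few lines. Assuming whitened input with one non-Gaussian (Laplacian) direction, the averaged update Δw ∝ ⟨f(wᵀx) x⟩ with f(y) = y³ and a norm constraint selects that direction; the mixing angle, learning rate, and nonlinearity below are illustrative choices, not tied to any specific model in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000
s = np.c_[rng.laplace(size=n) / np.sqrt(2),   # unit-variance Laplacian (non-Gaussian) source
          rng.normal(size=n)]                 # unit-variance Gaussian source
theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = s @ Q.T                                   # rotated (still whitened) input

w = rng.normal(size=2)
w /= np.linalg.norm(w)
eta = 0.5
for _ in range(500):
    y = X @ w
    w = w + eta * (y ** 3) @ X / n   # nonlinear Hebbian step: Δw ∝ <f(y) x>, f(y) = y³
    w /= np.linalg.norm(w)           # homeostatic norm constraint

# w should align (up to sign) with the non-Gaussian direction Q[:, 0]
```

The selected direction is the one with the heaviest-tailed projection, which is the sense in which the rule implements projection pursuit on natural statistics.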
Spatiotemporal Sparse Bayesian Learning with Applications to Compressed Sensing of Multichannel Physiological Signals
Energy consumption is an important issue in continuous wireless
telemonitoring of physiological signals. Compressed sensing (CS) is a promising
framework to address it, due to its energy-efficient data compression
procedure. However, most CS algorithms have difficulty in data recovery due to
the non-sparse nature of many physiological signals. Block sparse
Bayesian learning (BSBL) is an effective approach to recover such signals with
satisfactory recovery quality. However, it is time-consuming in recovering
multichannel signals, since its computational load almost linearly increases
with the number of channels.
This work proposes a spatiotemporal sparse Bayesian learning algorithm to
recover multichannel signals simultaneously. It not only exploits temporal
correlation within each channel signal, but also exploits inter-channel
correlation among different channel signals. Furthermore, its computational
load is not significantly affected by the number of channels. The proposed
algorithm was applied to brain-computer interface (BCI) and EEG-based driver's
drowsiness estimation. Results showed that the algorithm had both better
recovery performance and much higher speed than BSBL. Particularly, the
proposed algorithm ensured that the BCI classification and the drowsiness
estimation had little degradation even when data were compressed by 80%, making
it very suitable for continuous wireless telemonitoring of multichannel
signals.
Comment: Codes are available at:
https://sites.google.com/site/researchbyzhang/stsb
An ABORT-like detector with improved mismatched signals rejection capabilities
In this paper, we present a GLRT-based adaptive detection algorithm for extended targets with improved rejection capabilities of mismatched signals. We assume that a set of secondary data is available and that noise returns in primary and secondary data share the same statistical characterization. To increase the selectivity of the detector, similarly to the ABORT formulation, we modify the hypothesis testing problem at hand by introducing fictitious signals under the null hypothesis. Such unwanted signals are supposed to be orthogonal to the nominal steering vector in the whitened observation space. The performance assessment, carried out by Monte Carlo simulation, shows that the proposed detector ensures better rejection capabilities of mismatched signals than existing ones, at the price of a certain loss in terms of detection of matched signals.
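As context for the detector above, the classical adaptive matched filter (AMF) statistic that such GLRT-based schemes build on, t = |sᴴR̂⁻¹x|² / (sᴴR̂⁻¹s), can be sketched as follows. This is the baseline quantity, not the proposed ABORT-like detector, and the array size, snapshot count, and signal amplitude are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 8, 64                                   # array size, secondary snapshots
s = np.exp(1j * np.pi * np.arange(N) * 0.2)    # nominal steering vector

# Secondary data: noise-only snapshots sharing the primary data's covariance
Z = (rng.normal(size=(N, K)) + 1j * rng.normal(size=(N, K))) / np.sqrt(2)
R_hat = Z @ Z.conj().T / K                     # sample covariance estimate

def amf_stat(x, s, R_hat):
    """AMF test statistic |s^H R^-1 x|^2 / (s^H R^-1 s) with the sample covariance."""
    Ri = np.linalg.inv(R_hat)
    return (abs(s.conj() @ Ri @ x) ** 2 / (s.conj() @ Ri @ s)).real

noise = (rng.normal(size=N) + 1j * rng.normal(size=N)) / np.sqrt(2)
t_h0 = amf_stat(noise, s, R_hat)               # noise-only primary cell
t_h1 = amf_stat(noise + 3.0 * s, s, R_hat)     # matched signal present
```

The statistic is then compared against a threshold set for a desired false-alarm rate; the ABORT-style modification changes the null hypothesis to penalize mismatched signals.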
Detection of Potential Transit Signals in Sixteen Quarters of Kepler Mission Data
We present the results of a search for potential transit signals in four
years of photometry data acquired by the Kepler Mission. The targets of the
search include 111,800 stars which were observed for the entire interval and
85,522 stars which were observed for a subset of the interval. We found that
9,743 targets contained at least one signal consistent with the signature of a
transiting or eclipsing object, where the criteria for detection are
periodicity of the detected transits, adequate signal-to-noise ratio, and
acceptance by a number of tests which reject false positive detections. When
targets that had produced a signal were searched repeatedly, an additional
6,542 signals were detected on 3,223 target stars, for a total of 16,285
potential detections. Comparison of the set of detected signals with a set of
known and vetted transit events in the Kepler field of view shows that the
recovery rate for these signals is 96.9%. The ensemble properties of the
detected signals are reviewed.
Comment: Accepted by ApJ Supplement
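A toy version of the periodicity-plus-SNR detection criteria can be sketched by phase-folding a light curve at trial periods and scoring the in-transit depth in sigmas. The injected transit parameters below are illustrative and this is not the Kepler pipeline:

```python
import numpy as np

rng = np.random.default_rng(5)
t = np.arange(4000) * 0.02                       # time stamps in days
period, depth, dur = 3.0, 0.004, 0.12            # injected transit parameters
flux = 1.0 + rng.normal(scale=0.001, size=t.size)
flux[(t % period) < dur] -= depth                # box-shaped transits

def fold_snr(t, flux, p, dur):
    """Phase-fold at trial period p and score the in-transit depth in sigmas."""
    inside = (t % p) < dur
    d = flux[~inside].mean() - flux[inside].mean()
    return d / (flux[~inside].std() / np.sqrt(inside.sum()))

trial_periods = np.arange(1.0, 6.0, 0.01)
snrs = [fold_snr(t, flux, p, dur) for p in trial_periods]
best = trial_periods[int(np.argmax(snrs))]       # should sit at the injected 3-day period
```

Harmonics and subharmonics of the true period score lower because only a fraction of the folded in-window points are actually in transit.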
Robust Principal Component Analysis on Graphs
Principal Component Analysis (PCA) is the most widely used tool for linear
dimensionality reduction and clustering. Still, it is highly sensitive to
outliers and does not scale well with respect to the number of data samples.
Robust PCA solves the first issue with a sparse penalty term. The second issue
can be handled with the matrix factorization model, which is however
non-convex. In addition, PCA-based clustering can be enhanced by using a graph
of data similarity. In this article, we introduce a new model called "Robust
PCA on Graphs" which incorporates spectral graph regularization into the Robust
PCA framework. Our proposed model benefits from 1) the robustness of principal
components to occlusions and missing values, 2) enhanced low-rank recovery, 3)
improved clustering property due to the graph smoothness assumption on the
low-rank matrix, and 4) convexity of the resulting optimization problem.
Extensive experiments on 8 benchmark, 3 video and 2 artificial datasets with
corruptions clearly reveal that our model outperforms 10 other state-of-the-art
models in clustering and low-rank recovery tasks.
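The convex Robust PCA core that the graph-regularized model extends (here without the graph term) can be sketched with a standard inexact augmented Lagrangian loop; the problem sizes, weight λ = 1/√n, and μ schedule below are conventional illustrative choices, not the paper's solver:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 40
L_true = rng.normal(size=(n, 2)) @ rng.normal(size=(2, n))    # rank-2 component
S_true = np.zeros((n, n))
mask = rng.random((n, n)) < 0.05
S_true[mask] = rng.normal(scale=10.0, size=mask.sum())        # sparse gross corruptions
M = L_true + S_true

def svt(A, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, sig, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(sig - tau, 0.0)) @ Vt

def shrink(A, tau):
    """Soft thresholding: proximal operator of the l1 norm."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

lam = 1.0 / np.sqrt(n)                  # standard RPCA trade-off weight
mu = n * n / (4.0 * np.abs(M).sum())    # common initial penalty parameter
Y = np.zeros_like(M)                    # Lagrange multipliers
S = np.zeros_like(M)
for _ in range(200):
    L = svt(M - S + Y / mu, 1.0 / mu)   # low-rank update
    S = shrink(M - L + Y / mu, lam / mu)  # sparse update
    Y = Y + mu * (M - L - S)            # dual ascent on the constraint M = L + S
    mu = min(mu * 1.05, 1e7)            # gentle penalty growth (inexact ALM)

err = np.linalg.norm(L - L_true) / np.linalg.norm(L_true)
# err should be small: the low-rank part is recovered despite the outliers
```

The graph-regularized model of the abstract adds a spectral smoothness term on L to this objective while keeping the problem convex.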