GMRF Estimation under Topological and Spectral Constraints
We investigate the problem of Gaussian Markov random field selection under a non-analytic constraint: the estimated models must be compatible with a fast inference algorithm, namely the Gaussian belief propagation algorithm. To address this question, we introduce the *-IPS framework, based on iterative proportional scaling, which incrementally selects candidate links in a greedy manner. Besides its intrinsic sparsity-inducing ability, this algorithm is flexible enough to incorporate various spectral constraints, such as walk summability, and topological constraints, such as the avoidance of short loops. Experimental tests on various datasets, including traffic data from the San Francisco Bay Area, indicate that this approach can deliver, at reasonable computational cost, a broad range of efficient inference models that are not accessible through penalization with traditional sparsity-inducing norms.
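As an illustration of the walk-summability constraint mentioned in this abstract, the sketch below checks the standard condition from the Gaussian belief propagation literature: after rescaling the precision matrix J to unit diagonal and writing it as I - R, the model is walk-summable when the spectral radius of the entrywise absolute value |R| is below 1. This is not the *-IPS procedure itself; the function names `spectral_radius_abs` and `is_walk_summable` are hypothetical, and the power iteration is only one simple way to estimate the spectral radius.

```python
import math

def spectral_radius_abs(R, iters=1000):
    """Estimate the spectral radius of |R| for a symmetric matrix R.

    Power iteration is run on |R| + I so that the dominant eigenvalue is
    unique in modulus; the shift of 1 is removed before returning.
    """
    n = len(R)
    A = [[abs(R[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    v = [1.0 / math.sqrt(n)] * n  # positive unit-norm start vector
    lam = 1.0
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(vi * wi for vi, wi in zip(v, w))  # Rayleigh quotient
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return lam - 1.0

def is_walk_summable(J):
    """Check walk summability of a symmetric precision matrix J (list of lists)."""
    n = len(J)
    d = [J[i][i] for i in range(n)]
    if any(x <= 0 for x in d):
        return False  # a valid precision matrix has a positive diagonal
    # R holds the off-diagonal part of the normalized matrix, with flipped sign.
    R = [[0.0 if i == j else -J[i][j] / math.sqrt(d[i] * d[j])
          for j in range(n)] for i in range(n)]
    return spectral_radius_abs(R) < 1.0

# A 3-node chain with weak couplings, and a 3-cycle with strong couplings.
chain = [[1.0, -0.4, 0.0], [-0.4, 1.0, -0.4], [0.0, -0.4, 1.0]]
cycle = [[1.0, 0.6, 0.6], [0.6, 1.0, 0.6], [0.6, 0.6, 1.0]]
print(is_walk_summable(chain), is_walk_summable(cycle))
```

The 3-cycle is positive definite yet fails the test, which shows that walk summability is a strictly stronger requirement than validity of the model.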
Foundational principles for large scale inference: Illustrations through correlation mining
When can reliable inference be drawn in the "Big Data" context? This paper
presents a framework for answering this fundamental question in the context of
correlation mining, with implications for general large scale inference. In
large scale data applications like genomics, connectomics, and eco-informatics
the dataset is often variable-rich but sample-starved: a regime where the
number of acquired samples (statistical replicates) is far fewer than the
number of observed variables (genes, neurons, voxels, or chemical
constituents). Much of recent work has focused on understanding the
computational complexity of proposed methods for "Big Data." Sample complexity
however has received relatively less attention, especially in the setting when
the sample size is fixed, and the dimension grows without bound. To
address this gap, we develop a unified statistical framework that explicitly
quantifies the sample complexity of various inferential tasks. Sampling regimes
can be divided into several categories: 1) the classical asymptotic regime
where the variable dimension is fixed and the sample size goes to infinity; 2)
the mixed asymptotic regime where both variable dimension and sample size go to
infinity at comparable rates; 3) the purely high dimensional asymptotic regime
where the variable dimension goes to infinity and the sample size is fixed.
Each regime has its niche, but only the last regime applies to exa-scale data
dimension. We illustrate this high dimensional framework for the problem of
correlation mining, where it is the matrix of pairwise and partial correlations
among the variables that is of interest. We demonstrate various regimes of
correlation mining based on the unifying perspective of high dimensional
learning rates and sample complexity for different structured covariance models
and different inference tasks.
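The sample-starved regime described in this abstract can be illustrated with a small simulation, given here as a sketch under stated assumptions rather than the paper's method. With the sample size n held fixed, the largest absolute sample correlation among p mutually independent variables can only grow as more variables are added, so purely spurious correlations become more prominent. The helper names `sample_corr` and `max_spurious_corr`, the seed, and the sizes n = 10 and p up to 200 are all arbitrary choices made for illustration.

```python
import math
import random

def sample_corr(x, y):
    """Pearson sample correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def max_spurious_corr(columns):
    """Largest absolute pairwise sample correlation among the given variables."""
    p = len(columns)
    return max(abs(sample_corr(columns[i], columns[j]))
               for i in range(p) for j in range(i + 1, p))

rng = random.Random(0)   # fixed seed so the run is reproducible
n = 10                   # number of samples, held fixed
# 200 independent standard normal variables: every true correlation is zero.
columns = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(200)]

for p in (5, 50, 200):
    print(p, round(max_spurious_corr(columns[:p]), 3))
```

Because each larger set of variables contains the smaller one, the printed maxima are non-decreasing in p even though no pair of variables is truly correlated.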
Correlations and Clustering in Wholesale Electricity Markets
We study the structure of locational marginal prices in day-ahead and
real-time wholesale electricity markets. In particular, we consider the case of
two North American markets and show that the price correlations contain
information on the locational structure of the grid. We study various
clustering methods and introduce a type of correlation function based on event
synchronization for spiky time series, and another based on string correlations
of location names provided by the markets. This allows us to reconstruct
aspects of the locational structure of the grid.
Comment: 30 pages, several picture
Information Extraction and Modeling from Remote Sensing Images: Application to the Enhancement of Digital Elevation Models
To deal with high-complexity data such as remote sensing images presenting metric resolution over large areas, an innovative, fast, and robust image processing system is presented.
Information is modeled at increasing levels in order to extract, represent, and link image features to semantic content.
The potential of the proposed techniques is demonstrated with an application that enhances and regularizes digital elevation models based on information collected from RS images.
Latent binary MRF for online reconstruction of large scale systems
We present a novel method for online inference of real-valued quantities on a large network from very sparse measurements. The target application is a large scale system, such as a traffic network, where a small varying subset of the variables is observed, and predictions about the other variables have to be continuously updated. A key feature of our approach is the modeling of dependencies between the original variables through a latent binary Markov random field. This greatly simplifies both the model selection and its subsequent use. We introduce the mirror belief propagation algorithm, which performs fast inference in such a setting. The offline model estimation relies only on pairwise historical data and its complexity is linear w.r.t. the dataset size. Our method makes no assumptions about the joint and marginal distributions of the variables but is primarily designed with multimodal joint distributions in mind. Numerical experiments demonstrate both the applicability and scalability of the method in practice.
Distributed Detection and Estimation in Wireless Sensor Networks
In this article we consider the problems of distributed detection and
estimation in wireless sensor networks. In the first part, we provide a general
framework aimed at showing how an efficient design of a sensor network requires a
joint organization of in-network processing and communication. Then, we recall
the basic features of the consensus algorithm, which is a basic tool to reach
globally optimal decisions through a distributed approach. The main part of the
paper starts by addressing the distributed estimation problem. We first show an
entirely decentralized approach, where observations and estimations are
performed without the intervention of a fusion center. Then, we consider the
case where the estimation is performed at a fusion center, showing how to
allocate quantization bits and transmit powers in the links between the nodes
and the fusion center, in order to accommodate the requirement on the maximum
estimation variance, under a constraint on the global transmit power. We extend
the approach to the detection problem. Also in this case, we consider the
distributed approach, where every node can achieve a globally optimal decision,
and the case where the decision is taken at a central node. In the latter case,
we show how to allocate coding bits and transmit power in order to maximize the
detection probability, under constraints on the false alarm rate and the global
transmit power. Then, we generalize consensus algorithms, illustrating a
distributed procedure that converges to the projection of the observation
vector onto a signal subspace. We then address the issue of energy consumption
in sensor networks, thus showing how to optimize the network topology in order
to minimize the energy necessary to achieve a global consensus. Finally, we
address the problem of matching the topology of the network to the graph
describing the statistical dependencies among the observed variables.
Comment: 92 pages, 24 figures. To appear in E-Reference Signal Processing, R.
Chellappa and S. Theodoridis, Eds., Elsevier, 201
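The consensus algorithm recalled in this abstract can be sketched in its simplest averaging form. The code below is a minimal illustration under stated assumptions, not the article's own formulation: each node repeatedly moves toward its neighbors' values, with no fusion center, and on a connected undirected graph with a small enough step every node converges to the global average. The function name `average_consensus`, the default step 1 / (max degree + 1), and the example ring network are hypothetical choices.

```python
def average_consensus(values, edges, step=None, iters=200):
    """Run synchronous average consensus on an undirected graph.

    values: initial measurement at each node (index = node id)
    edges:  list of (i, j) pairs, each an undirected link
    step:   update gain; defaults to 1 / (max degree + 1), an assumed
            choice that keeps the iteration stable
    """
    n = len(values)
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    if step is None:
        max_deg = max(len(nb) for nb in neighbors)
        step = 1.0 / (max_deg + 1)
    x = [float(v) for v in values]
    for _ in range(iters):
        # Each node only uses the current values of its direct neighbors.
        x = [x[i] + step * sum(x[j] - x[i] for j in neighbors[i])
             for i in range(n)]
    return x

# Four sensors on a ring; the global average of the measurements is 3.0.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(average_consensus([1.0, 2.0, 3.0, 6.0], ring))
```

The update preserves the sum of the node values at every iteration, which is why the common limit is the average of the initial measurements.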