74 research outputs found
Adaptive density estimation for stationary processes
We propose an algorithm to estimate the common density of a stationary
process. We suppose that the process is either β-mixing or
τ-mixing. We provide a model selection procedure based on a generalization
of Mallows' C_p and we prove oracle inequalities for the selected estimator
under a few prior assumptions on the collection of models and on the mixing
coefficients. We prove that our estimator is adaptive over a class of Besov
spaces, namely, we prove that it achieves the same rates of convergence as in
the i.i.d. framework.
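As a rough illustration of a Mallows-type penalized criterion, here is a minimal sketch for histogram density estimation on [0, 1]. The least-squares contrast and the penalty constant `c` are simplified stand-ins for exposition, not the calibrated procedure of the paper:

```python
import numpy as np

def select_histogram_bins(sample, max_bins=50, c=2.0):
    """Penalized model selection over regular histograms on [0, 1].

    Criterion: least-squares contrast -(m / n^2) * sum_j counts_j^2
    plus a Mallows-style penalty c * m / n.  `c` is a hypothetical
    tuning constant, not the paper's calibration.
    """
    n = len(sample)
    best_m, best_crit = 1, np.inf
    for m in range(1, max_bins + 1):
        counts, _ = np.histogram(sample, bins=m, range=(0.0, 1.0))
        # empirical L2 contrast of the m-bin histogram estimator
        contrast = -m * np.sum(counts.astype(float) ** 2) / n**2
        crit = contrast + c * m / n
        if crit < best_crit:
            best_m, best_crit = m, crit
    return best_m
```

On a strongly bimodal sample this selects several bins, while on near-uniform data the penalty pushes the choice toward very few bins.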
Rho-estimators for shape restricted density estimation
The purpose of this paper is to pursue our study of ρ-estimators built from i.i.d. observations that we defined in Baraud et al. (2014). For a ρ-estimator based on some model S (which means that the estimator belongs to S) and a true distribution of the observations that also belongs to S, the risk (with squared Hellinger loss) is bounded by a quantity which can be viewed as a dimension function of the model and is often related to the "metric dimension" of this model, as defined in Birgé (2006). This is a minimax point of view and it is well known that it is pessimistic. Typically, the bound is accurate for most points in the model but may be very pessimistic when the true distribution belongs to some specific part of it. This is the situation that we want to investigate here. For some models, like the set of decreasing densities on [0, 1], there exist specific points in the model that we shall call extremal and for which the risk is substantially smaller than the typical risk. Moreover, the risk at a non-extremal point of the model can be bounded by the sum of the risk bound at a well-chosen extremal point plus the square of its distance to this point. This implies that if the true density is close enough to an extremal point, the risk at this point may be smaller than the minimax risk on the model, and this actually remains true even if the true density does not belong to the model. The result is based on some refined bounds on the suprema of empirical processes that are established in Baraud (2016).
Testing probability distributions underlying aggregated data
In this paper, we analyze and study a hybrid model for testing and learning
probability distributions. Here, in addition to samples, the testing algorithm
is provided with one of two different types of oracles to the unknown
distribution D over [n]. More precisely, we define both the dual and
cumulative dual access models, in which the algorithm can both sample from D
and, respectively, for any i in [n],
- query the probability mass D(i) (query access); or
- get the total mass of {1, ..., i}, i.e. D(1) + ... + D(i) (cumulative
access).
These two models, by generalizing the previously studied sampling and query
oracle models, allow us to bypass the strong lower bounds established for a
number of problems in these settings, while capturing several interesting
aspects of these problems -- and providing new insight on the limitations of
the models. Finally, we show that while the testing algorithms can be in most
cases strictly more efficient, some tasks remain hard even with this additional
power.
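To make the access models concrete, here is a small illustrative sketch (the class and method names are ours, not the paper's) of an oracle over a known distribution on [n] that exposes sampling, point-mass queries, and cumulative-mass queries:

```python
import bisect
import itertools
import random

class DualAccessOracle:
    """Illustrative oracle for the dual / cumulative-dual access models:
    besides i.i.d. samples, a tester may query point masses D(i) and
    cumulative masses D({1, ..., i}).  Indices are 1-based, i in [n]."""

    def __init__(self, masses, rng=None):
        assert abs(sum(masses) - 1.0) < 1e-9, "masses must sum to 1"
        self.masses = list(masses)
        self.cum = list(itertools.accumulate(self.masses))
        self.rng = rng or random.Random()

    def sample(self):
        # draw i ~ D by inverse-CDF lookup on the cumulative table
        u = self.rng.random()
        return bisect.bisect_left(self.cum, u) + 1

    def query(self, i):
        # dual access: probability mass D(i)
        return self.masses[i - 1]

    def cquery(self, i):
        # cumulative dual access: total mass D({1, ..., i})
        return self.cum[i - 1]
```

In the hybrid model described above, a tester would combine calls to `sample` with a small number of `query` or `cquery` calls to the unknown distribution.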
Adaptive management for ecosystem services
Management of natural resources for the production of ecosystem services, which are vital for human well-being, is necessary even when there is uncertainty regarding system response to management action. This uncertainty is the result of incomplete controllability, complex internal feedbacks, and nonlinearity that often interferes with desired management outcomes, and insufficient understanding of nature and people. Adaptive management was developed to reduce such uncertainty. We present a framework for the application of adaptive management for ecosystem services that explicitly accounts for cross-scale tradeoffs in the production of ecosystem services. Our framework focuses on identifying key spatiotemporal scales (plot, patch, ecosystem, landscape, and region) that encompass dominant structures and processes in the system, and includes within- and cross-scale dynamics, ecosystem service tradeoffs, and management controllability within and across scales. Resilience theory recognizes that a limited set of ecological processes in a given system regulate ecosystem services, yet these processes themselves are often poorly understood. If management actions erode or remove these processes, the system may shift into an alternative state unlikely to support the production of desired services. Adaptive management provides a process to assess the underlying within- and cross-scale tradeoffs associated with the production of ecosystem services while proceeding with management designed to meet the demands of a growing human population.
Learning Poisson Binomial Distributions
We consider a basic problem in unsupervised learning: learning an unknown
Poisson Binomial Distribution. A Poisson Binomial Distribution (PBD)
over {0, 1, ..., n} is the distribution of a sum of n independent
Bernoulli random variables which may have arbitrary, potentially non-equal,
expectations. These distributions were first studied by S. Poisson in 1837
and are a natural n-parameter generalization of the
familiar Binomial Distribution. Surprisingly, prior to our work this basic
learning problem was poorly understood, and known results for it were far from
optimal.
We essentially settle the complexity of the learning problem for this basic
class of distributions. As our first main result we give a highly efficient
algorithm which learns to ε-accuracy (with respect to the total variation
distance) using Õ(1/ε^3) samples, independent of n. The
running time of the algorithm is quasilinear in the size of its input
data, i.e., Õ(log(n)/ε^3) bit-operations. (Observe that each draw
from the distribution is a log(n)-bit string.) Our second main result is a
proper learning algorithm that learns to ε-accuracy using
Õ(1/ε^2) samples, and runs in time (1/ε)^{poly(log(1/ε))} · log n.
This is nearly optimal, since any algorithm for this
problem must use Ω(1/ε^2) samples. We also give positive and
negative results for some extensions of this learning problem to weighted sums
of independent Bernoulli random variables.
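To give a feel for the learning problem, here is a crude moment-matching baseline: fit a single Binomial(n, p) to PBD samples by matching the empirical mean, and measure closeness in total variation. This is an illustrative simplification, NOT the paper's algorithm (which achieves n-independent sample complexity):

```python
import math
import random

def binom_pmf(n, p, k):
    """Binomial(n, p) probability mass at k."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def fit_binomial(samples, n):
    """Fit Binomial(n, p) to samples from a PBD over {0, ..., n} by
    matching the empirical mean.  A crude baseline for illustration,
    not the paper's learner."""
    p_hat = sum(samples) / (len(samples) * n)
    return min(max(p_hat, 0.0), 1.0)

def tv_between_binomials(n, p, q):
    """Total variation distance between Binomial(n, p) and Binomial(n, q)."""
    return 0.5 * sum(abs(binom_pmf(n, p, k) - binom_pmf(n, q, k))
                     for k in range(n + 1))
```

When the true PBD happens to be a Binomial, this baseline is consistent; for a general PBD it can be far off in total variation, which is one reason the structural results of the paper are needed.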
Low Complexity Regularization of Linear Inverse Problems
Inverse problems and regularization theory form a central theme in contemporary
signal processing, where the goal is to reconstruct an unknown signal from
partial, indirect, and possibly noisy, measurements of it. A now standard method
for recovering the unknown signal is to solve a convex optimization problem
that enforces some prior knowledge about its structure. This has proved
efficient in many problems routinely encountered in imaging sciences,
statistics and machine learning. This chapter delivers a review of recent
advances in the field where the regularization prior promotes solutions
conforming to some notion of simplicity/low-complexity. These priors encompass
as popular examples sparsity and group sparsity (to capture the compressibility
of natural signals and images), total variation and analysis sparsity (to
promote piecewise regularity), and low-rank (as natural extension of sparsity
to matrix-valued data). Our aim is to provide a unified treatment of all these
regularizations under a single umbrella, namely the theory of partial
smoothness. This framework is very general and accommodates all low-complexity
regularizers just mentioned, as well as many others. Partial smoothness turns
out to be the canonical way to encode low-dimensional models that can be linear
spaces or more general smooth manifolds. This review is intended to serve as a
one stop shop toward the understanding of the theoretical properties of the
so-regularized solutions. It covers a large spectrum including: (i) recovery
guarantees and stability to noise, both in terms of ℓ^2-stability and
model (manifold) identification; (ii) sensitivity analysis to perturbations of
the parameters involved (in particular the observations), with applications to
unbiased risk estimation; (iii) convergence properties of the forward-backward
proximal splitting scheme, which is particularly well suited to solve the
corresponding large-scale regularized optimization problem.
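As a minimal sketch of the forward-backward proximal splitting scheme, specialized to the sparsity prior discussed above (the LASSO problem min_x 0.5‖Ax - y‖² + λ‖x‖₁, where the backward step is soft thresholding), assuming a fixed step size 1/L:

```python
import numpy as np

def forward_backward_l1(A, y, lam, n_iter=2000):
    """Forward-backward splitting for 0.5*||Ax - y||^2 + lam*||x||_1.

    The forward step is a gradient step on the smooth data-fit term;
    the backward step is the proximal operator of lam*||.||_1, i.e.
    componentwise soft thresholding.  Step size 1/L with L = ||A||_2^2.
    """
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = A.T @ (A @ x - y)              # forward (explicit gradient) step
        z = x - g / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # backward (prox) step
    return x
```

With a small λ and noiseless measurements of a sparse vector, the iterates recover the underlying signal up to the usual LASSO bias.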
Wavelet penalized likelihood estimation in generalized functional models
The paper deals with generalized functional regression. The aim is to
estimate the influence of covariates on observations drawn from an exponential
family distribution. The link considered has a semiparametric expression: while we are
interested in a functional influence of some covariates, we allow others to
be modeled linearly. We thus consider a generalized partially linear regression
model with unknown regression coefficients and an unknown nonparametric
function. We present a maximum penalized likelihood procedure to estimate the
components of the model, introducing penalty-based wavelet estimators.
Asymptotic rates for the estimates of both the parametric and the nonparametric
part of the model are given, and quasi-minimax optimality is obtained under
the usual conditions in the literature. We establish in particular that the LASSO
penalty leads to an adaptive estimation with respect to the regularity of the
estimated function. An algorithm based on backfitting and Fisher scoring is
also proposed for implementation. Simulations are used to illustrate the
finite-sample behaviour, including a comparison with kernel- and spline-based methods.
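A minimal sketch of the backfitting idea for a partially linear model in the Gaussian case, alternating the parametric and nonparametric steps. The bin smoother below is a placeholder of our own; the paper instead uses wavelet thresholding for the nonparametric step and Fisher scoring for general exponential-family responses:

```python
import numpy as np

def bin_smoother(t, r, bins=20):
    """Placeholder nonparametric smoother: bin averages of residuals r
    against t in [0, 1].  Stands in for the wavelet estimator."""
    idx = np.minimum((t * bins).astype(int), bins - 1)
    means = np.array([r[idx == b].mean() if np.any(idx == b) else 0.0
                      for b in range(bins)])
    return means[idx]

def backfit_partially_linear(X, t, y, smoother=bin_smoother, n_iter=20):
    """Backfitting for y ~ X @ beta + f(t): alternate a least-squares fit
    of beta on y - f and a smoothing fit of f on y - X @ beta."""
    beta = np.zeros(X.shape[1])
    f = np.zeros_like(y, dtype=float)
    for _ in range(n_iter):
        beta, *_ = np.linalg.lstsq(X, y - f, rcond=None)  # parametric step
        f = smoother(t, y - X @ beta)                     # nonparametric step
        f -= f.mean()                                     # identifiability
    return beta, f
```

When the linear covariates and the functional covariate are independent, the loop converges quickly and the linear coefficients are estimated with little bias from the smoothing error.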