181 research outputs found
Retrospective suspect and non-target screening combined with similarity measures to prioritize MDMA and amphetamine synthesis markers in wastewater
3,4-Methylenedioxymethamphetamine (MDMA) and amphetamine are commonly used psychoactive stimulants. Illegal manufacture of these substances, mainly located in the Netherlands and Belgium, generates large amounts of chemical waste which is disposed of in the environment or released into sewer systems. Retrospective analysis of high-resolution mass spectrometry (HRMS) data was implemented to detect synthesis markers of MDMA and amphetamine production in wastewater samples. Specifically, suspect and non-target screening, combined with a prioritization approach based on similarity measures between detected features and mass loads of MDMA and amphetamine, was implemented. Two hundred and thirty-five 24 h-composite wastewater samples collected from a treatment plant in the Netherlands between 2016 and 2018 were analyzed by liquid chromatography coupled to high-resolution mass spectrometry. Samples were initially separated into two groups (i.e., baseline consumption versus dumping) based on daily loads of MDMA and amphetamine. Significance testing and fold-changes were used to find differences between features in the two groups. Then, associations between peak areas of all features and MDMA or amphetamine loads were investigated across the whole time series using various measures (Euclidean distance, Pearson's correlation coefficient, Spearman's rank correlation coefficient, distance correlation and maximum information coefficient). This unsupervised and unbiased approach was used for the prioritization of features and allowed the selection of 28 presumed markers of MDMA and amphetamine production. These markers could potentially be used to detect dumps in sewer systems, help determine the synthesis route, and track down the waste in the environment.
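The prioritization step described above, ranking candidate HRMS features by how strongly their peak-area time series co-vary with the daily drug loads, can be sketched as follows. All data, feature names, and parameter values here are made up for illustration (only two of the five similarity measures are shown), so this is a minimal sketch of the idea rather than the study's actual pipeline.

```python
import math
from statistics import mean

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(x):
    """Simple ranks (no tie handling -- enough for this illustration)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    for rank, i in enumerate(order):
        r[i] = float(rank)
    return r

def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical daily MDMA loads over eight sampling days; the three
# large values mimic dumping events.
mdma_load = [110, 95, 120, 900, 870, 105, 940, 100]

# Hypothetical feature peak areas: "feature_A" tracks the load
# (candidate synthesis marker), "feature_B" does not.
features = {
    "feature_A": [2.0, 1.8, 2.2, 15.1, 14.3, 2.1, 15.8, 1.9],
    "feature_B": [5.0, 5.2, 4.9, 5.1, 5.0, 4.8, 5.1, 5.2],
}

scores = {name: (pearson(area, mdma_load), spearman(area, mdma_load))
          for name, area in features.items()}
prioritized = sorted(scores, key=lambda name: -scores[name][0])
print(prioritized[0])  # feature_A ranks first on both measures
```

Features at the top of such a ranking are the ones flagged as presumed production markers.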
PAC-Bayesian Bounds for Randomized Empirical Risk Minimizers
The aim of this paper is to generalize the PAC-Bayesian theorems proved by
Catoni in the classification setting to more general problems of statistical
inference. We show how to control the deviations of the risk of randomized
estimators. Particular attention is paid to randomized estimators drawn in a
small neighborhood of classical estimators, whose study leads to control of the
risk of the latter. These results make it possible to bound the risk of very
general estimation procedures, as well as to perform model selection.
A population Monte Carlo scheme with transformed weights and its application to stochastic kinetic models
This paper addresses the problem of Monte Carlo approximation of posterior
probability distributions. In particular, we have considered a recently
proposed technique known as population Monte Carlo (PMC), which is based on an
iterative importance sampling approach. An important drawback of this
methodology is the degeneracy of the importance weights when the dimension of
either the observations or the variables of interest is high. To alleviate this
difficulty, we propose a novel method that performs a nonlinear transformation
on the importance weights. This operation reduces the weight variation, hence
avoiding their degeneracy and increasing the efficiency of the importance
sampling scheme, especially when drawing from proposal functions that are
poorly adapted to the true posterior.
For the sake of illustration, we have applied the proposed algorithm to the
estimation of the parameters of a Gaussian mixture model. This is a very simple
problem that enables us to clearly show and discuss the main features of the
proposed technique. As a practical application, we have also considered the
popular (and challenging) problem of estimating the rate parameters of
stochastic kinetic models (SKM). SKMs are highly multivariate systems that
model molecular interactions in biological and chemical problems. We introduce
a particularization of the proposed algorithm to SKMs and present numerical
results.
Comment: 35 pages, 8 figures
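A minimal sketch of the weight-transformation idea, using weight clipping as one concrete nonlinear transformation (the paper's exact transformation may differ). The target, the deliberately mismatched proposal, and all parameters are toy choices:

```python
import math, random

random.seed(0)

def log_target(x):
    """Toy target: standard normal log-density, up to a constant."""
    return -0.5 * x * x

def sample_proposal():
    """Deliberately mismatched proposal: N(3, 0.5^2)."""
    return random.gauss(3.0, 0.5)

def log_proposal(x):
    """Log-density of the proposal, up to the same constant."""
    return -0.5 * ((x - 3.0) / 0.5) ** 2 - math.log(0.5)

def ess(weights):
    """Effective sample size of normalized importance weights."""
    return 1.0 / sum(w * w for w in weights)

N = 5000
xs = [sample_proposal() for _ in range(N)]
raw = [math.exp(log_target(x) - log_proposal(x)) for x in xs]

# Plain normalized importance weights.
s = sum(raw)
plain = [w / s for w in raw]

# Transformed weights: clip raw weights at the M-th largest value
# (M << N), then renormalize -- a simple nonlinear transformation
# that flattens the extreme weights.
M = 50
threshold = sorted(raw, reverse=True)[M - 1]
s2 = sum(min(w, threshold) for w in raw)
transformed = [min(w, threshold) / s2 for w in raw]

print(ess(plain), ess(transformed))  # the transformed ESS is much larger
```

With a poorly adapted proposal a handful of weights dominate and the effective sample size collapses; flattening the largest weights trades a little bias for a large variance reduction, which is the mechanism the abstract describes.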
Sparsity and Incoherence in Compressive Sampling
We consider the problem of reconstructing a sparse signal $x \in \mathbb{R}^n$
from a limited number of linear measurements. Given $m$ randomly selected
samples of $Ux$, where $U$ is an orthonormal matrix, we show that $\ell_1$
minimization recovers $x$ exactly when the number of measurements exceeds
$m \ge \mathrm{Const} \cdot \mu^2(U) \cdot S \cdot \log n$, where $S$ is the
number of nonzero components in $x$, and $\mu$ is the largest entry in $U$
properly normalized: $\mu(U) = \sqrt{n} \cdot \max_{k,j} |U_{k,j}|$. The
smaller $\mu$, the fewer samples needed.
The result holds for ``most'' sparse signals $x$ supported on a fixed (but
arbitrary) set $T$. Given $T$, if the sign of $x$ for each nonzero entry on
$T$ and the observed values of $Ux$ are drawn at random, the signal is
recovered with overwhelming probability. Moreover, there is a sense in which
this is nearly optimal since any method succeeding with the same probability
would require just about this many samples.
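The recovery statement can be illustrated numerically: sample a few rows of an orthonormal matrix (a DCT matrix here, whose incoherence with spikes is close to optimal) and solve the l1 minimization as a linear program. This is a generic sketch, not the authors' code; the problem sizes, the solver, and the LP formulation are arbitrary choices.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, m, S = 32, 20, 2

# Orthonormal DCT-II matrix U (rows indexed by frequency).
U = np.zeros((n, n))
U[0, :] = 1.0 / np.sqrt(n)
for k in range(1, n):
    U[k, :] = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))

# Sparse signal: S spikes with random signs on a random support.
x = np.zeros(n)
support = rng.choice(n, size=S, replace=False)
x[support] = rng.choice([-1.0, 1.0], size=S)

# Observe m randomly selected samples of Ux.
rows = rng.choice(n, size=m, replace=False)
A, b = U[rows], U[rows] @ x

# l1 minimization as an LP over [z, t]: min sum(t) s.t. -t <= z <= t, Az = b.
c = np.concatenate([np.zeros(n), np.ones(n)])
I = np.eye(n)
A_ub = np.block([[I, -I], [-I, -I]])          # z - t <= 0 and -z - t <= 0
b_ub = np.zeros(2 * n)
A_eq = np.hstack([A, np.zeros((m, n))])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b,
              bounds=[(None, None)] * (2 * n), method="highs")
z = res.x[:n]
print(np.max(np.abs(z - x)))  # near zero when recovery succeeds
```

The split into positive and negative parts via the auxiliary variables $t$ is the standard way to express an $\ell_1$ objective in an LP.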
Cutoff at the "entropic time" for sparse Markov chains
We study convergence to equilibrium for a large class of Markov chains in random environment. The chains are sparse in the sense that, in every row of the transition matrix P, the mass is essentially concentrated on a few entries. Moreover, the random environment is such that rows of P are independent and such that the entries are exchangeable within each row. This includes various models of random walks on sparse random directed graphs. The models are generally non-reversible and the equilibrium distribution is itself unknown. In this general setting we establish the cutoff phenomenon for the total variation distance to equilibrium, with mixing time given by the logarithm of the number of states times the inverse of the average row entropy of P. As an application, we consider the case where the rows of P are i.i.d. random vectors in the domain of attraction of a Poisson-Dirichlet law with index α ∈ (0, 1). Our main results are based on a detailed analysis of the weight of the trajectory followed by the walker. This approach offers an interpretation of cutoff as an instance of the concentration of measure phenomenon.
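A small numerical illustration of the entropic time in this setting: build a sparse random stochastic matrix whose rows put exchangeable random mass on a few entries, estimate its (unknown) equilibrium by power iteration, and check that the total variation distance from a fixed starting state is large well before log(n)/H and small well after it, where H is the mean row entropy. The parameters (row sparsity, Dirichlet weights, chain size) are arbitrary choices for the sketch, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(7)
n, d = 400, 3

# Sparse random stochastic matrix: each row puts Dirichlet(1,1,1) weights
# on d uniformly chosen columns (rows independent, entries exchangeable).
P = np.zeros((n, n))
for i in range(n):
    cols = rng.choice(n, size=d, replace=False)
    P[i, cols] = rng.dirichlet(np.ones(d))

# Mean row entropy H and the predicted "entropic time" log(n) / H.
ent = 0.0
for i in range(n):
    w = P[i][P[i] > 0]
    ent += -(w * np.log(w)).sum()
H = ent / n
t_ent = np.log(n) / H

# Equilibrium distribution via power iteration (the chain is
# non-reversible and pi is not known in closed form).
pi = np.full(n, 1.0 / n)
for _ in range(300):
    pi = pi @ P

def tv_from_state(t):
    """Total variation distance to pi after t steps from state 0."""
    mu = np.zeros(n)
    mu[0] = 1.0
    for _ in range(t):
        mu = mu @ P
    return 0.5 * np.abs(mu - pi).sum()

early, late = tv_from_state(1), tv_from_state(int(3 * t_ent))
print(t_ent, early, late)  # distance is ~1 early, small well past t_ent
```

At these sizes the transition between the two regimes is still smeared out; the cutoff result says the window shrinks, relative to t_ent, as n grows.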
Estimation in high dimensions: a geometric perspective
This tutorial provides an exposition of a flexible geometric framework for
high dimensional estimation problems with constraints. The tutorial develops
geometric intuition about high dimensional sets, justifies it with some results
of asymptotic convex geometry, and demonstrates connections between geometric
results and estimation problems. The theory is illustrated with applications to
sparse recovery, matrix completion, quantization, linear and logistic
regression and generalized linear models.
Comment: 56 pages, 9 figures. Multiple minor changes
Some Properties of R\'{e}nyi Entropy over Countably Infinite Alphabets
In this paper we study certain properties of R\'enyi entropy functionals
$H_\alpha(P)$ on the space of probability distributions over countably
infinite alphabets. Primarily, continuity and convergence issues are addressed.
Some properties shown parallel those known in the finite alphabet case, while
others illustrate a quite different behaviour of R\'enyi entropy in the
infinite case. In particular, it is shown that, for any distribution $P$ and
any $r \in [0, \infty]$, there exists a sequence of distributions $P_n$
converging to $P$ with respect to the total variation
distance, such that $\lim_{n\to\infty} H_\alpha(P_n) = H_\alpha(P) + r$.
Comment: 13 pages (single-column)
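A small numerical illustration in the spirit of this kind of discontinuity, using the Shannon entropy (the α → 1 case of the Rényi family): spreading a little mass over a long uniform tail moves the distribution only slightly in total variation yet raises the entropy substantially, and the jump ε·log(m/ε) can be made as large as desired by lengthening the tail. The base distribution and parameters are toy choices.

```python
import math

def shannon(p):
    """Shannon entropy (in nats) of a finite probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0)

P = [0.5, 0.25, 0.25]       # base distribution
eps, m = 0.1, 10**5         # mass to move, length of the uniform tail

# Perturbation: scale P down by (1 - eps) and spread eps uniformly
# over m fresh symbols.
Pn = [(1 - eps) * q for q in P] + [eps / m] * m

tv = 0.5 * (sum(abs((1 - eps) * q - q) for q in P) + eps)   # equals eps
gap = shannon(Pn) - shannon(P)

print(tv, gap)  # tv stays at 0.1 while the entropy jumps by > 1 nat
```

Taking eps → 0 while m grows like exp(r/eps) sends the total variation distance to zero while the entropy gap tends to r, which is the mechanism behind statements of the kind quoted above.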
Towards Machine Wald
The past century has seen a steady increase in the need of estimating and
predicting complex systems and making (possibly critical) decisions with
limited information. Although computers have made possible the numerical
evaluation of sophisticated statistical models, these models are still designed
\emph{by humans} because there is currently no known recipe or algorithm for
dividing the design of a statistical model into a sequence of arithmetic
operations. Indeed, enabling computers to \emph{think} the way \emph{humans} do
when faced with uncertainty is challenging in several major ways:
(1) Finding optimal statistical models has yet to be formulated as a well-posed
problem when information on the system of interest is incomplete and comes in
the form of a complex combination of sample data, partial knowledge of
constitutive relations and a limited description of the distribution of input
random variables. (2) The space of admissible scenarios along with the space of
relevant information, assumptions, and/or beliefs, tend to be infinite
dimensional, whereas calculus on a computer is necessarily discrete and finite.
To address these challenges, this paper explores the foundations of a rigorous framework
for the scientific computation of optimal statistical estimators/models and
reviews their connections with Decision Theory, Machine Learning, Bayesian
Inference, Stochastic Optimization, Robust Optimization, Optimal Uncertainty
Quantification and Information Based Complexity.
Comment: 37 pages