
    Retrospective suspect and non-target screening combined with similarity measures to prioritize MDMA and amphetamine synthesis markers in wastewater

    3,4-Methylenedioxymethamphetamine (MDMA) and amphetamine are commonly used psychoactive stimulants. Illegal manufacture of these substances, mainly located in the Netherlands and Belgium, generates large amounts of chemical waste, which is disposed of in the environment or released into sewer systems. Retrospective analysis of high-resolution mass spectrometry (HRMS) data was implemented to detect synthesis markers of MDMA and amphetamine production in wastewater samples. Specifically, suspect and non-target screening was combined with a prioritization approach based on similarity measures between detected features and the mass loads of MDMA and amphetamine. Two hundred and thirty-five 24 h composite wastewater samples, collected from a treatment plant in the Netherlands between 2016 and 2018, were analyzed by liquid chromatography coupled to high-resolution mass spectrometry. Samples were initially separated into two groups (i.e., baseline consumption versus dumping) based on daily loads of MDMA and amphetamine. Significance testing and fold changes were used to find differences between features in the two groups. Then, associations between the peak areas of all features and the MDMA or amphetamine loads were investigated across the whole time series using various measures (Euclidean distance, Pearson's correlation coefficient, Spearman's rank correlation coefficient, distance correlation, and the maximal information coefficient). This unsupervised and unbiased approach was used to prioritize features and allowed the selection of 28 presumed markers of MDMA and amphetamine production. These markers could potentially be used to detect dumps in sewer systems, help determine the synthesis route, and track down the waste in the environment.
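    As a rough illustration of the correlation-based prioritization step, the sketch below scores two hypothetical HRMS features against a synthetic series of daily MDMA loads using Pearson's and Spearman's coefficients. The data, numbers, and feature names are invented, and only two of the paper's similarity measures are shown (it additionally uses Euclidean distance, distance correlation, and the maximal information coefficient):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)
n_days = 235

# Synthetic daily MDMA loads over the sampling campaign (arbitrary units).
mdma_load = rng.lognormal(mean=3.0, sigma=0.5, size=n_days)

# Peak areas of two hypothetical HRMS features: one tracks the MDMA load
# (a candidate synthesis marker), the other is unrelated noise.
marker = 50.0 * mdma_load + rng.normal(0.0, 20.0, size=n_days)
unrelated = rng.lognormal(mean=2.0, sigma=0.5, size=n_days)

for name, feature in [("marker", marker), ("unrelated", unrelated)]:
    r, _ = pearsonr(feature, mdma_load)
    rho, _ = spearmanr(feature, mdma_load)
    print(f"{name}: Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```

    Features whose association with the drug loads is consistently strong across several such measures would be prioritized as candidate synthesis markers.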

    PAC-Bayesian Bounds for Randomized Empirical Risk Minimizers

    The aim of this paper is to generalize the PAC-Bayesian theorems proved by Catoni in the classification setting to more general problems of statistical inference. We show how to control the deviations of the risk of randomized estimators. Particular attention is paid to randomized estimators drawn from a small neighborhood of classical estimators, whose study makes it possible to control the risk of the latter. These results make it possible to bound the risk of very general estimation procedures, as well as to perform model selection.
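    One concrete instance of the kind of randomized estimator such bounds control is a Gibbs draw from a distribution that exponentially favors low empirical risk. The toy sketch below (my own construction, not the paper's setting) draws a threshold classifier with probability proportional to exp(-λ × empirical risk):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 1-D data: label = sign(x - 0.3), with 10% of labels flipped.
x = rng.uniform(-1.0, 1.0, size=200)
y = np.sign(x - 0.3)
y[rng.random(200) < 0.1] *= -1

# Finite hypothesis class: threshold classifiers h_t(x) = sign(x - t).
thresholds = np.linspace(-1.0, 1.0, 101)
emp_risk = np.array([np.mean(np.sign(x - t) != y) for t in thresholds])

# Randomized (Gibbs) estimator: draw a hypothesis with probability
# proportional to exp(-lam * empirical risk).
lam = 50.0
w = np.exp(-lam * (emp_risk - emp_risk.min()))
w /= w.sum()
t_drawn = rng.choice(thresholds, p=w)
print(f"drawn threshold: {t_drawn:.2f}")
```

    PAC-Bayesian theorems bound the average risk of such draws in terms of the empirical risk and the Kullback-Leibler divergence between the Gibbs posterior and a prior over hypotheses.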

    A population Monte Carlo scheme with transformed weights and its application to stochastic kinetic models

    This paper addresses the problem of Monte Carlo approximation of posterior probability distributions. In particular, we consider a recently proposed technique known as population Monte Carlo (PMC), which is based on an iterative importance sampling approach. An important drawback of this methodology is the degeneracy of the importance weights when the dimension of either the observations or the variables of interest is high. To alleviate this difficulty, we propose a novel method that performs a nonlinear transformation on the importance weights. This operation reduces the weight variation, hence avoiding degeneracy and increasing the efficiency of the importance sampling scheme, especially when drawing from proposal functions that are poorly adapted to the true posterior. For the sake of illustration, we have applied the proposed algorithm to the estimation of the parameters of a Gaussian mixture model. This is a very simple problem that enables us to clearly show and discuss the main features of the proposed technique. As a practical application, we have also considered the popular (and challenging) problem of estimating the rate parameters of stochastic kinetic models (SKMs). SKMs are highly multivariate systems that model molecular interactions in biological and chemical problems. We introduce a particularization of the proposed algorithm to SKMs and present numerical results. Comment: 35 pages, 8 figures.
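    The effect of a nonlinear weight transformation can already be seen in plain (non-iterative) importance sampling. The sketch below uses a simple clipping transform on a deliberately poor proposal and compares effective sample sizes; the target, proposal, and clipping rule are illustrative assumptions, not the transformation actually proposed in the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 5000

def log_target(x):
    # Unnormalized log-density of a standard normal "posterior".
    return -0.5 * x**2

# Poorly adapted proposal: N(3, 3^2), centered away from the target.
samples = rng.normal(loc=3.0, scale=3.0, size=N)
log_prop = -0.5 * ((samples - 3.0) / 3.0) ** 2  # constants cancel on normalizing

log_w = log_target(samples) - log_prop
w = np.exp(log_w - log_w.max())

def normalize(weights):
    return weights / weights.sum()

def ess(weights):
    # Effective sample size of normalized importance weights.
    return 1.0 / np.sum(weights**2)

w_plain = normalize(w)
# Nonlinear transformation: clip weights at their 95th percentile before
# normalizing (a simple stand-in for the paper's transformation).
w_clip = normalize(np.minimum(w, np.quantile(w, 0.95)))

print(f"ESS plain: {ess(w_plain):.0f}, ESS transformed: {ess(w_clip):.0f}")
```

    Flattening the largest weights trades a little bias for a large variance reduction, which is what keeps the iterative PMC scheme from degenerating.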

    Sparsity and Incoherence in Compressive Sampling

    We consider the problem of reconstructing a sparse signal $x^0 \in \mathbb{R}^n$ from a limited number of linear measurements. Given $m$ randomly selected samples of $Ux^0$, where $U$ is an orthonormal matrix, we show that $\ell_1$ minimization recovers $x^0$ exactly when the number of measurements satisfies $m \geq \mathrm{Const}\cdot\mu^2(U)\cdot S\cdot\log n$, where $S$ is the number of nonzero components in $x^0$ and $\mu(U)$ is the largest entry in $U$ properly normalized: $\mu(U) = \sqrt{n}\cdot\max_{k,j} |U_{k,j}|$. The smaller $\mu$, the fewer samples needed. The result holds for "most" sparse signals $x^0$ supported on a fixed (but arbitrary) set $T$. Given $T$, if the sign of $x^0$ for each nonzero entry on $T$ and the observed values of $Ux^0$ are drawn at random, the signal is recovered with overwhelming probability. Moreover, there is a sense in which this is nearly optimal, since any method succeeding with the same probability would require just about this many samples.
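    The quantities in the bound are easy to compute for a concrete orthonormal matrix. The sketch below evaluates the coherence μ(U) for the orthogonal DCT basis (a standard example; the choice of basis and the sizes are mine) and the resulting measurement count up to the unspecified constant:

```python
import numpy as np
from scipy.fft import dct

n, S = 256, 5  # signal length and sparsity (illustrative sizes)

# Orthonormal measurement basis U: the orthogonal DCT-II matrix.
U = dct(np.eye(n), norm="ortho")

# Coherence mu(U) = sqrt(n) * max |U_kj|; it lies in [1, sqrt(n)],
# and smaller values mean fewer measurements are needed.
mu = np.sqrt(n) * np.abs(U).max()

# Sufficient number of measurements, up to the unspecified constant:
m = mu**2 * S * np.log(n)
print(f"mu(U) = {mu:.3f}, m >= Const * {m:.1f}")
```

    For the DCT basis μ(U) is close to √2, near the ideal maximally incoherent case μ = 1 achieved by the Fourier basis.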

    Cutoff at the "entropic time" for sparse Markov chains

    We study convergence to equilibrium for a large class of Markov chains in random environment. The chains are sparse in the sense that in every row of the transition matrix P the mass is essentially concentrated on a few entries. Moreover, the random environment is such that the rows of P are independent and the entries are exchangeable within each row. This includes various models of random walks on sparse random directed graphs. The models are generally non-reversible and the equilibrium distribution is itself unknown. In this general setting we establish the cutoff phenomenon for the total variation distance to equilibrium, with mixing time given by the logarithm of the number of states times the inverse of the average row entropy of P. As an application, we consider the case where the rows of P are i.i.d. random vectors in the domain of attraction of a Poisson-Dirichlet law with index α ∈ (0, 1). Our main results are based on a detailed analysis of the weight of the trajectory followed by the walker. This approach offers an interpretation of cutoff as an instance of the concentration of measure phenomenon.
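    The predicted mixing time is explicit: log(number of states) divided by the average row entropy of P. The sketch below builds a small sparse random chain (each row supported on d uniformly chosen entries with Dirichlet weights, an illustrative choice rather than the paper's exact model) and evaluates this entropic time:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 2000, 3  # number of states; nonzero entries per row

# Sparse random transition matrix: each row puts Dirichlet-distributed
# mass on d out-neighbors chosen uniformly at random.
P = np.zeros((n, n))
row_entropy = np.empty(n)
for i in range(n):
    nbrs = rng.choice(n, size=d, replace=False)
    p = rng.dirichlet(np.ones(d))
    P[i, nbrs] = p
    row_entropy[i] = -np.sum(p * np.log(p))

# Predicted cutoff location: log(number of states) / (average row entropy).
H = row_entropy.mean()
t_entropic = np.log(n) / H
print(f"average row entropy: {H:.3f}, entropic time: {t_entropic:.1f} steps")
```

    Around this step count the total variation distance to equilibrium is predicted to drop abruptly from near 1 to near 0, which is the cutoff phenomenon.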

    Estimation in high dimensions: a geometric perspective

    This tutorial provides an exposition of a flexible geometric framework for high-dimensional estimation problems with constraints. The tutorial develops geometric intuition about high-dimensional sets, justifies it with some results of asymptotic convex geometry, and demonstrates connections between geometric results and estimation problems. The theory is illustrated with applications to sparse recovery, matrix completion, quantization, linear and logistic regression, and generalized linear models. Comment: 56 pages, 9 figures. Multiple minor changes.

    Some Properties of Rényi Entropy over Countably Infinite Alphabets

    In this paper we study certain properties of the Rényi entropy functionals $H_\alpha(\mathcal{P})$ on the space of probability distributions over $\mathbb{Z}_+$. Primarily, continuity and convergence issues are addressed. Some of the properties shown parallel those known in the finite-alphabet case, while others illustrate a quite different behaviour of Rényi entropy in the infinite case. In particular, it is shown that, for any distribution $\mathcal{P}$ and any $r \in [0,\infty]$, there exists a sequence of distributions $\mathcal{P}_n$ converging to $\mathcal{P}$ with respect to the total variation distance, such that $\lim_{n\to\infty}\lim_{\alpha\to 1^+} H_\alpha(\mathcal{P}_n) = \lim_{\alpha\to 1^+}\lim_{n\to\infty} H_\alpha(\mathcal{P}_n) + r$. Comment: 13 pages (single-column).
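    The discontinuity behind the double-limit statement can be probed numerically: moving a little mass onto a huge flat tail barely changes the total variation distance or H_2, yet shifts the Shannon entropy (the α → 1 limit) by roughly ε·log(tail size). The sketch below, with distribution and parameters of my own choosing, illustrates this:

```python
import numpy as np

def renyi_entropy(p, alpha):
    # H_alpha(P) = log(sum_i p_i^alpha) / (1 - alpha) for alpha != 1;
    # the Shannon entropy is the limiting value at alpha = 1.
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if alpha == 1.0:
        return -np.sum(p * np.log(p))
    return np.log(np.sum(p**alpha)) / (1.0 - alpha)

# A geometric distribution on Z_+, truncated to 30 symbols.
p = 0.5 ** np.arange(1, 31)
p /= p.sum()

# Perturbation: move mass eps onto a huge flat tail. The total variation
# distance between p and p_n is only eps = 0.01 ...
eps, tail = 0.01, 10**6
p_n = np.concatenate([(1 - eps) * p, np.full(tail, eps / tail)])

# ... yet the Shannon entropy picks up about eps * log(tail), while H_2
# barely moves.
for alpha in (1.0, 2.0):
    print(f"H_{alpha}: before {renyi_entropy(p, alpha):.3f}, "
          f"after {renyi_entropy(p_n, alpha):.3f}")
```

    Growing the tail makes the Shannon entropy of the perturbed sequence diverge while the perturbations still converge in total variation, which is the mechanism behind exchanging the two limits at a cost of r.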

    Towards Machine Wald

    The past century has seen a steady increase in the need to estimate and predict complex systems and to make (possibly critical) decisions with limited information. Although computers have made possible the numerical evaluation of sophisticated statistical models, these models are still designed by humans, because there is currently no known recipe or algorithm for dividing the design of a statistical model into a sequence of arithmetic operations. Indeed, enabling computers to "think" as humans can when faced with uncertainty is challenging in several major ways: (1) finding optimal statistical models has yet to be formulated as a well-posed problem when information on the system of interest is incomplete and comes in the form of a complex combination of sample data, partial knowledge of constitutive relations, and a limited description of the distribution of input random variables; (2) the space of admissible scenarios, along with the space of relevant information, assumptions, and/or beliefs, tends to be infinite-dimensional, whereas calculus on a computer is necessarily discrete and finite. To this end, this paper explores the foundations of a rigorous framework for the scientific computation of optimal statistical estimators/models and reviews their connections with Decision Theory, Machine Learning, Bayesian Inference, Stochastic Optimization, Robust Optimization, Optimal Uncertainty Quantification, and Information-Based Complexity. Comment: 37 pages.