181 research outputs found
Retrospective suspect and non-target screening combined with similarity measures to prioritize MDMA and amphetamine synthesis markers in wastewater
3,4-Methylenedioxymethamphetamine (MDMA) and amphetamine are commonly used psychoactive stimulants. Illegal manufacture of these substances, mainly located in the Netherlands and Belgium, generates large amounts of chemical waste which is disposed of in the environment or released into sewer systems. Retrospective analysis of high-resolution mass spectrometry (HRMS) data was implemented to detect synthesis markers of MDMA and amphetamine production in wastewater samples. Specifically, suspect and non-target screening, combined with a prioritization approach based on similarity measures between detected features and mass loads of MDMA and amphetamine, was implemented. Two hundred and thirty-five 24 h-composite wastewater samples collected from a treatment plant in the Netherlands between 2016 and 2018 were analyzed by liquid chromatography coupled to high-resolution mass spectrometry. Samples were initially separated into two groups (i.e., baseline consumption versus dumping) based on daily loads of MDMA and amphetamine. Significance testing and fold-changes were used to find differences between features in the two groups. Then, associations between peak areas of all features and MDMA or amphetamine loads were investigated across the whole time series using various measures (Euclidean distance, Pearson's correlation coefficient, Spearman's rank correlation coefficient, distance correlation and maximum information coefficient). This unsupervised and unbiased approach was used for the prioritization of features and allowed the selection of 28 presumed markers of MDMA and amphetamine production. These markers could potentially be used to detect dumps in sewer systems, help determine the synthesis route, and track down the waste in the environment.
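The prioritization step described above, ranking candidate HRMS features by how strongly their peak-area time series co-vary with the daily drug loads, can be sketched as follows. All data, feature names, and parameter values here are made up for illustration (only two of the five similarity measures are shown), so this is a minimal sketch of the idea rather than the study's actual pipeline.

```python
import math
from statistics import mean

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(x):
    """Simple ranks (no tie handling -- enough for this illustration)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    for rank, i in enumerate(order):
        r[i] = float(rank)
    return r

def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical daily MDMA loads over eight sampling days; the three
# large values mimic dumping events.
mdma_load = [110, 95, 120, 900, 870, 105, 940, 100]

# Hypothetical feature peak areas: "feature_A" tracks the load
# (candidate synthesis marker), "feature_B" does not.
features = {
    "feature_A": [2.0, 1.8, 2.2, 15.1, 14.3, 2.1, 15.8, 1.9],
    "feature_B": [5.0, 5.2, 4.9, 5.1, 5.0, 4.8, 5.1, 5.2],
}

scores = {name: (pearson(area, mdma_load), spearman(area, mdma_load))
          for name, area in features.items()}
prioritized = sorted(scores, key=lambda name: -scores[name][0])
print(prioritized[0])  # feature_A ranks first on both measures
```

Features at the top of such a ranking are the ones flagged as presumed production markers.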
PAC-Bayesian Bounds for Randomized Empirical Risk Minimizers
The aim of this paper is to generalize the PAC-Bayesian theorems proved by
Catoni in the classification setting to more general problems of statistical
inference. We show how to control the deviations of the risk of randomized
estimators. Particular attention is paid to randomized estimators drawn in a
small neighborhood of classical estimators, whose study leads to control of the
risk of the latter. These results make it possible to bound the risk of very
general estimation procedures, as well as to perform model selection.
A population Monte Carlo scheme with transformed weights and its application to stochastic kinetic models
This paper addresses the problem of Monte Carlo approximation of posterior
probability distributions. In particular, we have considered a recently
proposed technique known as population Monte Carlo (PMC), which is based on an
iterative importance sampling approach. An important drawback of this
methodology is the degeneracy of the importance weights when the dimension of
either the observations or the variables of interest is high. To alleviate this
difficulty, we propose a novel method that performs a nonlinear transformation
on the importance weights. This operation reduces the weight variation, hence
avoiding their degeneracy and increasing the efficiency of the importance
sampling scheme, especially when drawing from proposal functions that are
poorly adapted to the true posterior.
For the sake of illustration, we have applied the proposed algorithm to the
estimation of the parameters of a Gaussian mixture model. This is a very simple
problem that enables us to clearly show and discuss the main features of the
proposed technique. As a practical application, we have also considered the
popular (and challenging) problem of estimating the rate parameters of
stochastic kinetic models (SKM). SKMs are highly multivariate systems that
model molecular interactions in biological and chemical problems. We introduce
a particularization of the proposed algorithm to SKMs and present numerical
results.
Comment: 35 pages, 8 figures
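A minimal sketch of the weight-transformation idea, using weight clipping as one concrete nonlinear transformation (the paper's exact transformation may differ). The target, the deliberately mismatched proposal, and all parameters are toy choices:

```python
import math, random

random.seed(0)

def log_target(x):
    """Toy target: standard normal log-density, up to a constant."""
    return -0.5 * x * x

def sample_proposal():
    """Deliberately mismatched proposal: N(3, 0.5^2)."""
    return random.gauss(3.0, 0.5)

def log_proposal(x):
    """Log-density of the proposal, up to the same constant."""
    return -0.5 * ((x - 3.0) / 0.5) ** 2 - math.log(0.5)

def ess(weights):
    """Effective sample size of normalized importance weights."""
    return 1.0 / sum(w * w for w in weights)

N = 5000
xs = [sample_proposal() for _ in range(N)]
raw = [math.exp(log_target(x) - log_proposal(x)) for x in xs]

# Plain normalized importance weights.
s = sum(raw)
plain = [w / s for w in raw]

# Transformed weights: clip raw weights at the M-th largest value
# (M << N), then renormalize -- a simple nonlinear transformation
# that flattens the extreme weights.
M = 50
threshold = sorted(raw, reverse=True)[M - 1]
s2 = sum(min(w, threshold) for w in raw)
transformed = [min(w, threshold) / s2 for w in raw]

print(ess(plain), ess(transformed))  # the transformed ESS is much larger
```

With a poorly adapted proposal a handful of weights dominate and the effective sample size collapses; flattening the largest weights trades a little bias for a large variance reduction, which is the mechanism the abstract describes.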
Sparsity and Incoherence in Compressive Sampling
We consider the problem of reconstructing a sparse signal $x \in \mathbb{R}^n$
from a limited number of linear measurements. Given $m$ randomly selected
samples of $Ux$, where $U$ is an orthonormal matrix, we show that $\ell_1$
minimization recovers $x$ exactly when the number of measurements exceeds
$m \ge \mathrm{Const} \cdot \mu^2(U) \cdot S \cdot \log n$, where $S$ is the
number of nonzero components in $x$, and $\mu$ is the largest entry in $U$
properly normalized: $\mu(U) = \sqrt{n} \cdot \max_{k,j} |U_{k,j}|$. The
smaller $\mu$, the fewer samples needed.
The result holds for ``most'' sparse signals $x$ supported on a fixed (but
arbitrary) set $T$. Given $T$, if the sign of $x$ for each nonzero entry on
$T$ and the observed values of $Ux$ are drawn at random, the signal is
recovered with overwhelming probability. Moreover, there is a sense in which
this is nearly optimal since any method succeeding with the same probability
would require just about this many samples.
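The recovery statement can be illustrated numerically: sample a few rows of an orthonormal matrix (a DCT matrix here, whose incoherence with spikes is close to optimal) and solve the l1 minimization as a linear program. This is a generic sketch, not the authors' code; the problem sizes, the solver, and the LP formulation are arbitrary choices.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, m, S = 32, 20, 2

# Orthonormal DCT-II matrix U (rows indexed by frequency).
U = np.zeros((n, n))
U[0, :] = 1.0 / np.sqrt(n)
for k in range(1, n):
    U[k, :] = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))

# Sparse signal: S spikes with random signs on a random support.
x = np.zeros(n)
support = rng.choice(n, size=S, replace=False)
x[support] = rng.choice([-1.0, 1.0], size=S)

# Observe m randomly selected samples of Ux.
rows = rng.choice(n, size=m, replace=False)
A, b = U[rows], U[rows] @ x

# l1 minimization as an LP over [z, t]: min sum(t) s.t. -t <= z <= t, Az = b.
c = np.concatenate([np.zeros(n), np.ones(n)])
I = np.eye(n)
A_ub = np.block([[I, -I], [-I, -I]])          # z - t <= 0 and -z - t <= 0
b_ub = np.zeros(2 * n)
A_eq = np.hstack([A, np.zeros((m, n))])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b,
              bounds=[(None, None)] * (2 * n), method="highs")
z = res.x[:n]
print(np.max(np.abs(z - x)))  # near zero when recovery succeeds
```

The split into positive and negative parts via the auxiliary variables $t$ is the standard way to express an $\ell_1$ objective in an LP.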
Cutoff at the "entropic time" for sparse Markov chains
We study convergence to equilibrium for a large class of Markov chains in random environment. The chains are sparse in the sense that, in every row of the transition matrix P, the mass is essentially concentrated on a few entries. Moreover, the random environment is such that rows of P are independent and such that the entries are exchangeable within each row. This includes various models of random walks on sparse random directed graphs. The models are generally non-reversible and the equilibrium distribution is itself unknown. In this general setting we establish the cutoff phenomenon for the total variation distance to equilibrium, with mixing time given by the logarithm of the number of states times the inverse of the average row entropy of P. As an application, we consider the case where the rows of P are i.i.d. random vectors in the domain of attraction of a Poisson-Dirichlet law with index α ∈ (0, 1). Our main results are based on a detailed analysis of the weight of the trajectory followed by the walker. This approach offers an interpretation of cutoff as an instance of the concentration of measure phenomenon.
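A small numerical illustration of the entropic time in this setting: build a sparse random stochastic matrix whose rows put exchangeable random mass on a few entries, estimate its (unknown) equilibrium by power iteration, and check that the total variation distance from a fixed starting state is large well before log(n)/H and small well after it, where H is the mean row entropy. The parameters (row sparsity, Dirichlet weights, chain size) are arbitrary choices for the sketch, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(7)
n, d = 400, 3

# Sparse random stochastic matrix: each row puts Dirichlet(1,1,1) weights
# on d uniformly chosen columns (rows independent, entries exchangeable).
P = np.zeros((n, n))
for i in range(n):
    cols = rng.choice(n, size=d, replace=False)
    P[i, cols] = rng.dirichlet(np.ones(d))

# Mean row entropy H and the predicted "entropic time" log(n) / H.
ent = 0.0
for i in range(n):
    w = P[i][P[i] > 0]
    ent += -(w * np.log(w)).sum()
H = ent / n
t_ent = np.log(n) / H

# Equilibrium distribution via power iteration (the chain is
# non-reversible and pi is not known in closed form).
pi = np.full(n, 1.0 / n)
for _ in range(300):
    pi = pi @ P

def tv_from_state(t):
    """Total variation distance to pi after t steps from state 0."""
    mu = np.zeros(n)
    mu[0] = 1.0
    for _ in range(t):
        mu = mu @ P
    return 0.5 * np.abs(mu - pi).sum()

early, late = tv_from_state(1), tv_from_state(int(3 * t_ent))
print(t_ent, early, late)  # distance is ~1 early, small well past t_ent
```

At these sizes the transition between the two regimes is still smeared out; the cutoff result says the window shrinks, relative to t_ent, as n grows.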
Estimation in high dimensions: a geometric perspective
This tutorial provides an exposition of a flexible geometric framework for
high dimensional estimation problems with constraints. The tutorial develops
geometric intuition about high dimensional sets, justifies it with some results
of asymptotic convex geometry, and demonstrates connections between geometric
results and estimation problems. The theory is illustrated with applications to
sparse recovery, matrix completion, quantization, linear and logistic
regression and generalized linear models.
Comment: 56 pages, 9 figures. Multiple minor changes
Some Properties of R\'{e}nyi Entropy over Countably Infinite Alphabets
In this paper we study certain properties of R\'enyi entropy functionals
$H_\alpha(P)$ on the space of probability distributions over countably
infinite alphabets. Primarily, continuity and convergence issues are addressed.
Some properties shown parallel those known in the finite alphabet case, while
others illustrate a quite different behaviour of R\'enyi entropy in the
infinite case. In particular, it is shown that, for any distribution $P$ and
any $r \in [0, \infty]$, there exists a sequence of distributions $P_n$
converging to $P$ with respect to the total variation
distance, such that $\lim_{n\to\infty} H_\alpha(P_n) = H_\alpha(P) + r$.
Comment: 13 pages (single-column)
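A small numerical illustration in the spirit of this kind of discontinuity, using the Shannon entropy (the α → 1 case of the Rényi family): spreading a little mass over a long uniform tail moves the distribution only slightly in total variation yet raises the entropy substantially, and the jump ε·log(m/ε) can be made as large as desired by lengthening the tail. The base distribution and parameters are toy choices.

```python
import math

def shannon(p):
    """Shannon entropy (in nats) of a finite probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0)

P = [0.5, 0.25, 0.25]       # base distribution
eps, m = 0.1, 10**5         # mass to move, length of the uniform tail

# Perturbation: scale P down by (1 - eps) and spread eps uniformly
# over m fresh symbols.
Pn = [(1 - eps) * q for q in P] + [eps / m] * m

tv = 0.5 * (sum(abs((1 - eps) * q - q) for q in P) + eps)   # equals eps
gap = shannon(Pn) - shannon(P)

print(tv, gap)  # tv stays at 0.1 while the entropy jumps by > 1 nat
```

Taking eps → 0 while m grows like exp(r/eps) sends the total variation distance to zero while the entropy gap tends to r, which is the mechanism behind statements of the kind quoted above.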
Towards Machine Wald
The past century has seen a steady increase in the need of estimating and
predicting complex systems and making (possibly critical) decisions with
limited information. Although computers have made possible the numerical
evaluation of sophisticated statistical models, these models are still designed
\emph{by humans} because there is currently no known recipe or algorithm for
dividing the design of a statistical model into a sequence of arithmetic
operations. Indeed, enabling computers to \emph{think} the way \emph{humans} do
when faced with uncertainty is challenging in several major ways:
(1) Finding optimal statistical models has yet to be formulated as a well-posed
problem when information on the system of interest is incomplete and comes in
the form of a complex combination of sample data, partial knowledge of
constitutive relations and a limited description of the distribution of input
random variables. (2) The space of admissible scenarios along with the space of
relevant information, assumptions, and/or beliefs, tend to be infinite
dimensional, whereas calculus on a computer is necessarily discrete and finite.
To address these challenges, this paper explores the foundations of a rigorous framework
for the scientific computation of optimal statistical estimators/models and
reviews their connections with Decision Theory, Machine Learning, Bayesian
Inference, Stochastic Optimization, Robust Optimization, Optimal Uncertainty
Quantification and Information Based Complexity.
Comment: 37 pages