The Hunting of the Bump: On Maximizing Statistical Discrepancy

Author: Agarwal Deepak
Phillips Jeff M.
Venkatasubramanian Suresh
Publication date: 02/10/2005

Anomaly detection has important applications in biosurveillance and environmental monitoring. When comparing measured data to data drawn from a baseline distribution, merely finding clusters in the measured data may not reveal true anomalies, since these clusters may simply be the clusters of the baseline distribution. Hence, a discrepancy function is often used to examine how different measured data is from baseline data within a region. An anomalous region is thus defined to be one with high discrepancy. In this paper, we present algorithms for maximizing statistical discrepancy functions over the space of axis-parallel rectangles. We give provable approximation guarantees, both additive and relative, and our methods apply to any convex discrepancy function. Our algorithms work by connecting statistical discrepancy to combinatorial discrepancy; roughly speaking, we show that in order to maximize a convex discrepancy function over a class of shapes, one needs only maximize a linear discrepancy function over the same set of shapes. We derive general discrepancy functions for data generated from a one-parameter exponential family. This generalizes the widely used Kulldorff scan statistic for data from a Poisson distribution. We present an algorithm running in $O(\frac{1}{\epsilon} n^2 \log^2 n)$ time that computes the maximum discrepancy rectangle to within additive error $\epsilon$ for the Kulldorff scan statistic. Similar results hold for relative error and for discrepancy functions for data coming from Gaussian, Bernoulli, and gamma distributions. Prior to our work, the best known algorithms were exact and ran in time $O(n^4)$.

Comment: 11 pages. A short version of this paper will appear in SODA06. This full version contains an additional short appendi
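
To make the notion of a maximum discrepancy rectangle concrete, here is a minimal Python sketch of a naive exhaustive search. It is not the approximation algorithm from the abstract. It assumes the commonly cited form of the Kulldorff discrepancy, d(m, b) = m log(m/b) + (1 - m) log((1 - m)/(1 - b)) when m > b and 0 otherwise, where m and b are the fractions of measured and baseline points inside the rectangle. The function names and the choice to score rectangles with no baseline points as 0 are assumptions made for illustration.

```python
import math
from itertools import combinations_with_replacement


def kulldorff(m, b):
    """Kulldorff-style discrepancy for measured fraction m and baseline fraction b.

    Assumption: regions with b == 0 or m <= b are scored 0, which sidesteps
    division by zero and only rewards regions with an excess of measured data.
    """
    if b <= 0.0 or m <= b:
        return 0.0
    value = m * math.log(m / b)
    if m < 1.0:  # the second term vanishes in the limit m -> 1
        value += (1.0 - m) * math.log((1.0 - m) / (1.0 - b))
    return value


def max_discrepancy_rectangle(measured, baseline):
    """Try every axis-parallel rectangle whose sides pass through input coordinates.

    Returns (best_score, (x_lo, x_hi, y_lo, y_hi)); the rectangle is None when
    no rectangle scores above 0. This is a brute-force illustration only.
    """
    points = measured + baseline
    xs = sorted({x for x, _ in points})
    ys = sorted({y for _, y in points})
    best_score, best_rect = 0.0, None
    for x_lo, x_hi in combinations_with_replacement(xs, 2):
        for y_lo, y_hi in combinations_with_replacement(ys, 2):
            def inside(p):
                return x_lo <= p[0] <= x_hi and y_lo <= p[1] <= y_hi
            m = sum(inside(p) for p in measured) / len(measured)
            b = sum(inside(p) for p in baseline) / len(baseline)
            score = kulldorff(m, b)
            if score > best_score:
                best_score, best_rect = score, (x_lo, x_hi, y_lo, y_hi)
    return best_score, best_rect


if __name__ == "__main__":
    # Hypothetical toy data: the measured points cluster near (1..2, 1..2).
    measured = [(0, 0), (1, 1), (1, 2), (2, 1), (2, 2), (5, 5)]
    baseline = [(0, 0), (1, 1), (3, 3), (4, 4), (5, 5), (0, 5)]
    print(max_discrepancy_rectangle(measured, baseline))
```

The search recounts points for every candidate rectangle, so it is far slower than the running times quoted in the abstract; it only illustrates what is being maximized.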

arXiv.org e-Print Archive

CiteSeerX

On the Catalyzing Effect of Randomness on the Per-Flow Throughput in Wireless Networks

Author: Ciucu Florin
Schmitt Jens
Publication date: 01/01/2013

This paper investigates the throughput capacity of a flow crossing a multi-hop wireless network, whose geometry is characterized by general randomness laws, including Uniform, Poisson, and Heavy-Tailed distributions, for both the nodes' densities and the number of hops. The key contribution is to demonstrate how the per-flow throughput depends on the distribution of 1) the number of nodes $N_j$ inside hops' interference sets, 2) the number of hops $K$, and 3) the degree of spatial correlations. The randomness in both the $N_j$'s and $K$ is advantageous, i.e., it can yield larger scalings (as large as $\Theta(n)$) than in non-random settings. An interesting consequence is that the per-flow capacity can exhibit the opposite behavior to the network capacity, which was shown to suffer from a logarithmic decrease in the presence of randomness. In turn, spatial correlations along the end-to-end path are detrimental by a logarithmic term.

arXiv.org e-Print Archive

CiteSeerX

Warwick Research Archives Portal Repository

An algorithm for constrained one-step inversion of spectral CT data

Author: Barber Rina Foygel
Pan Xiaochuan
Schmidt Taly Gilat
Sidky Emil Y.
Publication venue: 'IOP Publishing'
Publication date: 10/11/2015

We develop a primal-dual algorithm that allows for one-step inversion of spectral CT transmission photon counts data to a basis map decomposition. The algorithm allows for image constraints to be enforced on the basis maps during the inversion. The derivation of the algorithm makes use of a local upper bounding quadratic approximation to generate descent steps for non-convex spectral CT data discrepancy terms, combined with a new convex-concave optimization algorithm. Convergence of the algorithm is demonstrated on simulated spectral CT data. Simulations with noise and anthropomorphic phantoms show examples of how to employ the constrained one-step algorithm for spectral CT data.

Comment: Submitted to Physics in Medicine and Biolog

arXiv.org e-Print Archive

epublications@Marquette

Revisiting Guerry's data: Introducing spatial constraints in multivariate analysis

Author: Dray Stéphane
Jombart Thibaut
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2011

Standard multivariate analysis methods aim to identify and summarize the main structures in large data sets containing the description of a number of observations by several variables. In many cases, spatial information is also available for each observation, so that a map can be associated with the multivariate data set. Two main objectives are relevant in the analysis of spatial multivariate data: summarizing covariation structures and identifying spatial patterns. In practice, achieving both goals simultaneously is a statistical challenge, and a range of methods have been developed that offer trade-offs between these two objectives. In an applied context, this methodological question has been and remains a major issue in community ecology, where species assemblages (i.e., covariation between species abundances) are often driven by spatial processes (and thus exhibit spatial patterns). In this paper we review a variety of methods developed in community ecology to investigate multivariate spatial patterns. We present different ways of incorporating spatial constraints in multivariate analysis and illustrate these different approaches using the famous data set on moral statistics in France published by André-Michel Guerry in 1833. We discuss and compare the properties of these different approaches from both a practical and a theoretical viewpoint.

Comment: Published in at http://dx.doi.org/10.1214/10-AOAS356 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Validating Variational Bayes Linear Regression Method With Multi-Central Datasets.

Author: Asaoka Ryo
Fujino Yuri
Hirasawa Kazunori
Kashiwagi Kenji
Matsuura Masato
Miki Atsuya
Mizoue Shiro
Mori Kazuhiko
Murata Hiroshi
Shoji Nobuyuki
Suzuki Katsuyoshi
Tanito Masaki
Yamashita Takehiro
Zangwill Linda M
Publication venue: eScholarship, University of California
Publication date: 01/04/2018

Purpose: To validate the prediction accuracy of variational Bayes linear regression (VBLR) with two datasets external to the training dataset.

Method: The training dataset consisted of 7268 eyes of 4278 subjects from the University of Tokyo Hospital. The Japanese Archive of Multicentral Databases in Glaucoma (JAMDIG) dataset, consisting of 271 eyes of 177 patients, and the Diagnostic Innovations in Glaucoma Study (DIGS) dataset, consisting of 248 eyes of 173 patients, were used for validation. Prediction accuracy was compared between the VBLR and ordinary least squared linear regression (OLSLR). First, OLSLR and VBLR were carried out using total deviation (TD) values at each of the 52 test points from the second to fourth visual fields (VFs) (VF2-4) to the 2nd to 10th VFs (VF2-10) of each patient in the JAMDIG and DIGS datasets, and the TD values of the 11th VF test were predicted each time. The predictive accuracy of each method was compared through the root mean squared error (RMSE) statistic.

Results: OLSLR RMSEs with the JAMDIG and DIGS datasets were between 31 and 4.3 dB, and between 19.5 and 3.9 dB. On the other hand, VBLR RMSEs with the JAMDIG and DIGS datasets were between 5.0 and 3.7, and between 4.6 and 3.6 dB. There was a statistically significant difference between VBLR and OLSLR for both datasets at every series (VF2-4 to VF2-10) (P < 0.01 for all tests). However, there was no statistically significant difference in VBLR RMSEs between the JAMDIG and DIGS datasets at any series of VFs (VF2-2 to VF2-10) (P > 0.05).

Conclusions: VBLR outperformed OLSLR in predicting future VF progression, and VBLR has the potential to be a helpful tool in clinical settings
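
The OLSLR baseline and the RMSE comparison described in this abstract can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes each test point's TD value is regressed on the visit number (the abstract does not say whether visit number or elapsed time was used), it does not implement VBLR, and the function names and toy values are hypothetical.

```python
import math


def ols_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_visit(td_by_visit, visits, target):
    """Fit one line per test point over `visits` and extrapolate to `target`.

    td_by_visit maps a visit number to a list of TD values (dB), one per
    test point. Assumption: the regressor is the visit number itself.
    """
    n_points = len(td_by_visit[visits[0]])
    predictions = []
    for p in range(n_points):
        ys = [td_by_visit[v][p] for v in visits]
        slope, intercept = ols_line(visits, ys)
        predictions.append(intercept + slope * target)
    return predictions


def rmse(predicted, observed):
    """Root mean squared error across test points."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical toy series with two test points instead of 52:
    # point 0 declines by 0.5 dB per visit, point 1 stays at -2 dB.
    td_by_visit = {v: [-1.0 - 0.5 * v, -2.0] for v in range(2, 12)}
    for last in range(4, 11):  # series VF2-4 up to VF2-10
        visits = list(range(2, last + 1))
        predicted = predict_visit(td_by_visit, visits, target=11)
        print(f"VF2-{last}: RMSE = {rmse(predicted, td_by_visit[11]):.3f} dB")
```

Because the toy series is exactly linear and noise-free, the error here is essentially zero for every series length; with real, noisy VF data the short series would be expected to extrapolate poorly, which is the regime the abstract reports on.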

eScholarship - University of California