Detection and localization of change-points in high-dimensional network traffic data
We propose a novel and efficient method, called TopRank, for detecting
change-points in high-dimensional data. This
issue is of growing concern to the network security community since network
anomalies such as Denial of Service (DoS) attacks lead to changes in Internet
traffic. Our method consists of a data reduction stage based on record
filtering, followed by a nonparametric change-point detection test based on
U-statistics. Using this approach, we can address massive data streams and
perform anomaly detection and localization on the fly. We show how it applies
to some real Internet traffic provided by France-Télécom (a French Internet
service provider) in the framework of the ANR-RNRT OSCAR project. This approach
is very attractive since it benefits from a low computational load and is able
to detect and localize several types of network anomalies. We also assess the
performance of the TopRank algorithm using synthetic data and compare it with
alternative approaches based on random aggregation.
Comment: Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics
(http://www.imstat.org); http://dx.doi.org/10.1214/08-AOAS232
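The kind of rank-based test used in the second stage can be illustrated on a single series. The sketch below is a Wilcoxon/Mann-Whitney-type CUSUM scan for one change in location, written under our own assumptions (a single change-point, independent observations, NumPy available). The function name `rank_cusum` is hypothetical, the record-filtering stage is omitted, and this is not the TopRank algorithm itself.

```python
import numpy as np


def rank_cusum(x):
    """Rank-based scan for a single change in location (illustrative sketch).

    For each split k, S_k = sum_{i<=k} (R_i - (n + 1) / 2), where R_i are the
    ranks of x. S_k equals U_k - k(n - k)/2, with U_k the Mann-Whitney
    statistic comparing x[:k] with x[k:], so it is a centred U-statistic.
    Returns (k_hat, stat): the change is estimated to occur between
    x[k_hat - 1] and x[k_hat].
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    # Ranks 1..n, ties broken by position.
    ranks = np.argsort(np.argsort(x, kind="stable"), kind="stable") + 1
    partial = np.cumsum(ranks - (n + 1) / 2.0)[:-1]  # S_1, ..., S_{n-1}
    k = np.arange(1, n)
    # Standard deviation of S_k when every rank permutation is equally likely (no change).
    sd = np.sqrt(k * (n - k) * (n + 1) / 12.0)
    stats = np.abs(partial) / sd
    best = int(np.argmax(stats))
    return best + 1, float(stats[best])


rng = np.random.default_rng(0)
series = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(4.0, 1.0, 100)])
k_hat, stat = rank_cusum(series)
print(k_hat, round(stat, 2))
```

Normalising by the permutation standard deviation keeps splits near the edges from dominating the scan; because only ranks are used, the test is insensitive to the marginal distribution of the traffic counts.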
OMP-type Algorithm with Structured Sparsity Patterns for Multipath Radar Signals
A transmitted, unknown radar signal is observed at the receiver through more
than one path in additive noise. The aim is to recover the waveform of the
intercepted signal and to simultaneously estimate the direction of arrival
(DOA). We propose an approach exploiting the parsimonious time-frequency
representation of the signal by applying a new OMP-type algorithm for
structured sparsity patterns. An important issue is the scalability of the
proposed algorithm, since high-dimensional models are needed for radar
signals. Monte-Carlo simulations for modulated signals illustrate the good
performance of the method, even at low signal-to-noise ratios, and a 20 dB gain
in DOA estimation compared with an elementary method.
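For readers unfamiliar with OMP, the following is a minimal NumPy sketch of plain Orthogonal Matching Pursuit on a real-valued dictionary whose columns are assumed to have unit norm. It does not implement the structured-sparsity selection rule, the time-frequency dictionary, or the DOA estimation described above; `omp` and its arguments are hypothetical names.

```python
import numpy as np


def omp(A, y, n_atoms):
    """Plain Orthogonal Matching Pursuit: greedy sparse solution of y ~ A @ x.

    Assumes the columns of A have unit Euclidean norm.
    """
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_atoms):
        # Select the atom most correlated with the current residual.
        scores = np.abs(A.T @ residual)
        if support:
            scores[support] = -np.inf  # never pick the same atom twice
        support.append(int(np.argmax(scores)))
        # "Orthogonal" step: least-squares refit on all selected atoms.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x


rng = np.random.default_rng(1)
A, _ = np.linalg.qr(rng.normal(size=(64, 32)))  # 64 x 32 dictionary, orthonormal columns
x_true = np.zeros(32)
x_true[[3, 17, 25]] = [1.5, -2.0, 0.7]
x_hat = omp(A, A @ x_true, n_atoms=3)
print(np.flatnonzero(x_hat))
```

The least-squares refit is what distinguishes OMP from plain matching pursuit: the residual stays orthogonal to every selected atom. A structured variant would change the selection step so that groups of atoms are chosen together.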
Adaptive tests for periodic signal detection with applications to laser vibrometry
Initially motivated by a practical issue in target detection via laser vibrometry, we are interested in the problem of periodic signal detection in a Gaussian fixed design regression framework. Assuming that the signal belongs to some periodic Sobolev ball and that the variance of the noise is known, we first consider the problem from a minimax point of view: we evaluate the so-called minimax separation rate, which corresponds to the minimal ℓ2-distance between the signal and zero for which detection is possible with prescribed probabilities of error. Then, we propose a testing procedure which remains applicable when the variance of the noise is unknown and which does not use any prior information about the smoothness degree or the period of the signal. We prove that it is adaptive in the sense that it achieves, up to a possible logarithmic factor, the minimax separation rate over various periodic Sobolev balls simultaneously. The originality of our approach compared with related work on signal detection is that our testing procedure is sensitive to the periodicity assumption on the signal. A simulation study evaluates the effect of this prior assumption on the power of the test and confirms the gains expected from the theory. Finally, we turn to the application that motivated this work: target detection by laser vibrometry.
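To give a concrete feel for a periodicity test that does not need the noise variance, the sketch below uses Fisher's classical g statistic (largest periodogram ordinate divided by the sum of ordinates), calibrated by Monte Carlo under Gaussian white noise. This is a swapped-in textbook test, not the adaptive minimax procedure of the paper. The function names, the `alpha`/`n_sim`/`seed` parameters, and the equispaced design are assumptions made for illustration.

```python
import numpy as np


def fisher_g(y):
    """Fisher's g statistic: largest periodogram ordinate over their sum."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    m = (y.size - 1) // 2  # Fourier frequencies strictly between 0 and Nyquist
    spec = np.abs(np.fft.rfft(y)[1:m + 1]) ** 2
    return float(spec.max() / spec.sum())


def periodicity_test(y, alpha=0.05, n_sim=2000, seed=0):
    """Reject 'no periodic component' when g exceeds a Monte Carlo critical value.

    g is invariant to the scale (and, after centring, the level) of the data,
    so the critical value can be simulated under standard Gaussian white noise
    even when the noise variance is unknown.
    """
    rng = np.random.default_rng(seed)
    null_g = [fisher_g(rng.standard_normal(len(y))) for _ in range(n_sim)]
    critical = float(np.quantile(null_g, 1 - alpha))
    g = fisher_g(y)
    return g, critical, g > critical


t = np.arange(256)
rng = np.random.default_rng(42)
y = np.sin(2 * np.pi * 10 * t / 256) + rng.normal(0.0, 1.0, t.size)
g, crit, reject = periodicity_test(y)
print(round(g, 3), round(crit, 3), reject)
```

Because g is a ratio, an unknown noise variance cancels out. The price is that this test looks for a single dominant frequency, whereas the procedure in the abstract adapts to the unknown smoothness and period.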
A novel approach for estimating functions in the multivariate setting based on an adaptive knot selection for B-splines with an application to a chemical system used in geoscience
In this paper, we will outline a novel data-driven method for estimating
functions in a multivariate nonparametric regression model based on an adaptive
knot selection for B-splines. The underlying idea of our approach for selecting
knots is to apply the generalized lasso, since the knots of the B-spline basis
can be seen as changes in the derivatives of the function to be estimated. This
method was then extended to functions depending on several variables by
processing each dimension independently, thus reducing the problem to a
univariate setting. The regularization parameters were chosen by means of a
criterion based on EBIC. The nonparametric estimator was obtained using a
multivariate B-spline regression with the corresponding selected knots. Our
procedure was validated through numerical experiments by varying the number of
observations and the level of noise to investigate its robustness. The
influence of observation sampling was also assessed and our method was applied
to a chemical system commonly used in geoscience. For each different framework
considered in this paper, our approach performed better than state-of-the-art
methods. Our completely data-driven method is implemented in the glober R
package which will soon be available on the Comprehensive R Archive Network
(CRAN).
Comment: 29 pages, 29 figures
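The link between knots and changes in derivatives can be made concrete in one dimension with ℓ1 trend filtering. This is a special case of the generalized lasso, minimizing ½‖y − β‖² + λ‖Dβ‖₁ with D the second-difference operator, and a piecewise-linear fit has nonzero second differences only at its knots. The NumPy sketch below solves this with a basic ADMM loop. It assumes an equispaced design, degree-1 splines, and a fixed λ chosen by hand rather than by EBIC. `trend_filter_knots` is a hypothetical name, and this is not the glober implementation.

```python
import numpy as np


def trend_filter_knots(y, lam, rho=1.0, n_iter=3000, tol=1e-6):
    """Generalized lasso with a second-difference penalty, solved by ADMM.

    Minimizes 0.5 * ||y - beta||^2 + lam * ||D beta||_1 using the splitting
    z = D beta. Returns the fit and the indices where D beta is nonzero,
    i.e. the candidate knots of a piecewise-linear fit.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    D = np.diff(np.eye(n), n=2, axis=0)  # row i: beta[i] - 2 beta[i+1] + beta[i+2]
    M_inv = np.linalg.inv(np.eye(n) + rho * D.T @ D)  # fixed across iterations
    z = np.zeros(n - 2)
    u = np.zeros(n - 2)  # scaled dual variable
    for _ in range(n_iter):
        beta = M_inv @ (y + rho * D.T @ (z - u))
        Db = D @ beta
        v = Db + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)  # soft-thresholding
        u += Db - z
    knots = np.flatnonzero(np.abs(z) > tol) + 1  # row i is centred at beta[i + 1]
    return beta, knots


x = np.arange(100)
y = np.where(x < 50, x, 50 + 3 * (x - 50)).astype(float)  # slope changes at x = 50
beta, knots = trend_filter_knots(y, lam=0.1)
print(knots)
```

The selected knots would then define the B-spline basis for an ordinary least-squares refit. Handling several variables dimension by dimension, as the abstract describes, amounts to running such a selection separately along each coordinate.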
A variable selection approach for highly correlated predictors in high-dimensional genomic data
In genomic studies, identifying biomarkers associated with a variable of
interest is a major concern in biomedical research. Regularized approaches are
classically used to perform variable selection in high-dimensional linear
models. However, these methods can fail in highly correlated settings. We
propose a novel variable selection approach called WLasso, taking these
correlations into account. It consists of rewriting the initial
high-dimensional linear model to remove the correlation between the biomarkers
(predictors) and in applying the generalized Lasso criterion. The performance
of WLasso is assessed using synthetic data in several scenarios and compared
with recent alternative approaches. The results show that when the biomarkers
are highly correlated, WLasso outperforms the other approaches in sparse
high-dimensional frameworks. The method is also successfully illustrated on
publicly available gene expression data in breast cancer. Our method is
implemented in the WLasso R package which is available from the Comprehensive R
Archive Network.
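One way to rewrite a linear model so that its predictors are uncorrelated is symmetric whitening. Writing Σ for the predictor correlation matrix, Xβ = (XΣ^{-1/2})(Σ^{1/2}β), so the columns of XΣ^{-1/2} are uncorrelated. The NumPy sketch below shows only this rewriting step and is an assumption about the general idea, not the WLasso estimator. It estimates Σ by the sample correlation matrix, which requires n > p. In the high-dimensional setting the abstract targets, a different estimator of Σ would be needed, and the generalized Lasso step is not reproduced. `whiten_design` is a hypothetical name.

```python
import numpy as np


def whiten_design(X):
    """Standardize X and decorrelate its columns with the inverse square root
    of the sample correlation matrix (requires n > p and full column rank)."""
    X = np.asarray(X, dtype=float)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    sigma = Xs.T @ Xs / Xs.shape[0]          # sample correlation matrix
    w, V = np.linalg.eigh(sigma)             # sigma = V diag(w) V^T
    sigma_inv_sqrt = (V / np.sqrt(w)) @ V.T  # V diag(w^{-1/2}) V^T
    sigma_sqrt = (V * np.sqrt(w)) @ V.T      # V diag(w^{1/2}) V^T
    return Xs, Xs @ sigma_inv_sqrt, sigma_sqrt


rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 1))
X = latent + 0.3 * rng.normal(size=(500, 4))  # four strongly correlated predictors
Xs, Z, sigma_sqrt = whiten_design(X)
beta = np.array([1.0, 0.0, -1.0, 0.0])
print(np.round(np.corrcoef(Z, rowvar=False), 6))
```

The linear predictor is unchanged (Xs @ beta equals Z @ (sigma_sqrt @ beta)). Only the parametrization changes, which is why any sparsity assumption on the original β has to be carried through the rewriting, for example through a generalized Lasso penalty.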