A general asymptotic scheme for inference under order restrictions
Limit distributions for the greatest convex minorant and its derivative are
considered for a general class of stochastic processes including partial sum
processes and empirical processes, for independent, weakly dependent and long
range dependent data. The results are applied to isotonic regression, isotonic
regression after kernel smoothing, estimation of convex regression functions,
and estimation of monotone and convex density functions. Various pointwise
limit distributions are obtained, and the rate of convergence depends on the
self similarity properties and on the rate of convergence of the processes
considered.
Comment: Published at http://dx.doi.org/10.1214/009053606000000443 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
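As an illustration of the isotonic regression setting mentioned in this abstract, the following is a minimal sketch of the pool-adjacent-violators algorithm, a standard way to compute the least-squares nondecreasing fit. The fitted values coincide with the left-hand slopes of the greatest convex minorant of the cumulative sum diagram. The function name `isotonic_regression` is an assumption made for this example and is not taken from the paper.

```python
def isotonic_regression(y):
    """Least-squares nondecreasing fit to y via pool-adjacent-violators.

    Illustrative sketch only; equal weights are assumed for all observations.
    """
    blocks = []  # each block stores [sum of pooled values, number of values]
    for value in y:
        blocks.append([float(value), 1])
        # Pool backwards while the previous block mean exceeds the last block mean.
        # Means are compared by cross-multiplication to avoid division.
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    # Expand each pooled block back to one fitted value per observation.
    fit = []
    for total, count in blocks:
        fit.extend([total / count] * count)
    return fit
```

For example, the input `[1, 3, 2, 4]` has a single violation (3 followed by 2), and those two values are pooled to their mean.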
Generalizing univariate signed rank statistics for testing and estimating a multivariate location parameter.
We generalize signed rank statistics to dimensions higher than one. This results in a class of orthogonally invariant and distribution free tests that can be used for testing spherical symmetry/location parameter. The corresponding estimator is orthogonally equivariant. Both the test and estimator can be chosen with asymptotic efficiency 1. The breakdown point of the estimator depends only on the scores, not on the dimension of the data. For elliptical distributions, we obtain an affine invariant test with the same asymptotic properties, if the signed rank statistic is applied to standardized data. We also present a method for computing the estimator numerically, and consider a real data example and some simulations. Finally, an application to detection of time-varying signals in spherically symmetric noise is given.
Keywords: Affine invariant tests; Asymptotic normality; Breakdown point; Distribution free tests
Generalized S-estimators.
In this paper we introduce a new type of positive-breakdown regression method, called a generalized S-estimator (or GS-estimator), based on the minimization of a generalized M-estimator of residual scale. We compare the class of GS-estimators with the usual S-estimators, including least median of squares. It turns out that GS-estimators attain a much higher efficiency than S-estimators, at the cost of a slightly increased worst-case bias. We investigate the breakdown point, the maxbias curve and the influence function of GS-estimators. We also give an algorithm for computing GS-estimators, and apply it to real and simulated data.
Keywords: Breakdown point; Influence function; Maxbias curve; Regression analysis; Robustness
A Fast Algorithm for Robust Regression with Penalised Trimmed Squares
The presence of groups containing high leverage outliers makes linear
regression a difficult problem due to the masking effect. The available high
breakdown estimators based on Least Trimmed Squares often do not succeed in
detecting masked high leverage outliers in finite samples.
An alternative to the LTS estimator, called Penalised Trimmed Squares (PTS)
estimator, was introduced by the authors in \cite{ZiouAv:05,ZiAvPi:07} and it
appears to be less sensitive to the masking problem. This estimator is defined
by a Quadratic Mixed Integer Programming (QMIP) problem, where in the objective
function a penalty cost for each observation is included which serves as an
upper bound on the residual error for any feasible regression line. Since the
PTS does not require presetting the number of outliers to delete from the data
set, it has better efficiency than other estimators. However, due to
the high computational complexity of the resulting QMIP problem, computing exact
solutions for moderately large regression problems is infeasible.
In this paper we further establish the theoretical properties of the PTS
estimator, such as high breakdown and efficiency, and propose an approximate
algorithm called Fast-PTS to compute the PTS estimator for large data sets
efficiently. Extensive computational experiments on sets of benchmark instances
with varying degrees of outlier contamination indicate that the proposed
algorithm performs well in identifying groups of high leverage outliers in
reasonable computational time.
Comment: 27 pages
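For context on the comparison with Least Trimmed Squares, the sketch below evaluates the standard LTS criterion for a given set of residuals: the sum of the `h` smallest squared residuals, where the coverage `h` must be chosen in advance. It shows only the LTS objective that the abstract contrasts with PTS. It is not the PTS penalty formulation or the Fast-PTS algorithm, and the function name `lts_objective` is an assumption made for this example.

```python
def lts_objective(residuals, h):
    """Sum of the h smallest squared residuals (the LTS criterion).

    Illustrative sketch only: `h` is the preset number of observations
    kept, so len(residuals) - h observations are trimmed.
    """
    if not 0 < h <= len(residuals):
        raise ValueError("h must be between 1 and the number of residuals")
    squared = sorted(r * r for r in residuals)
    return sum(squared[:h])
```

With residuals `[1, -2, 10, 0.5]` and `h = 3`, the large residual 10 is trimmed and does not contribute to the criterion.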
Statistical quality assessment and outlier detection for liquid chromatography-mass spectrometry experiments
Background: Quality assessment methods, which are commonplace in engineering and industrial production, are not widespread in large-scale proteomics experiments. But modern technologies such as Multi-Dimensional Liquid Chromatography coupled to Mass Spectrometry (LC-MS) produce large quantities of proteomic data. These data are prone to measurement errors and reproducibility problems, so automatic quality assessment and control are becoming increasingly important.
Results: We propose a methodology to assess the quality and reproducibility of data generated in quantitative LC-MS experiments. We introduce quality descriptors that capture different aspects of the quality and reproducibility of LC-MS data sets. Our method is based on the Mahalanobis distance and a robust Principal Component Analysis.
Conclusion: We evaluate our approach on several data sets of different complexities and show that we are able to precisely detect LC-MS runs of poor signal quality in large-scale studies.
Temporal Dynamics of Host Molecular Responses Differentiate Symptomatic and Asymptomatic Influenza A Infection
Exposure to influenza viruses is necessary, but not sufficient, for healthy human hosts to develop symptomatic illness. The host response is an important determinant of disease progression. In order to delineate host molecular responses that differentiate symptomatic and asymptomatic Influenza A infection, we inoculated 17 healthy adults with live influenza (H3N2/Wisconsin) and examined changes in host peripheral blood gene expression at 16 timepoints over 132 hours. Here we present distinct transcriptional dynamics of host responses unique to asymptomatic and symptomatic infections. We show that symptomatic hosts invoke, simultaneously, multiple pattern recognition receptor-mediated antiviral and inflammatory responses that may relate to virus-induced oxidative stress. In contrast, asymptomatic subjects tightly regulate these responses and exhibit elevated expression of genes that function in antioxidant responses and cell-mediated responses. We reveal an ab initio molecular signature that strongly correlates with symptomatic clinical disease and biomarkers whose expression patterns best discriminate early from late phases of infection. Our results establish a temporal pattern of host molecular responses that differentiates symptomatic from asymptomatic infections and reveals an asymptomatic host-unique non-passive response signature, suggesting novel putative molecular targets for both prognostic assessment and ameliorative therapeutic intervention in seasonal and pandemic influenza.
From basic to reduced bias kernel density estimators: links via Taylor series approximations
The transformation kernel density estimator of Ruppert and Cline (1994) achieves bias of order h^4 (as the bandwidth h → 0), an improvement over the order h^2 bias associated with the basic kernel density estimator. Hössjer and Ruppert (1994) use Taylor series expansions to build a bridge between the two, displaying an infinite sequence of O(h^4) bias estimators in the process. In this paper, we extend the work of Hössjer and Ruppert (i) by investigating three other natural Taylor series expansions, and (ii) by applying the approach to two other O(h^4) bias estimators, namely the variable bandwidth and multiplicative bias correction methods. Several further infinite sequences of O(h^4) bias estimators result.
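As a point of reference for the basic kernel density estimator discussed in this abstract, here is a minimal sketch of that estimator with a Gaussian kernel and a fixed bandwidth h. The kernel choice and the function name `gaussian_kde` are assumptions made for this example. The reduced-bias estimators named in the abstract (transformation, variable bandwidth, multiplicative bias correction) are not implemented here.

```python
import math


def gaussian_kde(data, h):
    """Return the basic kernel density estimate as a function of x.

    Illustrative sketch only: Gaussian kernel, one fixed bandwidth h > 0.
    """
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    n = len(data)
    norm = n * h * math.sqrt(2.0 * math.pi)

    def density(x):
        # Average of Gaussian bumps of width h centred at each observation.
        return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) / norm

    return density
```

Calling `gaussian_kde([0.0, 1.0, 2.5], 0.5)` returns a function that can be evaluated at any point; smaller values of h give a rougher estimate with lower bias and higher variance.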
On the effect of estimating the error density in nonparametric deconvolution
It is quite common in the statistical literature on nonparametric deconvolution to assume that the error density is perfectly known. Since this seems to be unrealistic in many practical applications, we study the effect of estimating the unknown error density. We derive minimax rates of convergence and propose a modification of the usual kernel-based estimation scheme, which takes the uncertainty about the error density into account. A simulation study quantifies the possible gains from this new method in finite-sample situations.