Search CORE

55 research outputs found

Nonparametric empirical Bayes and compound decision approaches to estimation of a high-dimensional vector of normal means

Author: Brown Lawrence D.
Greenshtein Eitan
Publication venue: Institute of Mathematical Statistics
Publication date: 01/01/2009

We consider the classical problem of estimating a vector $\boldsymbol{\mu}=(\mu_1,\ldots,\mu_n)$ based on independent observations $Y_i\sim N(\mu_i,1)$, $i=1,\ldots,n$. Suppose $\mu_i$, $i=1,\ldots,n$, are independent realizations from a completely unknown $G$. We suggest an easily computed estimator $\hat{\boldsymbol{\mu}}$ such that the ratio of its risk $E\|\hat{\boldsymbol{\mu}}-\boldsymbol{\mu}\|^2$ to that of the Bayes procedure approaches 1. A related compound decision result is also obtained. Our asymptotics is of a triangular array; that is, we allow the distribution $G$ to depend on $n$. Thus, our theoretical asymptotic results are also meaningful in situations where the vector $\boldsymbol{\mu}$ is sparse and the proportion of zero coordinates approaches 1. We demonstrate the performance of our estimator in simulations, emphasizing sparse setups. In "moderately sparse" situations, our procedure performs very well compared to known procedures tailored for sparse setups. It also adapts well to nonsparse situations.

Comment: Published at http://dx.doi.org/10.1214/08-AOS630 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
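The abstract does not state how $\hat{\boldsymbol{\mu}}$ is computed. One standard way to approximate the Bayes rule when $G$ is unknown is Tweedie's formula, $E(\mu_i \mid Y_i = y) = y + f'(y)/f(y)$, where $f$ is the marginal density of $Y_i$, with $f$ replaced by a Gaussian kernel density estimate. The Python sketch below assumes that construction; the function name `eb_tweedie` and the default bandwidth $h = 1/\sqrt{\log n}$ are illustrative choices, not taken from the paper, and the default requires at least two observations.

```python
import numpy as np

def eb_tweedie(y, h=None):
    """Empirical Bayes estimate of normal means via Tweedie's formula.

    Assumes Y_i ~ N(mu_i, 1) with mu_i drawn from an unknown G. The marginal
    density f of Y is estimated with a Gaussian kernel of bandwidth h, and each
    coordinate is estimated by y_i + f_h'(y_i) / f_h(y_i).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    if h is None:
        # Hypothetical default bandwidth, chosen only for illustration.
        h = 1.0 / np.sqrt(np.log(n))
    z = (y[:, None] - y[None, :]) / h      # z[i, j] = (y_i - y_j) / h
    k = np.exp(-0.5 * z**2)                 # Gaussian kernel, normalizing constant omitted
    f = k.sum(axis=1)                       # proportional to f_h(y_i)
    fprime = (-z / h * k).sum(axis=1)       # proportional to f_h'(y_i), same constant
    return y + fprime / f                   # the common constant cancels in the ratio
```

The pairwise kernel matrix uses $O(n^2)$ memory, which is fine for simulations of a few thousand coordinates; larger problems would call for a binned or FFT-based density estimate.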

arXiv.org e-Print Archive

Crossref

ScholarlyCommons@Penn

Re-calibration of sample means

Author: Greenshtein E.
Ritov Y.
Publication date: 01/01/2011

We consider the problem of calibration and the GREG method as suggested and studied in Deville and Särndal (1992). We show that a GREG-type estimator is typically not a minimum-variance unbiased estimator, even asymptotically. We suggest a similar estimator that is unbiased and asymptotically of minimal variance.
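For reference, the textbook GREG estimator of a population total $t_y$ is $\hat t_{y,\mathrm{GREG}} = \hat t_{y,\pi} + (t_x - \hat t_{x,\pi})^\top \hat B$, where $\hat t_{\cdot,\pi}$ are Horvitz-Thompson estimates with design weights $d_k = 1/\pi_k$ and $\hat B$ is the design-weighted least-squares coefficient. The Python sketch below implements only that standard form under an equal-variance working model (an assumption); it is not the re-calibrated estimator proposed in the paper, and the function name `greg_total` is hypothetical.

```python
import numpy as np

def greg_total(y, X, pi, tx):
    """Textbook GREG estimator of a population total (equal-variance working model).

    y  : study variable for the n sampled units, shape (n,)
    X  : auxiliary variables for the sampled units, shape (n, p)
    pi : first-order inclusion probabilities of the sampled units, shape (n,)
    tx : known population totals of the auxiliary variables, shape (p,)
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    d = 1.0 / np.asarray(pi, dtype=float)       # design weights d_k = 1 / pi_k
    ty_ht = d @ y                               # Horvitz-Thompson estimate of t_y
    tx_ht = d @ X                               # Horvitz-Thompson estimates of t_x
    # Design-weighted least squares: B = (X^T D X)^{-1} X^T D y
    B = np.linalg.solve(X.T @ (d[:, None] * X), X.T @ (d * y))
    # Correct the HT estimate by the regression-predicted discrepancy in the x-totals
    return ty_ht + (np.asarray(tx, dtype=float) - tx_ht) @ B
```

When $y$ is exactly a linear function of the columns of $X$ in the population, the function returns the true total $t_x^\top\beta$; that calibration property is what the GREG construction is built around.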

arXiv.org e-Print Archive

CiteSeerX

Two-Sided Sequential Tests

Author: Brown Lawrence D
Greenshtein Eitan
Publication venue: ScholarlyCommons
Publication date: 01/01/1992

Let $X_i$ be i.i.d. with $X_i \sim F_\theta$. For some parametric families $\{F_\theta\}$, we describe a monotonicity property of Bayes sequential procedures for the decision problem $H_0: \theta = 0$ versus $H_1: \theta \neq 0$. A surprising counterexample is given in the case where $F_\theta$ is $N(\theta, 1)$.

ScholarlyCommons@Penn

Active site prediction using evolutionary and structural information

Author: Aloy
Alterovitz
Altschul
Apweiler
Bagley
Baker
Bartlett
Bate
Berna
Brady
Capra
Casari
Chandonia
Davis
Edgar
Elcock
Fei Sha
Felsenstein
Fetrow
Fischer
Frey
George
Greenshtein
Gutteridge
Hastie
Hedstrom
Henikoff
Hoggart
Hosmer
Huang
Hubbard
Innis
Jack F. Kirsch
Kabsch
Kimmen Sjölander
Koh
Kraut
Krem
Landau
Landgraf
Laurie
Lichtarge
Lin
Mayrose
McGrath
Michael I. Jordan
Mihalek
Mooney
Murzin
Ondrechen
Ota
Panchenko
Pazos
Peters
Petrova
Polgar
Porter
Richardson
Sankararaman
Segal
Shevade
Sriram Sankararaman
Tibshirani
Tong
van de Geer
Vàrallyay
Youn
Zhao
Publication venue: Oxford University Press
Publication date: 01/01/2010

Motivation: The identification of catalytic residues is a key step in understanding the function of enzymes. While a variety of computational methods have been developed for this task, accuracies have remained fairly low. The best existing method exploits information from sequence and structure to achieve a precision (the fraction of predicted catalytic residues that are catalytic) of 18.5% at a corresponding recall (the fraction of catalytic residues identified) of 57% on a standard benchmark. Here we present a new method, Discern, which provides a significant improvement over the state of the art through the use of statistical techniques to derive a model with a small set of features that are jointly predictive of enzyme active sites.

CiteSeerX

Crossref

PubMed Central

eScholarship - University of California

Aggregation by exponential weighting, sharp PAC-Bayesian bounds and sparsity

Author: A. B. Juditsky
A. B. Tsybakov
A. Dalalyan
A. Dembo
A. Nemirovski
B. Efron
D. L. Donoho
D. Revuz
E. Candes
E. Greenshtein
E. L. Lehmann
F. Bunea
G. Leung
I. E. Frank
J. Kivinen
J. Obloj
J.-Y. Audibert
N. Cesa-Bianchi
N. Littlestone
O. Catoni
T. Zhang
V. V. Petrov
V. Vovk
Y. Yang
Publication venue: Springer Science and Business Media LLC
Publication date: 12/03/2008

We study the problem of aggregation under the squared loss in the model of regression with deterministic design. We obtain sharp PAC-Bayesian risk bounds for aggregates defined via exponential weights, under general assumptions on the distribution of errors and on the functions to aggregate. We then apply these results to derive sparsity oracle inequalities.
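In exponential weighting, each candidate function $f_j$ receives a weight proportional to $\pi_j \exp(-n\hat r_j/\beta)$, where $\hat r_j$ is its empirical squared risk at the design points, $\pi_j$ is a prior weight and $\beta > 0$ is a temperature; the aggregate is the weighted average of the candidates. The Python sketch below assumes a finite dictionary of candidates (the paper's setting is more general), and the function name `exp_weight_aggregate` is hypothetical; the value of $\beta$ under which the sharp bounds hold depends on the error assumptions and is left to the caller.

```python
import numpy as np

def exp_weight_aggregate(Y, F, beta, prior=None):
    """Exponentially weighted aggregate of M candidate fits (finite dictionary).

    Y     : observations at the n design points, shape (n,)
    F     : candidate function values at the design points, shape (M, n)
    beta  : temperature parameter (> 0)
    prior : prior weights over the M candidates (uniform if None)
    """
    Y = np.asarray(Y, dtype=float)
    F = np.asarray(F, dtype=float)
    M = F.shape[0]
    prior = np.full(M, 1.0 / M) if prior is None else np.asarray(prior, dtype=float)
    # n times the empirical squared risk of each candidate = residual sum of squares
    rss = ((F - Y[None, :]) ** 2).sum(axis=1)
    # Log of the unnormalized weights pi_j * exp(-rss_j / beta)
    logw = np.log(prior) - rss / beta
    logw -= logw.max()   # shift for numerical stability; cancels after normalization
    w = np.exp(logw)
    w /= w.sum()
    return w @ F, w
```

Working on the log scale before exponentiating avoids underflow when residual sums of squares are large relative to $\beta$; a small $\beta$ concentrates weight on the best-fitting candidate, while a large $\beta$ pushes the weights toward the prior.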

arXiv.org e-Print Archive

CiteSeerX

Crossref

Hal-Diderot

Approximation of Functions of Few Variables in High Dimensions

Author: A. Cohen
C. de Boor
D. Donoho
E. Candès
E. Greenshtein
E. Novak
Guergana Petrova
J. Körner
J. Levesley
M. Belkin
M. Fredman
Przemyslaw Wojtaszczyk
R. Coifman
R. Todor
Ronald DeVore
Publication venue: Springer Science and Business Media LLC

Crossref

Variable selection for large p small n regression models with incomplete data: Mapping QTL with epistases

Author: A Wagner
BS Yandell
CH Kao
Ö Carlborg
D Fourdrinier
Dabao Zhang
E Greenshtein
EI George
ES Lander
H Hastie
H Ishwaran
H Jeffreys
H Wang
J Fan
J Liu
JH Moore
JM Álvarez-Castro
KW Broman
LJ Leamy
M Bogdan
M Zhang
M Żak
Martin T Wells
Min Zhang
N Yi
P Huber
PJ Gaffney
R Sanjuán
RD Ball
RJ Tibshirani
RJA Little
RW Doerge
S Portnoy
S Xu
SD Tanksley
SM Williams
TJ Mitchell
W Bateson
W Shi
Y Eshed
YH Cui
ZB Zeng
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Identifying quantitative trait loci (QTL) for both additive and epistatic effects raises the statistical issue of selecting variables from a large number of candidates using a small number of observations. Missing trait and/or marker values prevent one from directly applying classical model selection criteria such as Akaike's information criterion (AIC) and the Bayesian information criterion (BIC).

Results: We propose a two-step Bayesian variable selection method that deals with the sparse parameter space and the small sample size. The priors on the regression coefficients are flexible enough to incorporate the characteristics of "large p, small n" data. Specifically, sparseness and possible asymmetry of the significant coefficients are handled by a Gibbs sampling algorithm that stochastically searches through low-dimensional subspaces for significant variables. The superior performance of the approach is demonstrated in a simulation study. We also apply it to real QTL mapping datasets.

Conclusion: The two-step procedure coupled with Bayesian classification offers flexibility in modeling "large p, small n" data, especially for sparse and asymmetric parameter spaces. This approach can be extended to other settings characterized by high dimension and low sample size.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central