24 research outputs found

    Re-calibration of sample means

    We consider the problem of calibration and the GREG method as suggested and studied in Deville and Särndal (1992). We show that a GREG-type estimator is typically not a minimum-variance unbiased estimator, even asymptotically. We suggest a similar estimator which is unbiased and asymptotically attains minimal variance.
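
    For reference, a minimal sketch of the standard GREG (generalized regression) estimator of a population total, in the usual survey-sampling notation (sample s, design weights d_i, auxiliary vectors x_i with known population total t_x); this is the textbook form from Deville and Särndal (1992), not the modified estimator proposed above:

    \hat{t}_{y,\mathrm{GREG}}
        = \sum_{i \in s} d_i y_i
        + \Bigl( t_x - \sum_{i \in s} d_i x_i \Bigr)^{\top} \hat{B},
    \qquad
    \hat{B} = \Bigl( \sum_{i \in s} d_i x_i x_i^{\top} \Bigr)^{-1}
              \sum_{i \in s} d_i x_i y_i .

    Equivalently, GREG replaces the design weights d_i by calibrated weights w_i that satisfy \sum_{i \in s} w_i x_i = t_x while staying close to the d_i, which is the calibration view studied in the paper.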

    Aggregation by exponential weighting, sharp PAC-Bayesian bounds and sparsity

    We study the problem of aggregation under squared loss in the regression model with deterministic design. We obtain sharp PAC-Bayesian risk bounds for aggregates defined via exponential weights, under general assumptions on the distribution of the errors and on the functions to aggregate. We then apply these results to derive sparsity oracle inequalities.
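
    As a rough illustration, a generic exponential-weights aggregate under squared loss (a sketch, not the paper's exact construction; the temperature beta, the uniform prior, and the function names are assumptions):

    import numpy as np

    def exp_weight_aggregate(preds, y, beta=4.0, prior=None):
        """Aggregate candidate predictors by exponential weighting.

        preds : (M, n) array; preds[j] holds the j-th candidate's
                predictions at the n design points
        y     : (n,) array of observed responses
        beta  : temperature parameter of the exponential weights
        prior : (M,) prior weights pi_j (uniform if None)
        """
        M, n = preds.shape
        if prior is None:
            prior = np.full(M, 1.0 / M)
        # Empirical squared risk of each candidate.
        risks = np.mean((preds - y) ** 2, axis=1)
        # theta_j proportional to pi_j * exp(-n * risk_j / beta).
        logw = np.log(prior) - n * risks / beta
        logw -= logw.max()              # numerical stability
        theta = np.exp(logw)
        theta /= theta.sum()
        # The aggregate is the theta-weighted mixture of candidates.
        return theta @ preds, theta

    # Toy usage: three constant predictors of noisy data around 1.0.
    rng = np.random.default_rng(0)
    y = 1.0 + 0.1 * rng.standard_normal(50)
    preds = np.stack([np.full(50, c) for c in (0.0, 1.0, 2.0)])
    agg, theta = exp_weight_aggregate(preds, y)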

    L1pred: A Sequence-Based Prediction Tool for Catalytic Residues in Enzymes with the L1-logreg Classifier

    To understand enzyme function, identifying the catalytic residues is often a first step. Knowledge of catalytic residues is also useful for protein engineering and drug design. However, identifying catalytic residues experimentally remains challenging in terms of time and cost, so computational methods have been explored to predict them. Here, we developed a new algorithm, L1pred, for catalytic residue prediction, which uses the L1-logreg classifier to integrate eight sequence-based scoring functions. We tested L1pred and compared it against several existing sequence-based methods on the carefully designed datasets Data604 and Data63. With ten-fold cross-validation on the training dataset, Data604, L1pred achieved an area under the precision-recall curve (AUPR) of 0.2198 and an area under the ROC curve (AUC) of 0.9494. On the independent test dataset, Data63, it achieved an AUPR of 0.2636 and an AUC of 0.9375. Compared with other sequence-based methods, L1pred showed the best performance on both datasets. We also analyzed the importance of each attribute in the algorithm and found that all the scores contributed more or less equally to L1pred's performance.
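
    A minimal sketch of the core construction: an L1-penalized logistic regression over per-residue feature scores (here scikit-learn stands in for the l1-logreg solver used in the paper; the placeholder data and score columns are assumptions for illustration):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # X: one row per residue, one column per sequence-based scoring
    # function (eight in the paper); y: 1 = catalytic, 0 = not.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 8))          # placeholder scores
    y = (X[:, 0] + 0.5 * X[:, 1]
         + rng.standard_normal(1000) > 2).astype(int)

    # The L1 penalty drives weights of uninformative scores to zero.
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
    clf.fit(X, y)

    print("score weights:", clf.coef_.ravel())
    print("P(catalytic):", clf.predict_proba(X[:3])[:, 1])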

    Variable selection for large p small n regression models with incomplete data: Mapping QTL with epistases

    Background: Identifying quantitative trait loci (QTL) for both additive and epistatic effects raises the statistical issue of selecting variables from a large number of candidates using a small number of observations. Missing trait and/or marker values prevent one from directly applying classical model selection criteria such as Akaike's information criterion (AIC) and the Bayesian information criterion (BIC).

    Results: We propose a two-step Bayesian variable selection method which deals with the sparse parameter space and the small sample size. The regression coefficient priors are flexible enough to incorporate the characteristics of "large p small n" data. Specifically, sparseness and possible asymmetry of the significant coefficients are dealt with by developing a Gibbs sampling algorithm that stochastically searches through low-dimensional subspaces for significant variables. The superior performance of the approach is demonstrated via a simulation study, and we also apply it to real QTL mapping datasets.

    Conclusion: The two-step procedure coupled with Bayesian classification offers flexibility in modeling "large p small n" data, especially for sparse and asymmetric parameter spaces. The approach can be extended to other settings characterized by high dimension and low sample size.
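
    To make the idea concrete, a generic spike-and-slab Gibbs sampler for variable selection in linear regression (a single-step sketch under fixed hyperparameters, not the authors' two-step procedure; sigma2, tau2, and the inclusion prior pi are assumed values):

    import numpy as np
    from scipy.stats import multivariate_normal

    def log_marginal(y, X, gamma, sigma2=1.0, tau2=1.0):
        """Log p(y | gamma) with the coefficients of the included
        columns integrated out under a N(0, tau2) prior."""
        n = len(y)
        Xg = X[:, gamma.astype(bool)]
        cov = sigma2 * np.eye(n) + tau2 * (Xg @ Xg.T)
        return multivariate_normal.logpdf(y, mean=np.zeros(n), cov=cov)

    def gibbs_select(y, X, n_iter=200, pi=0.05, seed=0):
        """Gibbs sweeps over the inclusion indicators gamma_j."""
        rng = np.random.default_rng(seed)
        n, p = X.shape
        gamma = np.zeros(p)
        counts = np.zeros(p)
        for _ in range(n_iter):
            for j in range(p):
                logp = np.empty(2)
                for v in (0, 1):
                    gamma[j] = v
                    prior = np.log(pi) if v else np.log(1.0 - pi)
                    logp[v] = prior + log_marginal(y, X, gamma)
                prob1 = 1.0 / (1.0 + np.exp(logp[0] - logp[1]))
                gamma[j] = rng.random() < prob1
            counts += gamma
        return counts / n_iter   # posterior inclusion probabilities

    # Small-n large-p toy: 3 true effects among 30 candidates.
    rng = np.random.default_rng(1)
    n, p = 20, 30
    X = rng.standard_normal((n, p))
    y = (2 * X[:, 0] - 1.5 * X[:, 3] + X[:, 7]
         + 0.5 * rng.standard_normal(n))
    print(np.round(gibbs_select(y, X), 2))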

    The Poisson Compound Decision Problem Revisited

    The compound decision problem for a vector of independent Poisson random variables with possibly different means has a solution that is half a century old. However, it appears that the classical solution needs a smoothing adjustment even when there are many observations and the means are relatively small, so that the empirical distribution is close to its expectation. We discuss three such adjustments. We also present another approach that first transforms the problem into the normal compound decision problem.
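
    The half-century-old solution alluded to is presumably Robbins' frequency-ratio estimator; a minimal sketch, with a crude additive smoothing of the empirical frequencies standing in for the adjustments discussed in the paper:

    import numpy as np

    def robbins_poisson(y, smooth=0.0):
        """Robbins-style estimate of each Poisson mean. Under squared
        loss, E[theta | Y = k] = (k + 1) p(k + 1) / p(k); plug in
        empirical frequencies, optionally smoothed by adding `smooth`
        to every count (a crude adjustment, not the paper's)."""
        y = np.asarray(y)
        kmax = y.max() + 1
        freq = np.bincount(y, minlength=kmax + 1).astype(float) + smooth
        return (y + 1) * freq[y + 1] / freq[y]

    # Toy check: means drawn from a Gamma prior, counts are Poisson.
    rng = np.random.default_rng(0)
    theta = rng.gamma(2.0, 1.0, size=5000)
    y = rng.poisson(theta)
    est = robbins_poisson(y, smooth=1.0)
    print("MSE Robbins:", np.mean((est - theta) ** 2))
    print("MSE naive y:", np.mean((y - theta) ** 2))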