Remembering Leo Breiman
I published an interview of Leo Breiman in Statistical Science [Olshen
(2001)], and also the solution to a problem concerning almost sure convergence
of binary tree-structured estimators in regression [Olshen (2007)]. The former
summarized much of my thinking about Leo up to five years before his death. I
discussed the latter with Leo and dedicated that paper to his memory.
Therefore, this note is on other topics. In preparing it I am reminded how much
I miss this man of so many talents and interests. I miss him not because I
always agreed with him, but instead because his comments about statistics in
particular and life in general always elicited substantial reflection on my part.

Comment: Published at http://dx.doi.org/10.1214/10-AOAS385 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
Successive Standardization of Rectangular Arrays
In this note we illustrate, and develop further with mathematics and examples,
the work on successive standardization (or normalization) studied earlier by
the same authors in Olshen and Rajaratnam (2010) and Olshen and Rajaratnam
(2011). Thus, we deal with successive iterations applied to
rectangular arrays of numbers, where to avoid technical difficulties an array
has at least three rows and at least three columns. Without loss of
generality, an iteration begins with operations on columns: first subtract the
mean of each column; then divide by its standard deviation. The iteration
continues with the same two operations done successively for rows. These four
operations applied in sequence complete one iteration; one then iterates again
and again. In Olshen and Rajaratnam (2010) it was argued that if arrays are
made up of real numbers, then the set for which convergence of these successive
iterations fails has Lebesgue measure 0. The limiting array has row and column
means 0, row and column standard deviations 1. A basic result on convergence
given in Olshen and Rajaratnam (2010) is true, though the argument in Olshen
and Rajaratnam (2010) is faulty. The result is stated in the form of a theorem
here, and the argument for the theorem is correct. Moreover, many graphics
given in Olshen and Rajaratnam (2010) suggest that but for a set of entries of
any array with Lebesgue measure 0, convergence is very rapid, eventually
exponentially fast in the number of iterations. Because we learned this set of
rules from Bradley Efron, we call it "Efron's algorithm." More importantly, the
rapidity of convergence is illustrated by numerical examples.
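The iteration described above is easy to state in code. The following is an illustrative numpy sketch under the stated rules (subtract column means, divide by column standard deviations, then the same for rows, repeated), not the authors' own implementation; population standard deviations (ddof=0) are an assumption.

```python
import numpy as np

def successive_standardization(X, n_iter=500):
    """One iteration: demean and scale columns, then demean and scale rows.
    For almost every starting array the iterates converge to an array whose
    row and column means are 0 and whose standard deviations are 1."""
    X = np.asarray(X, dtype=float).copy()
    for _ in range(n_iter):
        X = X - X.mean(axis=0)                 # subtract each column mean
        X = X / X.std(axis=0)                  # divide by column std devs
        X = X - X.mean(axis=1, keepdims=True)  # subtract each row mean
        X = X / X.std(axis=1, keepdims=True)   # divide by row std devs
    return X

rng = np.random.default_rng(0)
Y = successive_standardization(rng.normal(size=(5, 4)))
```

After the final row step, row means and standard deviations are exactly 0 and 1; by the convergence result, the column means and standard deviations are also 0 and 1 up to numerical tolerance.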
Successive normalization of rectangular arrays
Standard statistical techniques often require transforming data to have mean
0 and standard deviation 1. Typically, this process of "standardization" or
"normalization" is applied across subjects when each subject produces a single
number. High throughput genomic and financial data often come as rectangular
arrays where each coordinate in one direction concerns subjects who might have
different status (case or control, say), and each coordinate in the other
designates "outcome" for a specific feature, for example, "gene," "polymorphic
site" or some aspect of financial profile. It may happen, when analyzing data
that arrive as a rectangular array, that one requires BOTH the subjects and the
features to be "on the same footing." Thus there may be a need to standardize
across rows and columns of the rectangular matrix. There arises the question as
to how to achieve this double normalization. We propose and investigate the
convergence of what seems to us a natural approach to successive normalization
which we learned from our colleague Bradley Efron. We also study the
implementation of the method on simulated data and on data that arose from
scientific experimentation.

Comment: Published at http://dx.doi.org/10.1214/09-AOS743 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org). With Correction.
A Generalized Unimodality
Generalization of unimodality for random objects taking values in finite-dimensional vector spaces.
Almost surely consistent nonparametric regression from recursive partitioning schemes
Presented here are results on almost sure convergence of estimators of regression functions subject to certain moment restrictions. Two somewhat different notions of almost sure convergence are studied: unconditional, and conditional given a training sample. The estimators are local means derived from certain recursive partitioning schemes.
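To illustrate what a local-mean estimator over a recursive partition looks like, here is a minimal one-dimensional sketch. The splitting rule (dyadic halving of the covariate interval down to a minimum leaf size) is an assumption chosen for illustration, not one of the schemes analyzed in the paper.

```python
import numpy as np

def local_mean_tree(x, y, lo=0.0, hi=1.0, min_leaf=10):
    """Recursively halve [lo, hi]; each leaf predicts its local mean of y."""
    if len(x) <= min_leaf:
        return ('leaf', y.mean() if len(y) else 0.0)
    mid = (lo + hi) / 2.0
    left = x < mid
    return ('split', mid,
            local_mean_tree(x[left], y[left], lo, mid, min_leaf),
            local_mean_tree(x[~left], y[~left], mid, hi, min_leaf))

def predict(tree, x):
    """Walk to the leaf cell containing x and return its local mean."""
    while tree[0] == 'split':
        tree = tree[2] if x < tree[1] else tree[3]
    return tree[1]

rng = np.random.default_rng(0)
x = rng.uniform(size=2000)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=2000)
tree = local_mean_tree(x, y)
# local means should track the regression function sin(2*pi*x)
err = abs(predict(tree, 0.25) - 1.0)
```

As the sample grows and the cells shrink (while each still holds enough points), the local means approach the regression function, which is the flavor of consistency the paper studies.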
GENOME WIDE DNA METHYLATION PROFILING IS PREDICTIVE OF OUTCOME IN JUVENILE MYELOMONOCYTIC LEUKEMIA
Gene Expression Differences between Enriched Normal and Chronic Myelogenous Leukemia Quiescent Stem/Progenitor Cells and Correlations with Biological Abnormalities
In comparing gene expression of normal and CML CD34+ quiescent (G0) cells, 292 genes were downregulated and 192 genes upregulated in the CML G0 cells. The differentially expressed genes were grouped according to their reported functions, and correlations were sought with biological differences previously observed between the same groups. The most relevant findings include the following. (i) CML G0 cells are in a more advanced stage of development and more poised to proliferate than normal G0 cells. (ii) When CML G0 cells are stimulated to proliferate, they differentiate and mature more rapidly than their normal counterparts. (iii) Whereas normal G0 cells form only granulocyte/monocyte colonies when stimulated by cytokines, CML G0 cells form a combination of the above and erythroid clusters and colonies. (iv) Prominin-1 is the gene most downregulated in CML G0 cells, and this appears to be associated with the spontaneous formation of erythroid colonies by CML progenitors without EPO.
A classification model for distinguishing copy number variants from cancer-related alterations
Background: Both somatic copy number alterations (CNAs) and germline copy number variants (CNVs) that are prevalent in healthy individuals can appear as recurrent changes in comparative genomic hybridization (CGH) analyses of tumors. In order to identify important cancer genes, CNAs and CNVs must be distinguished. Although the Database of Genomic Variants (DGV) contains a list of all known CNVs, there is no standard methodology to use the database effectively.

Results: We develop a prediction model that distinguishes CNVs from CNAs based on the information contained in the DGV and several other variables, including segment length, height, closeness to a telomere or centromere, and occurrence in other patients. The models are fitted on data from glioblastoma and their corresponding normal samples that were collected as part of The Cancer Genome Atlas project and hybridized to Agilent 244K arrays.

Conclusions: Using the DGV alone, CNVs in the test set can be correctly identified with about 85% accuracy if the outliers are removed before segmentation and with 72% accuracy if the outliers are included; additional variables improve the prediction by about 2-3% and 12%, respectively. Final models applied to data from ovarian tumors have about 90% accuracy with all the variables and 86% accuracy with the DGV alone.
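A sketch of the kind of classifier described: logistic regression on per-segment features (DGV overlap, segment length, height, closeness to a telomere or centromere, recurrence in other patients). The feature distributions, the label-generating coefficients, and the gradient-descent fit below are all invented for illustration; this is not the authors' fitted model or their data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
X = np.column_stack([
    rng.integers(0, 2, n).astype(float),  # overlaps a DGV-listed variant?
    rng.exponential(1.0, n),              # segment length (arbitrary scale)
    rng.normal(0.0, 1.0, n),              # segment height (mean log-ratio)
    rng.uniform(0.0, 1.0, n),             # closeness to telomere/centromere
    rng.poisson(2.0, n).astype(float),    # occurrence in other patients
])
# Synthetic labels: germline CNVs (label 1) tend to overlap the DGV and to
# recur in other patients; these coefficients are illustrative assumptions.
true_logit = -3.0 + 4.0 * X[:, 0] + 0.8 * X[:, 4]
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

# Fit logistic regression by plain gradient descent on the log-loss.
Xb = np.column_stack([np.ones(n), X])   # prepend an intercept column
w = np.zeros(Xb.shape[1])
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
    w -= 0.05 * (Xb.T @ (p - y)) / n    # gradient of mean log-loss
pred = 1.0 / (1.0 + np.exp(-(Xb @ w))) > 0.5
accuracy = float((pred == y.astype(bool)).mean())
```

On real segmented CGH data the same shape of model would be fitted to labeled training segments and evaluated on held-out tumors, as the abstract describes.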