
    Remembering Leo Breiman

    I published an interview with Leo Breiman in Statistical Science [Olshen (2001)], and also the solution to a problem concerning almost sure convergence of binary tree-structured estimators in regression [Olshen (2007)]. The former summarized much of my thinking about Leo up to five years before his death. I discussed the latter with Leo and dedicated that paper to his memory. Therefore, this note is on other topics. In preparing it I am reminded how much I miss this man of so many talents and interests. I miss him not because I always agreed with him, but because his comments about statistics in particular and life in general always prompted substantial reflection on my part.
    Comment: Published at http://dx.doi.org/10.1214/10-AOAS385 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    Successive Standardization of Rectangular Arrays

    In this note we illustrate and develop further, with mathematics and examples, the work on successive standardization (or normalization) studied earlier by the same authors in Olshen and Rajaratnam (2010) and Olshen and Rajaratnam (2011). Thus, we deal with successive iterations applied to rectangular arrays of numbers, where, to avoid technical difficulties, an array has at least three rows and at least three columns. Without loss of generality, an iteration begins with operations on columns: first subtract the mean of each column, then divide by its standard deviation. The iteration continues with the same two operations done successively for rows. These four operations applied in sequence complete one iteration; one then iterates again, and again, and again, .... In Olshen and Rajaratnam (2010) it was argued that if arrays are made up of real numbers, then the set of arrays for which convergence of these successive iterations fails has Lebesgue measure 0. The limiting array has row and column means 0 and row and column standard deviations 1. A basic result on convergence given in Olshen and Rajaratnam (2010) is true, though the argument given there is faulty; the result is stated here in the form of a theorem, and the argument given for it is correct. Moreover, many graphics in Olshen and Rajaratnam (2010) suggest that, outside a set of arrays of Lebesgue measure 0, convergence is very rapid, eventually exponentially fast in the number of iterations. Because we learned this set of rules from Bradley Efron, we call it "Efron's algorithm". The rapidity of convergence is illustrated by numerical examples.
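    The iteration described above is easy to state in code. The following NumPy sketch runs "Efron's algorithm" on a small random array; it is an illustration based on our reading of the abstract, not the authors' implementation, and it assumes population standard deviations (ddof=0):

```python
import numpy as np

def efron_iteration(x):
    """One full iteration: standardize columns, then rows."""
    # Column operations: subtract each column's mean, divide by its std.
    x = (x - x.mean(axis=0)) / x.std(axis=0)
    # Row operations: the same two steps applied across rows.
    x = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
    return x

rng = np.random.default_rng(0)
a = rng.normal(size=(5, 4))        # at least three rows and three columns
for _ in range(1000):
    a = efron_iteration(a)

# Rows are exact after the final row step; column means and standard
# deviations converge toward 0 and 1.
assert np.allclose(a.mean(axis=0), 0, atol=1e-6)
assert np.allclose(a.std(axis=0), 1, atol=1e-6)
```

    On generic random arrays such as this one, convergence to within numerical tolerance is typically observed after far fewer iterations than the 1000 used here, consistent with the eventually exponential rate the abstract describes.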

    Successive normalization of rectangular arrays

    Standard statistical techniques often require transforming data to have mean 0 and standard deviation 1. Typically, this process of "standardization" or "normalization" is applied across subjects when each subject produces a single number. High-throughput genomic and financial data often come as rectangular arrays, where each coordinate in one direction concerns subjects who might have different status (case or control, say), and each coordinate in the other designates "outcome" for a specific feature, for example, "gene," "polymorphic site" or some aspect of a financial profile. It may happen, when analyzing data that arrive as a rectangular array, that one requires BOTH the subjects and the features to be "on the same footing." Thus there may be a need to standardize across both rows and columns of the rectangular matrix, and the question arises of how to achieve this double normalization. We propose and investigate the convergence of what seems to us a natural approach to successive normalization, which we learned from our colleague Bradley Efron. We also study the implementation of the method on simulated data and on data that arose from scientific experimentation.
    Comment: Published at http://dx.doi.org/10.1214/09-AOS743 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org). With Correction.

    A Generalized Unimodality

    Generalization of unimodality for random objects taking values in finite-dimensional vector spaces.

    A Faster Circular Binary Segmentation Algorithm for the Analysis of Array CGH Data

    Motivation: Array CGH technologies enable the simultaneous measurement of DNA copy number for thousands of sites on a genome. We developed the circular binary segmentation (CBS) algorithm to divide the genome into regions of equal copy number (Olshen et al., 2004). The algorithm tests for change-points using a maximal t-statistic with a permutation reference distribution to obtain the corresponding p-value. The number of computations required for the maximal test statistic is O(N^2), where N is the number of markers. This makes the full permutation approach computationally prohibitive for the newer arrays that contain tens of thousands of markers, and highlights the need for a faster algorithm. Results: We present a hybrid approach to obtain the p-value of the test statistic in linear time. We also introduce a rule for stopping early when there is strong evidence for the presence of a change. We show through simulations that the hybrid approach provides a substantial gain in speed with only a negligible loss in accuracy, and that the stopping rule further increases speed. We also present the analysis of array CGH data from a breast cancer cell line to show the impact of the new approaches on the analysis of real data. Availability: An R (R Development Core Team, 2006) version of the CBS algorithm has been implemented in the "DNAcopy" package of the Bioconductor project (Gentleman et al., 2004). The proposed hybrid method for the p-value is available in version 1.2.1 or higher, and the stopping rule for declaring a change early is available in version 1.5.1 or higher.
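    To make the maximal statistic and its permutation p-value concrete, here is a simplified Python/NumPy sketch. It uses an ordinary (non-circular) maximal two-sample t-statistic over all split points as a stand-in for the CBS statistic; `max_t_stat`, `permutation_pvalue`, and all parameter choices are illustrative assumptions, not the DNAcopy implementation or its hybrid speedup:

```python
import numpy as np

def max_t_stat(x):
    """Maximal two-sample t-statistic over all split points
    (a simplified, non-circular stand-in for the CBS statistic)."""
    n = len(x)
    best = 0.0
    for i in range(2, n - 1):               # at least two points per segment
        left, right = x[:i], x[i:]
        diff = left.mean() - right.mean()
        se2 = left.var(ddof=1) / i + right.var(ddof=1) / (n - i)
        best = max(best, abs(diff) / np.sqrt(se2))
    return best

def permutation_pvalue(x, n_perm=200, seed=0):
    """Permutation reference distribution for the maximal statistic;
    this is the O(N^2)-per-permutation approach the paper speeds up."""
    rng = np.random.default_rng(seed)
    observed = max_t_stat(x)
    count = sum(max_t_stat(rng.permutation(x)) >= observed
                for _ in range(n_perm))
    return (count + 1) / (n_perm + 1)       # smoothed p-value estimate

# A sequence with a clear mean shift should yield a small p-value.
x = np.concatenate([np.zeros(30), np.full(30, 2.0)]) \
    + np.random.default_rng(1).normal(scale=0.5, size=60)
p = permutation_pvalue(x)
```

    Each permutation recomputes the maximum over all split points, which is what makes the full permutation approach expensive and motivates the linear-time hybrid of the paper.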

    Almost surely consistent nonparametric regression from recursive partitioning schemes

    Presented here are results on almost sure convergence of estimators of regression functions subject to certain moment restrictions. Two somewhat different notions of almost sure convergence are studied: unconditional, and conditional given a training sample. The estimators are local means derived from certain recursive partitioning schemes.
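    As a concrete, hypothetical instance of such an estimator, the sketch below builds a one-dimensional partition by recursive median splits and predicts with the local mean of the terminal cell; the splitting rule and the `min_leaf` parameter are illustrative assumptions, not the particular schemes analyzed in the paper:

```python
import numpy as np

def local_mean_predict(x_train, y_train, x0, min_leaf=10):
    """Estimate m(x0) by the mean response over the terminal cell of a
    recursive median-split partition of the training points."""
    idx = np.arange(len(x_train))
    while len(idx) >= 2 * min_leaf:
        split = np.median(x_train[idx])
        # Descend into the half-cell containing the query point x0.
        keep = idx[x_train[idx] <= split] if x0 <= split else idx[x_train[idx] > split]
        if len(keep) == 0 or len(keep) == len(idx):
            break  # degenerate split (ties); stop refining the cell
        idx = keep
    return y_train[idx].mean()

# Toy check: regression function m(x) = x^2 with small additive noise.
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, size=2000)
y = x ** 2 + rng.normal(scale=0.05, size=2000)
estimate = local_mean_predict(x, y, 0.5)   # true value m(0.5) = 0.25
```

    As the training sample grows and cells shrink while still containing many points, local means of this kind concentrate around the regression function, which is the phenomenon the convergence results make precise.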

    On the maximal halfspace depth of permutation-invariant distributions on the simplex

    We compute the maximal halfspace depth for a class of permutation-invariant distributions on the probability simplex. The derivations are based on stochastic ordering results that so far were only shown to be relevant for the Behrens-Fisher problem.
    Comment: 14 pages, 3 figures.