96 research outputs found

    Remembering Leo Breiman

    I published an interview of Leo Breiman in Statistical Science [Olshen (2001)], and also the solution to a problem concerning almost sure convergence of binary tree-structured estimators in regression [Olshen (2007)]. The former summarized much of my thinking about Leo up to five years before his death. I discussed the latter with Leo and dedicated that paper to his memory. Therefore, this note is on other topics. In preparing it I am reminded how much I miss this man of so many talents and interests. I miss him not because I always agreed with him, but instead because his comments about statistics in particular and life in general always elicited my substantial reflection. Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/10-AOAS385 by the Institute of Mathematical Statistics (http://www.imstat.org)

    Successive Standardization of Rectangular Arrays

    In this note we illustrate and develop further, with mathematics and examples, the work on successive standardization (or normalization) that was studied earlier by the same authors in Olshen and Rajaratnam (2010) and Olshen and Rajaratnam (2011). Thus, we deal with successive iterations applied to rectangular arrays of numbers, where to avoid technical difficulties an array has at least three rows and at least three columns. Without loss of generality, an iteration begins with operations on columns: first subtract the mean of each column; then divide by its standard deviation. The iteration continues with the same two operations done successively for rows. These four operations applied in sequence complete one iteration. One then iterates again and again. In Olshen and Rajaratnam (2010) it was argued that if arrays are made up of real numbers, then the set for which convergence of these successive iterations fails has Lebesgue measure 0. The limiting array has row and column means 0 and row and column standard deviations 1. A basic result on convergence given in Olshen and Rajaratnam (2010) is true, though the argument in Olshen and Rajaratnam (2010) is faulty. The result is stated in the form of a theorem here, and the argument for the theorem is correct. Moreover, many graphics given in Olshen and Rajaratnam (2010) suggest that, but for a set of entries of any array with Lebesgue measure 0, convergence is very rapid, eventually exponentially fast in the number of iterations. Because we learned this set of rules from Bradley Efron, we call it "Efron's algorithm". More importantly, the rapidity of convergence is illustrated by numerical examples.
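The iteration described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the stopping rule (`tol`, `max_iter`), the use of the population standard deviation (dividing by the number of entries), and the random 4 by 5 test array are all choices made here for illustration.

```python
import math
import random


def standardize_rows(a):
    """Subtract each row's mean, then divide by its standard deviation.

    Assumption: population standard deviation (divide by the row length).
    """
    out = []
    for row in a:
        mean = sum(row) / len(row)
        centered = [x - mean for x in row]
        sd = math.sqrt(sum(c * c for c in centered) / len(row))
        out.append([c / sd for c in centered])
    return out


def transpose(a):
    """Swap rows and columns of a list-of-lists array."""
    return [list(col) for col in zip(*a)]


def successive_standardization(a, tol=1e-10, max_iter=1000):
    """Alternate column and row standardization until the array stops changing.

    Returns the final array and the number of iterations performed.
    """
    current = [list(row) for row in a]
    for iteration in range(1, max_iter + 1):
        # Columns first: standardizing rows of the transpose standardizes columns.
        by_columns = transpose(standardize_rows(transpose(current)))
        # Then the same two operations for rows.
        updated = standardize_rows(by_columns)
        change = max(
            abs(x - y)
            for new_row, old_row in zip(updated, current)
            for x, y in zip(new_row, old_row)
        )
        current = updated
        if change < tol:
            return current, iteration
    return current, max_iter


random.seed(0)
# Hypothetical input: a 4 x 5 array of independent standard normal draws.
array = [[random.gauss(0, 1) for _ in range(5)] for _ in range(4)]
limit, n_iter = successive_standardization(array)
```

After the loop stops, the rows of `limit` have mean 0 and standard deviation 1 by construction, and the columns satisfy the same conditions up to numerical tolerance when the iteration has converged.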

    Successive normalization of rectangular arrays

    Standard statistical techniques often require transforming data to have mean 0 and standard deviation 1. Typically, this process of "standardization" or "normalization" is applied across subjects when each subject produces a single number. High throughput genomic and financial data often come as rectangular arrays where each coordinate in one direction concerns subjects who might have different status (case or control, say), and each coordinate in the other designates "outcome" for a specific feature, for example, "gene," "polymorphic site" or some aspect of financial profile. It may happen, when analyzing data that arrive as a rectangular array, that one requires BOTH the subjects and the features to be "on the same footing." Thus there may be a need to standardize across rows and columns of the rectangular matrix. There arises the question as to how to achieve this double normalization. We propose and investigate the convergence of what seems to us a natural approach to successive normalization which we learned from our colleague Bradley Efron. We also study the implementation of the method on simulated data and also on data that arose from scientific experimentation. Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/09-AOS743 by the Institute of Mathematical Statistics (http://www.imstat.org). With Correction.

    Almost surely consistent nonparametric regression from recursive partitioning schemes

    Presented here are results on almost sure convergence of estimators of regression functions subject to certain moment restrictions. Two somewhat different notions of almost sure convergence are studied: unconditional and conditional given a training sample. The estimators are local means derived from certain recursive partitioning schemes.

    Canadian Multidisciplinary Core Curriculum for Musculoskeletal Health

    ABSTRACT. Objective. To determine the level of agreement between the Bone and Joint Decade Undergraduate Curriculum Group (BJDUCG) core curriculum recommendations for musculoskeletal (MSK) conditions targeted for undergraduate medical education and what the physicians and surgeons of Canada thought to be important at the postgraduate level of education. Methods. An 80-item questionnaire was developed. A cross-sectional survey of educators representing 77 Canadian accredited academic programs representing 6 disciplines in medicine that manage patients with MSK conditions was completed. Histograms, Kruskal-Wallis tests, and principal component analyses were computed. Results. In total, 164/175 (94%) respondents participated in the study. All 80 curriculum items received a mean score of at least 3.0/4.0. Sixty-four out of 80 items were ranked to be at least 3.5/4.0, and 35 items were ranked to be at least 3.8/4.0, suggesting that these items may be core content for all disciplines. Conclusion. The World Health Organization declared the years 2000 to 2010 as The Bone and Joint Decade. The main goal is to improve the quality of life for people with MSK disorders worldwide. One aim of the BJD is to increase education of healthcare providers at all levels. The BJDUCG established a set of core curriculum recommendations for MSK conditions. Our study gives reliable statistical evidence of agreement between what the BJDUCG recommended for an MSK core curriculum for medical schools and what the physicians and surgeons of Canada thought to be important for residency education in several disciplines.

    Genetic Analysis of the Early Natural History of Epithelial Ovarian Carcinoma

    The high mortality rate associated with epithelial ovarian carcinoma (EOC) reflects diagnosis commonly at an advanced stage, but improved early detection is hindered by uncertainty as to the histologic origin and early natural history of this malignancy. Here we report combined molecular genetic and morphologic analyses of normal human ovarian tissues and early stage cancers, from both BRCA mutation carriers and the general population, indicating that EOCs frequently arise from dysplastic precursor lesions within epithelial inclusion cysts. In pathologically normal ovaries, molecular evidence of oncogenic stress was observed specifically within epithelial inclusion cysts. To further explore potential very early events in ovarian tumorigenesis, ovarian tissues from women not known to be at high risk for ovarian cancer were subjected to laser catapult microdissection and gene expression profiling. These studies revealed a quasi-neoplastic expression signature in benign ovarian cystic inclusion epithelium compared to surface epithelium, specifically with respect to genes affecting signal transduction, cell cycle control, and mitotic spindle formation. Consistent with this gene expression profile, a significantly higher cell proliferation index (increased cell proliferation and decreased apoptosis) was observed in histopathologically normal ovarian cystic compared to surface epithelium. Furthermore, aneuploidy was frequently identified in normal ovarian cystic epithelium but not in surface epithelium. Together, these data indicate that EOC frequently arises in ovarian cystic inclusions, is preceded by an identifiable dysplastic precursor lesion, and that increased cell proliferation, decreased apoptosis, and aneuploidy are likely to represent very early aberrations in ovarian tumorigenesis.

    Five Blood Pressure Loci Identified by an Updated Genome-Wide Linkage Scan: Meta-Analysis of the Family Blood Pressure Program

    Background. A preliminary genome-wide linkage analysis of blood pressure in the Family Blood Pressure Program (FBPP) was reported previously. We harnessed the power and ethnic diversity of the final pooled FBPP dataset to identify novel loci for blood pressure, thereby enhancing localization of genes containing less common variants with large effects on blood pressure levels and hypertension. Methods. We performed one overall and 4 race-specific meta-analyses of genome-wide blood pressure linkage scans using data on 4,226 African-American, 2,154 Asian, 4,229 Caucasian, and 2,435 Mexican-American participants (total N = 13,044). Variance components models were fit to measured (raw) blood pressure levels and two types of antihypertensive medication adjusted blood pressure phenotypes within each of 10 subgroups defined by race and network. A modified Fisher's method was used to combine the P values for each linkage marker across the 10 subgroups. Results. Five quantitative trait loci (QTLs) were detected on chromosomes 6p22.3, 8q23.1, 20q13.12, 21q21.1, and 21q21.3 based on significant linkage evidence (defined by logarithm of odds (lod) score ≥3) in at least one meta-analysis and lod scores ≥1 in at least 2 subgroups defined by network and race. The chromosome 8q23.1 locus was supported by Asian-, Caucasian-, and Mexican-American-specific meta-analyses. Conclusions. The new QTLs reported justify new candidate gene studies. They may help support results from genome-wide association studies (GWAS) that fall in these QTL regions but fail to achieve genome-wide significance. American Journal of Hypertension advance online publication 9 December 2010; doi:10.1038/ajh.2010.23
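For readers unfamiliar with combining P values across subgroups, the following is a rough sketch of the classical Fisher's method. It is not the modified version the abstract refers to, whose details are not given here. It assumes independent P values; the function name and the example P values are hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Classical Fisher's method for k independent P values.

    The statistic -2 * sum(ln p_i) is compared with a chi-square
    distribution with 2k degrees of freedom. For even degrees of freedom
    the upper tail has the closed form
        exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    so no external statistics library is needed.
    """
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


# Hypothetical P values for one marker across three subgroups.
statistic, combined_p = fisher_combined_p([0.04, 0.20, 0.11])
```

With a single P value the combined P value equals the input, which is a quick sanity check on the closed-form tail probability.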

    Integrated Genomic Analysis Implicates Haploinsufficiency of Multiple Chromosome 5q31.2 Genes in De Novo Myelodysplastic Syndromes Pathogenesis

    Deletions spanning chromosome 5q31.2 are among the most common recurring cytogenetic abnormalities detectable in myelodysplastic syndromes (MDS). Prior genomic studies have suggested that haploinsufficiency of multiple 5q31.2 genes may contribute to MDS pathogenesis. However, this hypothesis has never been formally tested. Therefore, we designed this study to systematically and comprehensively evaluate all 28 chromosome 5q31.2 genes and directly test whether haploinsufficiency of a single 5q31.2 gene may result from a heterozygous nucleotide mutation or microdeletion. We selected paired tumor (bone marrow) and germline (skin) DNA samples from 46 de novo MDS patients (37 without a cytogenetic 5q31.2 deletion) and performed total exonic gene resequencing (479 amplicons) and array comparative genomic hybridization (CGH). We found no somatic nucleotide changes in the 46 MDS samples, and no cytogenetically silent 5q31.2 deletions in 20/20 samples analyzed by array CGH. Twelve novel single nucleotide polymorphisms were discovered. The mRNA levels of 7 genes in the commonly deleted interval were reduced by 50% in CD34+ cells from del(5q) MDS samples, and no gene showed complete loss of expression. Taken together, these data show that small deletions and/or point mutations in individual 5q31.2 genes are not common events in MDS, and implicate haploinsufficiency of multiple genes as the relevant genetic consequence of this common deletion