28 research outputs found

    Remembering Leo Breiman

    Leo Breiman was a highly creative, influential researcher with a down-to-earth personal style and an insistence on working on important real-world problems and producing useful solutions. This paper is a short review of Breiman's extensive contributions to the field of applied statistics. Comment: Published at http://dx.doi.org/10.1214/10-AOAS427 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    An application of Random Forests to a genome-wide association dataset: Methodological considerations & new findings

    Background: As computational power improves, the application of more advanced machine learning techniques to the analysis of large genome-wide association (GWA) datasets becomes possible. While most traditional statistical methods can only elucidate main effects of genetic variants on risk for disease, certain machine learning approaches are particularly suited to discover higher order and non-linear effects. One such approach is the Random Forests (RF) algorithm. The use of RF for SNP discovery related to human disease has grown in recent years; however, most work has focused on small datasets or simulation studies which are limited. Results: Using a multiple sclerosis (MS) case-control dataset comprised of 300 K SNP genotypes across the genome, we outline an approach and some considerations for optimally tuning the RF algorithm based on the empirical dataset. Importantly, results show that typical default parameter values are not appropriate for large GWA datasets. Furthermore, gains can be made by sub-sampling the data, pruning based on linkage disequilibrium (LD), and removing strong effects from RF analyses. The new RF results are compared to findings from the original MS GWA study and demonstrate overlap. In addition, four new interesting candidate MS genes are identified, MPHOSPH9, CTNNA3, PHACTR2 and IL7, by RF analysis and warrant further follow-up in independent studies. Conclusions: This study presents one of the first illustrations of successfully analyzing GWA data with a machine learning algorithm. It is shown that RF is computationally feasible for GWA data and the results obtained make biologic sense based on previous studies. More importantly, new genes were identified as potentially being associated with MS, suggesting new avenues of investigation for this complex disease.

    Human Bone Marrow Organoids for Disease Modeling, Discovery, and Validation of Therapeutic Targets in Hematologic Malignancies

    A lack of models that recapitulate the complexity of human bone marrow has hampered mechanistic studies of normal and malignant hematopoiesis and the validation of novel therapies. Here, we describe a step-wise, directed-differentiation protocol in which organoids are generated from induced pluripotent stem cells committed to mesenchymal, endothelial, and hematopoietic lineages. These 3D structures capture key features of human bone marrow: stroma, lumen-forming sinusoids, and myeloid cells including proplatelet-forming megakaryocytes. The organoids supported the engraftment and survival of cells from patients with blood malignancies, including cancer types notoriously difficult to maintain ex vivo. Fibrosis of the organoid occurred following TGFβ stimulation and engraftment with myelofibrosis but not healthy donor–derived cells, validating this platform as a powerful tool for studies of malignant cells and their interactions within a human bone marrow–like milieu. This enabling technology is likely to accelerate the discovery and prioritization of novel targets for bone marrow disorders and blood cancers. SIGNIFICANCE: We present a human bone marrow organoid that supports the growth of primary cells from patients with myeloid and lymphoid blood cancers. This model allows for mechanistic studies of blood cancers in the context of their microenvironment and provides a much-needed ex vivo tool for the prioritization of new therapeutics.

    Accelerations for global optimization methods that use second derivative information

    Two new improvements for the algorithm of Breiman & Cutler are presented. Better envelopes can be built up using positive definite quadratic forms. Better utilization of first and second derivative information is attained by combining both global aspects of curvature and local aspects near the global optimum. The basis of the results is the geometric viewpoint developed by the first author and can be applied to a number of covering type methods. Improvements in convergence rates are demonstrated empirically on standard test functions.

    Random Forests - a Statistical Tool for the Sciences

    Non UBC; Unreviewed; Author affiliation: Utah State University; Facult

    DNA Macroarray Profiling of Lactococcus lactis subsp. lactis IL1403 Gene Expression during Environmental Stresses

    This report describes the use of an oligonucleotide macroarray to profile the expression of 375 genes in Lactococcus lactis subsp. lactis IL1403 during heat, acid, and osmotic stress. A set of known stress-associated genes in IL1403 was used as the internal control on the array. Every stress response was accurately detected using the macroarray, compared to data from previous reports. As a group, the expression patterns of the investigated metabolic genes were significantly altered by heat, acid, and osmotic stresses. Specifically, 13 to 18% of the investigated genes were differentially expressed in each of the environmental stress treatments. Interestingly, the methionine biosynthesis pathway genes (metA-metB1 and metB2-cysK) were induced during heat shock, but methionine utilization genes, such as metK, were induced during acid stress. These data provide a possible explanation for the differences between acid tolerance mechanisms of L. lactis strains IL1403 and MG1363 reported previously. Several groups of transcriptional responses were common among the stress treatments, such as repression of peptide transporter genes, including the opt operon (also known as dpp) and dtpT. Reduction of peptide transport due to environmental stress will have important implications in the cheese ripening process. Although stress responses in lactococci were extensively studied during the last decade, additional information about this bacterium was gained from the use of this metabolic array.

    Undetected Genotyping Errors Cause Apparent Overtransmission of Common Alleles in the Transmission/Disequilibrium Test

    The transmission/disequilibrium test (TDT), a family-based test of linkage and association, is a popular and intuitive statistical test for studies of complex inheritance, as it is nonparametric and robust to population stratification. We carried out a literature search and located 79 significant TDT-derived associations between a microsatellite marker allele and a disease. Among these, there were 31 (39%) in which the most common allele was found to exhibit distorted transmission to affected offspring, implying that the allele may be associated with either susceptibility to or protection from a disease. In 27 of these 31 studies (87%), the most common allele appeared to be overtransmitted to affected offspring (a risk factor), and, in the remaining 4 studies, the most common allele appeared to be undertransmitted (a protective factor). In a second literature search, we identified 92 case-control studies in which a microsatellite marker allele was found to have significantly different frequencies in case and control groups. Of these, there were 37 instances (40%) in which the most common allele was involved. In 12 of these 37 studies (32%), the most common allele was enriched in cases relative to controls (a risk factor), and, in the remaining 25 studies, the most common allele was enriched in controls (a protective factor). Thus, the most common allele appears to be a risk factor when identified through the TDT, and it appears to be protective when identified through case-control analysis. To understand this phenomenon, we incorporated an error model into the calculation of the TDT statistic. We show that undetected genotyping error can cause apparent transmission distortion at markers with alleles of unequal frequency. We demonstrate that this distortion is in the direction of overtransmission for common alleles. Therefore, we conclude that undetected genotyping errors may be contributing to an inflated false-positive rate among reported TDT-derived associations and that genotyping fidelity must be increased.
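
For readers unfamiliar with the statistic mentioned in this abstract, the following is a minimal sketch of the standard biallelic TDT in its McNemar-type form, (b - c)^2 / (b + c) with one degree of freedom, where b and c count how often heterozygous parents transmitted or did not transmit the allele of interest to affected offspring. It is not the authors' error model, which the abstract does not specify; the function name `tdt_statistic` and the example counts are hypothetical and chosen only for illustration.

```python
import math


def tdt_statistic(transmitted, untransmitted):
    """Return (chi_square, p_value) for the McNemar-type TDT with 1 df.

    transmitted / untransmitted: counts of heterozygous parents who did or
    did not pass the allele of interest to an affected child.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    chi_square = (transmitted - untransmitted) ** 2 / total
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical counts, for illustration only (not data from the study).
chi_square, p_value = tdt_statistic(transmitted=60, untransmitted=40)
print(f"chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

Because the statistic depends only on the imbalance between the two counts, any process that systematically shifts observations from one count to the other will inflate it, which is the kind of distortion the abstract attributes to undetected genotyping error.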