    Novel Random Forest Methods and Algorithms for Autism Spectrum Disorders Research

    Random Forest (RF) is a flexible, easy-to-use machine learning algorithm proposed by Leo Breiman in 2001 for building a predictor ensemble from a set of decision trees grown in randomly selected subspaces of the data. Its superior prediction accuracy has made it one of the most used algorithms in the machine learning field. In this dissertation, we use the random forest as the main building block for creating a proximity matrix for multivariate matching and for diagnostic classification problems, with autism research as an exemplary application. In observational studies, matching is used to optimize the balance between treatment groups. Although many matching algorithms can achieve this goal, matching faces its own challenges in some fields, and datasets with small sample sizes and limited control reservoirs are particularly affected. This problem applies to many ongoing research fields, such as autism spectrum disorder (ASD) research. We are interested in eliminating the effect of undesirable variables using two types of algorithms: 1:k nearest matching and full matching. We therefore first introduce three types of 1:k nearest matching algorithms and two full-matching-based methods, and compare group-wise with pairwise matching in terms of the balance and sample size they achieve. The proposed methods were applied to a dataset from the Brain Development Imaging Lab (BDIL) at San Diego State University. Next, we introduce the iterMatch R package, which iteratively finds a 1:1 matched subsample of the data that is balanced on all matching variables while accommodating missing values. Ordinarily, missing values in a dataset must be imputed, or only complete cases can be considered in matching. Losing data because of limitations in a matching algorithm can decrease the power of the study and omit important information.
    Beyond introducing the iterMatch package, we discuss tuning its input parameters using medium and large datasets from the Autism Brain Imaging Data Exchange (ABIDE). We then propose two mixed-effects random forest-based classification algorithms applicable to multi-site (clustered) data, using resting-state fMRI (rs-fMRI) and structural MRI (sMRI). While building the prediction model, these algorithms internally control for the random effect of site, a confounding factor, and the fixed effect of age, a phenotypic variable. In addition to controlling for confounding variables, these algorithms remove the need for a separate dimension reduction algorithm for high-dimensional data such as functional connectivity, and do so in a non-linear fashion. We show that the proposed algorithms can achieve prediction accuracy over 80 percent on test data.
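
The idea of 1:1 nearest matching on a proximity matrix can be illustrated with a minimal Python sketch. It is not the iterMatch algorithm or any of the dissertation's methods: the greedy strategy, the function names, the toy proximity values, and the ages are all assumptions made for illustration. The proximity matrix is taken as given (for a random forest it is commonly the fraction of trees in which two subjects fall in the same leaf), and balance is checked with a standardized mean difference.

```python
from statistics import mean, pstdev


def greedy_match(proximity, treated, controls):
    """Greedy 1:1 nearest matching without replacement (illustrative only).

    proximity[i][j] is a similarity in [0, 1]; higher means more alike.
    """
    available = set(controls)
    pairs = []
    for t in treated:
        if not available:
            break
        # Pick the most similar remaining control; ties go to the lower index.
        best = max(available, key=lambda c: (proximity[t][c], -c))
        pairs.append((t, best))
        available.remove(best)
    return pairs


def standardized_mean_difference(x_treated, x_control):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = ((pstdev(x_treated) ** 2 + pstdev(x_control) ** 2) / 2) ** 0.5
    if pooled_sd == 0:
        return 0.0
    return (mean(x_treated) - mean(x_control)) / pooled_sd


# Hypothetical toy data: subjects 0-1 are "treated", 2-4 are controls.
proximity = [
    [1.0, 0.2, 0.9, 0.1, 0.3],
    [0.2, 1.0, 0.4, 0.8, 0.1],
    [0.9, 0.4, 1.0, 0.2, 0.3],
    [0.1, 0.8, 0.2, 1.0, 0.5],
    [0.3, 0.1, 0.3, 0.5, 1.0],
]
age = [10, 14, 11, 15, 30]
treated, controls = [0, 1], [2, 3, 4]

pairs = greedy_match(proximity, treated, controls)
matched_controls = [c for _, c in pairs]

smd_before = standardized_mean_difference(
    [age[i] for i in treated], [age[i] for i in controls])
smd_after = standardized_mean_difference(
    [age[i] for i in treated], [age[i] for i in matched_controls])
```

In this toy case, matching drops the one control whose age is far from the treated group, so the absolute standardized mean difference for age shrinks after matching.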

    Spatial Context of Tumor Immune Microenvironment of Matched Primary and Recurrent Glioblastomas


    SUPPORT VECTORS MACHINE: A TUTORIAL WITH R


    Gibbs process distinguishes survival and reveals contact-inhibition genes in Glioblastoma multiforme.

    Tumor growth is a spatiotemporal birth-and-death process in which loss of heterotypic contact-inhibition of locomotion (CIL) of tumor cells promotes invasion and metastasis. Representing tumor cells as two-dimensional points, we can therefore expect the tumor tissue in histology slides to reflect realizations of a spatial birth-and-death process, which can be modeled mathematically to reveal molecular mechanisms of CIL, provided the model captures the inhibitory interactions. The Gibbs process, as an inhibitory point process, is a natural choice, since it is an equilibrium process of the spatial birth-and-death process. That is, if the tumor cells maintain homotypic contact inhibition, the spatial distribution of tumor cells will converge to a Gibbs hard-core process over long time scales. To verify whether this is the case, we applied the Gibbs process to 411 TCGA Glioblastoma multiforme patient images. Our imaging dataset included all cases for which diagnostic slide images were available. The model revealed two groups of patients, one of which, the "Gibbs group," showed convergence of the Gibbs process together with a significant survival difference. After further smoothing the discretized (and noisy) inhibition metric, for both increasing and randomized survival time, we found a significant association of the patients in the Gibbs group with increasing survival time. The mean inhibition metric also revealed the point at which homotypic CIL is established in tumor cells. In addition, RNAseq analysis comparing patients with loss of heterotypic CIL and patients with intact homotypic CIL in the Gibbs group unveiled cell movement gene signatures and differences in actin cytoskeleton and RhoA signaling pathways as key molecular alterations. These genes and pathways have established roles in CIL.
    Taken together, our integrated analysis of patient images and RNAseq data provides, for the first time, a mathematical basis for CIL in tumors, explains survival, and uncovers the underlying molecular landscape of this key tumor invasion and metastasis phenomenon.
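
The statement that a Gibbs hard-core process arises as the equilibrium of a spatial birth-and-death process can be illustrated with a small simulation. The sketch below is not the authors' model or fitting procedure; it is a standard birth-death Metropolis-Hastings sampler on the unit square, and the function names and the values of `beta`, `radius`, and `n_steps` are hypothetical choices for illustration.

```python
import math
import random


def simulate_hard_core(beta, radius, n_steps, seed=0):
    """Birth-death Metropolis-Hastings sampler for a hard-core process.

    Window: unit square (area 1). beta is the activity (intensity) parameter;
    no two points may lie closer than `radius`.
    """
    rng = random.Random(seed)
    points = []
    for _ in range(n_steps):
        if rng.random() < 0.5:
            # Birth proposal: a uniformly placed new point.
            u = (rng.random(), rng.random())
            # The hard-core constraint rejects any point that is too close.
            if all(math.dist(u, p) >= radius for p in points):
                if rng.random() < min(1.0, beta / (len(points) + 1)):
                    points.append(u)
        elif points:
            # Death proposal: remove a uniformly chosen existing point.
            if rng.random() < min(1.0, len(points) / beta):
                points.pop(rng.randrange(len(points)))
    return points


def min_pairwise_distance(points):
    """Smallest distance between any two points (inf if fewer than two)."""
    return min(
        (math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]),
        default=math.inf,
    )


# Hypothetical parameters, chosen only to make the inhibition visible.
points = simulate_hard_core(beta=200, radius=0.05, n_steps=5000, seed=1)
closest = min_pairwise_distance(points)
```

Every configuration the sampler visits respects the minimum spacing, so `closest` is never below `radius`. This is the kind of regular, inhibited pattern the abstract associates with intact contact inhibition.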

    Diagnostic classification of intrinsic functional connectivity highlights somatosensory, default mode, and visual regions in autism

    Despite consensus on the neurological nature of autism spectrum disorders (ASD), brain biomarkers remain unknown and diagnosis continues to be based on behavioral criteria. Growing evidence suggests that brain abnormalities in ASD occur at the level of interconnected networks; however, previous attempts using functional connectivity data for diagnostic classification have reached only moderate accuracy. We selected 252 low-motion resting-state functional MRI (rs-fMRI) scans from the Autism Brain Imaging Data Exchange (ABIDE) including typically developing (TD) and ASD participants (n = 126 each), matched for age, non-verbal IQ, and head motion. A matrix of functional connectivities between 220 functionally defined regions of interest was used for diagnostic classification, implementing several machine learning tools. While support vector machines in combination with particle swarm optimization and recursive feature elimination performed modestly (with accuracies below 70% on validation datasets), diagnostic classification reached a high accuracy of 91% with random forest (RF), a nonparametric ensemble learning method. Among the 100 most informative features (connectivities), for which this peak accuracy was achieved, participation of somatosensory, default mode, visual, and subcortical regions stood out. Whereas some of these findings were expected, given previous reports of default mode abnormalities and atypical visual functioning in ASD, the prominent role of somatosensory regions was remarkable. The finding of peak accuracy for 100 interregional functional connectivities further suggests that brain biomarkers of ASD may be regionally complex and distributed, rather than localized.
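
To give a sense of the feature space involved, the sketch below shows one common way to turn a region-by-region connectivity matrix into a feature vector for a classifier. The abstract does not say how the matrix was vectorized, so this is an assumption: the matrix is treated as symmetric and only the unique region pairs above the diagonal are kept. The function names and the 3x3 example are hypothetical.

```python
def n_connectivity_features(n_rois):
    """Number of unique region pairs in a symmetric n_rois x n_rois matrix."""
    return n_rois * (n_rois - 1) // 2


def upper_triangle_features(conn):
    """Flatten the strict upper triangle of a square matrix, row by row."""
    n = len(conn)
    return [conn[i][j] for i in range(n) for j in range(i + 1, n)]


# Hypothetical 3-region connectivity matrix (symmetric, unit diagonal).
conn = [
    [1.0, 0.5, 0.1],
    [0.5, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
features = upper_triangle_features(conn)

# Under the symmetry assumption, 220 regions give this many candidate features.
n_features_220 = n_connectivity_features(220)
```

Under this assumption, 220 regions yield 220 × 219 / 2 = 24,090 unique connectivities, so the 100 most informative features would be a small fraction of the candidates.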