
    Deep neural network improves the estimation of polygenic risk scores for breast cancer

    Polygenic risk scores (PRS) estimate an individual's genetic risk for a complex disease based on many genetic variants across the whole genome. In this study, we compared a series of computational models for estimating breast cancer PRS. A deep neural network (DNN) was found to outperform alternative machine learning techniques and established statistical algorithms, including BLUP, BayesA and LDpred. In the test cohort with 50% prevalence, the Area Under the receiver operating characteristic Curve (AUC) was 67.4% for DNN, 64.2% for BLUP, 64.5% for BayesA, and 62.4% for LDpred. BLUP, BayesA, and LDpred all generated PRS that followed a normal distribution in the case population. However, the PRS generated by DNN in the case population followed a bimodal distribution composed of two normal distributions with distinctly different means. This suggests that DNN was able to separate the case population into a high-genetic-risk case sub-population, with an average PRS significantly higher than that of the control population, and a normal-genetic-risk case sub-population, with an average PRS similar to that of the control population. This allowed DNN to achieve 18.8% recall at 90% precision in the test cohort with 50% prevalence, which can be extrapolated to 65.4% recall at 20% precision in a general population with 12% prevalence. Interpretation of the DNN model identified salient variants that were assigned insignificant p-values by association studies but were important for DNN prediction. These variants may be associated with the phenotype through non-linear relationships.
    Comment: 28 pages, 7 figures, 2 tables
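The abstract does not say how the operating point in the balanced test cohort was extrapolated to a population with 12% prevalence. One standard approach, assumed here, holds the classifier's true positive rate (recall) and false positive rate fixed and recomputes precision for the new prevalence. The helper names below are hypothetical; this is a sketch of that arithmetic, not the authors' code.

```python
def fpr_from_operating_point(recall: float, precision: float, prevalence: float) -> float:
    """False positive rate implied by a (recall, precision) pair at a given prevalence.

    From precision = r*p / (r*p + f*(1-p)) it follows that
    f = r*p*(1 - precision) / (precision*(1 - p)).
    """
    return recall * prevalence * (1.0 - precision) / (precision * (1.0 - prevalence))


def precision_at_prevalence(recall: float, fpr: float, prevalence: float) -> float:
    """Precision of a classifier with fixed recall (TPR) and FPR at a new prevalence."""
    true_pos = recall * prevalence          # expected fraction of people that are true positives
    false_pos = fpr * (1.0 - prevalence)    # expected fraction that are false positives
    return true_pos / (true_pos + false_pos)


# Operating point reported for the 50%-prevalence test cohort.
fpr = fpr_from_operating_point(recall=0.188, precision=0.90, prevalence=0.50)
print(round(precision_at_prevalence(0.188, fpr, 0.12), 3))  # → 0.551
```

Under this assumption, the 18.8%-recall/90%-precision threshold would give roughly 55% precision at 12% prevalence, so the reported 65.4%-recall/20%-precision figure corresponds to a different, lower threshold on the same ROC curve; converting that point back to 50% prevalence gives about 65% precision.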

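Whether a set of case PRS values is better described by one normal distribution or by two with different means can be checked by fitting a two-component Gaussian mixture. The sketch below does this with expectation-maximization in plain Python on synthetic scores; the abstract does not state which fitting procedure the authors used, and the function names and simulated values are illustrative assumptions.

```python
import math
import random


def log_normal_pdf(x: float, mean: float, var: float) -> float:
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def fit_two_gaussians(xs, n_iter: int = 200):
    """Fit a two-component 1-D Gaussian mixture by expectation-maximization.

    Returns (weights, means, variances), with components sorted by mean.
    """
    xs = list(xs)
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    # Start the components at the lower and upper quartiles with the pooled variance.
    s = sorted(xs)
    weights, means, variances = [0.5, 0.5], [s[n // 4], s[(3 * n) // 4]], [var, var]
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to component 0.
        resp0 = []
        for x in xs:
            la = math.log(weights[0]) + log_normal_pdf(x, means[0], variances[0])
            lb = math.log(weights[1]) + log_normal_pdf(x, means[1], variances[1])
            # 1 / (1 + exp(lb - la)), with the exponent clamped so math.exp cannot overflow.
            resp0.append(1.0 / (1.0 + math.exp(min(lb - la, 700.0))))
        # M-step: re-estimate weight, mean and variance of each component.
        new = []
        for r in (resp0, [1.0 - r0 for r0 in resp0]):
            nk = sum(r)
            mk = sum(ri * x for ri, x in zip(r, xs)) / nk
            vk = sum(ri * (x - mk) ** 2 for ri, x in zip(r, xs)) / nk
            new.append((nk / n, mk, max(vk, 1e-9)))
        weights = [w for w, _, _ in new]
        means = [m for _, m, _ in new]
        variances = [v for _, _, v in new]
    order = sorted(range(2), key=lambda k: means[k])
    return ([weights[k] for k in order], [means[k] for k in order], [variances[k] for k in order])


# Synthetic illustration only (not data from the study): a larger
# "normal-risk" group of cases and a smaller "high-risk" group.
rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(300)] + [rng.gauss(3.0, 1.0) for _ in range(200)]
weights, means, variances = fit_two_gaussians(scores)
print(weights, means, variances)
```

A well-separated pair of fitted means with non-negligible weights is consistent with the bimodal pattern described above; a formal comparison against a single-Gaussian fit (for example by a likelihood-based criterion) would be the next step.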
    Phenotypic and molecular characterization of root system architecture in diverse soybean (Glycine max L. Merr.) accessions

    Root system architecture (RSA), the spatial arrangement and morphology of the root system, anchors the plant, enables water and nutrient acquisition, provides nutrient storage, and facilitates plant-microbe interactions such as nodulation in legumes like soybean (Glycine max L. Merr.). Root structure also correlates with environmental advantages such as nutrient acquisition, drought and flood tolerance, and lodging resistance. After centuries of indirect selection for RSA, there is now a focus on harnessing soybean RSA diversity for use in cultivar development programs. Researchers have generally taken one of three approaches to root phenotyping: controlled laboratory, moderately controlled greenhouse, and minimally controlled field methods. In this study, we developed a mobile, low-cost, high-resolution root phenotyping system composed of an imaging platform with computer vision and machine learning (ML)-based approaches, establishing a seamless end-to-end pipeline. This system provides a high-throughput, cost-effective, non-destructive methodology that delivers biologically relevant time-series data on root growth and development for phenomics, genomics, and plant breeding applications. We customized a previous version of the Automated Root Imaging Analysis root phenotyping software. New modifications to the workflow integrate time-series image capture with automated image processing that uses optical character recognition to identify barcodes, followed by segmentation using a convolutional neural network. The goal of this research was to study root trait genetic diversity in soybean using 292 accessions from the USDA core collection, primarily in maturity groups II and III, and a subset of the soybean nested association mapping (NAM) parents.
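Once the convolutional neural network has produced a binary root mask, simple architecture traits, and their change across a time series, can be read directly from the pixels. The sketch below assumes the mask already exists (the barcode-recognition and segmentation stages are not shown) and uses a small, hypothetical trait set in pixel units; it does not reproduce the trait definitions of the customized software.

```python
def root_traits(mask):
    """Compute a few simple root traits from a binary segmentation mask.

    mask: list of rows (row 0 = top of the image), each a list of 0/1 values
    where 1 marks a root pixel. All traits are in pixel units.
    """
    root_pixels = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not root_pixels:
        return {"area": 0, "depth": 0, "width": 0, "width_to_depth": 0.0}
    rows = [r for r, _ in root_pixels]
    cols = [c for _, c in root_pixels]
    depth = max(rows) - min(rows) + 1   # vertical extent of the root system
    width = max(cols) - min(cols) + 1   # horizontal extent of the root system
    return {
        "area": len(root_pixels),       # total number of root pixels
        "depth": depth,
        "width": width,
        "width_to_depth": width / depth,
    }


def growth_rate(masks, hours_between: float):
    """Change in root area per hour between consecutive images of one plant."""
    areas = [root_traits(m)["area"] for m in masks]
    return [(b - a) / hours_between for a, b in zip(areas, areas[1:])]


# Two tiny hypothetical masks of the same plant taken 24 hours apart.
day1 = [[0, 1, 0],
        [0, 1, 0],
        [0, 0, 0]]
day2 = [[0, 1, 0],
        [1, 1, 1],
        [0, 1, 0]]
print(root_traits(day1))               # → {'area': 2, 'depth': 2, 'width': 1, 'width_to_depth': 0.5}
print(growth_rate([day1, day2], 24.0))  # → [0.125]
```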
    Combining 35,448 SNPs with a semi-automated phenotyping platform, we studied these 292 accessions for RSA traits to decipher their genetic diversity and to explore informative root (iRoot) categories based on root shape categories from the current literature. Genotype- and phenotype-based hierarchical clusters were found within the diverse set, with significant correlations. Genotype-based clusters correlated with geographical origin, and genetic differentiation indicated that many genotypes of US origin lack genetic diversity for RSA traits. Results show that superior root performance and root shape also correlate with specific genomic clusters. The combined results of the genetic and phenotypic analyses provide opportunities for targeted breeding efforts to maximize beneficial genetic diversity for future genetic gains. Further objectives of this study were to identify the genetic control of RSA within the diverse soybean landscape and to determine whether genomic prediction (GP) could be a viable strategy for breeding for root architecture traits. The genome-wide association study (GWAS) detected 30 SNPs that co-located with previously identified QTL for root traits and identified a number of candidate genes for root development. The GP model can predict phenotypes from genomic data, allowing individuals with root traits of interest to be selected within the core collection without phenotypic data. Plant phenomics, coupled with molecular technologies and statistical approaches, can identify genotypes with favorable or unfavorable traits, allowing inexpensive selection prior to field-trial phenotyping. Employing these genomic and phenomic technologies will allow soybean breeders to vastly expand the scope of a breeding program.
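The abstract does not specify which GP model was used. As an assumed stand-in, the sketch below uses ridge regression on SNP dosages (the shrinkage model behind RR-BLUP): SNP effects are estimated from lines that have both genotypes and phenotypes, then applied to lines that have genotypes only. Function names and the synthetic data are hypothetical.

```python
import numpy as np


def fit_ridge_gp(X: np.ndarray, y: np.ndarray, lam: float):
    """Ridge-regression genomic prediction (RR-BLUP-style shrinkage of SNP effects).

    X: (n_lines, n_snps) genotype matrix, e.g. allele dosages coded 0/1/2.
    y: (n_lines,) phenotype values for the training lines.
    lam: shrinkage parameter; larger values shrink SNP effects more strongly.
    Returns (intercept, beta) with one effect per SNP.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    n = X.shape[0]
    # Dual form: beta = Xc' (Xc Xc' + lam I)^-1 (y - mean). This equals the usual
    # (Xc'Xc + lam I)^-1 Xc'(y - mean) but only solves an n x n linear system.
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(n), y - y_mean)
    beta = Xc.T @ alpha
    intercept = y_mean - x_mean @ beta
    return intercept, beta


def predict_gp(intercept: float, beta: np.ndarray, X_new: np.ndarray) -> np.ndarray:
    """Genomic estimated values for lines that have genotypes but no phenotypes."""
    return intercept + X_new @ beta


# Synthetic illustration only: 50 phenotyped lines, 200 SNPs, 5 unphenotyped lines.
rng = np.random.default_rng(1)
X_train = rng.integers(0, 3, size=(50, 200)).astype(float)
true_effects = rng.normal(0.0, 0.1, size=200)
y_train = X_train @ true_effects + rng.normal(0.0, 0.5, size=50)
c, beta = fit_ridge_gp(X_train, y_train, lam=10.0)
X_new = rng.integers(0, 3, size=(5, 200)).astype(float)
print(predict_gp(c, beta, X_new))
```

The dual form matters at the scale described above: with 292 accessions and 35,448 SNPs it solves a 292 x 292 system instead of a 35,448 x 35,448 one. In practice `lam` would be tuned, for example by cross-validation on the phenotyped lines.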