Low diversity Cryptococcus neoformans variety grubii multilocus sequence types from Thailand are consistent with an ancestral African origin.
Published version
Data from: Phased whole-genome genetic risk in a family quartet using a major allele reference sequence
Whole-genome sequencing harbors unprecedented potential for characterization of individual and family genetic variation. Here, we develop a novel synthetic human reference sequence that is ethnically concordant and use it for the analysis of genomes from a nuclear family with history of familial thrombophilia. We demonstrate that the use of the major allele reference sequence results in improved genotype accuracy for disease-associated variant loci. We infer recombination sites to the lowest median resolution demonstrated to date (<1,000 base pairs). We use family inheritance state analysis to control sequencing error and inform family-wide haplotype phasing, allowing quantification of genome-wide compound heterozygosity. We develop a sequence-based methodology for Human Leukocyte Antigen typing that contributes to disease risk prediction. Finally, we advance methods for analysis of disease and pharmacogenomic risk across the coding and non-coding genome that incorporate phased variant data. We show these methods are capable of identifying multigenic risk for inherited thrombophilia and informing the appropriate pharmacological therapy. These ethnicity-specific, family-based approaches to interpretation of genetic variation are emblematic of the next generation of genetic risk assessment using whole-genome sequencing.
Phased whole-genome genetic risk in a family quartet using a major allele reference sequence.
Whole-genome sequencing harbors unprecedented potential for characterization of individual and family genetic variation. Here, we develop a novel synthetic human reference sequence that is ethnically concordant and use it for the analysis of genomes from a nuclear family with history of familial thrombophilia. We demonstrate that the use of the major allele reference sequence results in improved genotype accuracy for disease-associated variant loci. We infer recombination sites to the lowest median resolution demonstrated to date (<1,000 base pairs). We use family inheritance state analysis to control sequencing error and inform family-wide haplotype phasing, allowing quantification of genome-wide compound heterozygosity. We develop a sequence-based methodology for Human Leukocyte Antigen typing that contributes to disease risk prediction. Finally, we advance methods for analysis of disease and pharmacogenomic risk across the coding and non-coding genome that incorporate phased variant data. We show these methods are capable of identifying multigenic risk for inherited thrombophilia and informing the appropriate pharmacological therapy. These ethnicity-specific, family-based approaches to interpretation of genetic variation are emblematic of the next generation of genetic risk assessment using whole-genome sequencing.
Comparative performances of machine learning methods for classifying Crohn Disease patients using genome-wide genotyping data
Crohn Disease (CD) is a complex genetic disorder for which more than 140 genes have been identified using genome-wide association studies (GWAS). However, the genetic architecture of the trait remains largely unknown. The recent development of machine learning (ML) approaches prompted us to apply them to classify healthy and diseased people according to their genomic information. The Immunochip dataset containing 18,227 CD patients and 34,050 healthy controls enrolled and genotyped by the International Inflammatory Bowel Disease Genetics Consortium (IIBDGC) has been re-analyzed using a set of ML methods: penalized logistic regression (LR), gradient boosted trees (GBT) and artificial neural networks (NN). The main score used to compare the methods was the Area Under the ROC Curve (AUC) statistic. An assessment of the impact of quality control (QC), imputation and coding methods on LR results showed that QC methods and imputation of missing genotypes may artificially increase the scores. Conversely, neither the patient/control ratio, marker preselection nor coding strategies significantly affected the results. LR methods, including Lasso, Ridge and ElasticNet, provided similar results with a maximum AUC of 0.80. GBT methods like XGBoost, LightGBM and CatBoost, together with dense NN with one or more hidden layers, provided similar AUC values, suggesting limited epistatic effects in the genetic architecture of the trait. ML methods detected nearly all the genetic variants previously identified by GWAS among the best predictors, plus additional predictors with lower effects. The robustness and complementarity of the different methods were also studied. Compared to LR, non-linear models such as GBT or NN may provide robust complementary approaches to identify and classify genetic markers.
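As an illustration of the AUC statistic used above to compare classifiers, the following minimal sketch computes it as the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study; the abstract does not say how the authors computed AUC.

```python
def roc_auc(labels, scores):
    """Return the AUC for binary labels (1 = case, 0 = control).

    Computed as the fraction of case/control pairs in which the case
    has the higher score; tied pairs contribute 0.5.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy data (hypothetical): two controls, two cases.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The pairwise loop is quadratic in sample size and is meant only to make the definition explicit; a rank-based formulation would be preferable for a dataset on the scale described in the abstract.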
Biocrust-forming mosses mitigate the negative impacts of increasing aridity on ecosystem multifunctionality in drylands
The increase in aridity predicted with climate change will have a negative impact on the multiple functions and services (multifunctionality) provided by dryland ecosystems worldwide. In these ecosystems, soil communities dominated by mosses, lichens and cyanobacteria (biocrusts) play a key role in supporting multifunctionality. However, whether biocrusts can buffer the negative impacts of aridity on important biogeochemical processes controlling carbon (C), nitrogen (N), and phosphorus (P) pools and fluxes remains largely unknown. Here, we conducted an empirical study, using samples from three continents (North America, Europe and Australia), to evaluate how the increase in aridity predicted by climate change will alter the capacity of biocrust-forming mosses to modulate multiple ecosystem processes related to C, N and P cycles. Compared with soil surfaces lacking biocrusts, biocrust-forming mosses enhanced multiple functions related to C, N and P cycling and storage in semiarid and arid, but not in humid and dry-subhumid, environments. Most importantly, we found that the relative positive effects of biocrust-forming mosses on multifunctionality compared with bare soil increased with increasing aridity. These results were mediated by plant cover and the positive effects exerted by biocrust-forming mosses on the abundance of soil bacteria and fungi. Our findings provide strong evidence that the maintenance of biocrusts is crucial to buffer negative effects of climate change on multifunctionality in global drylands.