Designing Data-Driven Learning Algorithms: A Necessity to Ensure Effective Post-Genomic Medicine and Biomedical Research
Advances in sequencing technology have significantly contributed to shaping the area of genetics and enabled the identification of genetic variants associated with complex traits through genome-wide association studies. This has provided insights into genetic medicine, in which genetic factors influence variability in disease and treatment outcomes. On the other hand, the missing or hidden heritability has suggested that the host's quality of life and other environmental factors may also influence differences in disease risk and drug/treatment responses in genomic medicine, and orient biomedical research, even though this may be highly constrained by genetic capabilities. It is expected that combining these different factors can yield a paradigm shift in personalized medicine and lead to more effective medical treatment. With existing “big data” initiatives and high-performance computing infrastructures, there is a need for data-driven learning algorithms and models that enable the selection and prioritization of relevant genetic variants (post-genomic medicine) and trigger effective translation into clinical practice. In this chapter, we survey and discuss existing machine learning algorithms and post-genomic analysis models supporting the process of identifying valuable markers
Informing disease modelling with brain-relevant functional genomic annotations
The past decade has seen a surge in the number of disease/trait-associated variants, largely because of the union of studies to share genetic data and the availability of electronic health records from large cohorts for research use. Variant discovery for neurological and neuropsychiatric genome-wide association studies, including schizophrenia, Parkinson's disease and Alzheimer's disease, has greatly benefitted; however, the translation of these genetic association results to interpretable biological mechanisms and models is lagging. Interpreting disease-associated variants requires knowledge of gene regulatory mechanisms and computational tools that permit integration of this knowledge with genome-wide association study results. Here, we summarize key conceptual advances in the generation of brain-relevant functional genomic annotations and in tools that allow integration of these annotations with association summary statistics, which together provide a new and exciting opportunity to identify disease-relevant genes, pathways and cell types in silico. We discuss the opportunities and challenges associated with these developments and conclude with our perspective on future advances in annotation generation, tool development and the union of the two
Local ancestry inference provides insight into Tilapia breeding programmes
Tilapia is one of the most commercially valuable species in aquaculture with over 5 million tonnes of Nile tilapia, Oreochromis niloticus, produced worldwide every year. It has become increasingly important to keep track of the inheritance of the selected traits under continuous improvement (e.g. growth rate, size at maturity or genetic gender), as selective breeding has also resulted in genes that can hitchhike as part of the process. The goal of this study was to generate a Local Ancestry Inference workflow that harnessed existing tilapia genotyping-by-sequencing studies, such as Double Digest RAD-seq derived Single-Nucleotide Polymorphism markers. We developed a workflow and implemented a suite of tools to resolve the local ancestry of each chromosomal locus based on reference panels of tilapia species of known origin. We used tilapia species, wild populations and breeding programmes to validate our methods. The precision of the pipeline was evaluated on the basis of its ability to identify the genetic makeup of samples of known ancestry. The easy and inexpensive application of local ancestry inference in breeding programmes will facilitate the monitoring of the genetic profile of individuals of interest, the tracking of the movement of genes from parents to offspring and the detection of hybrids and their origin
STATISTICAL METHODS FOR INFERRING GENETIC REGULATION ACROSS HETEROGENEOUS SAMPLES AND MULTIMODAL DATA
As clinical datasets have increased in size and a wider range of molecular profiles can be credibly measured, understanding sources of heterogeneity has become critical in studying complex phenotypes. Here, we investigate and develop statistical approaches to address and analyze technical variation, genetic diversity, and tissue heterogeneity in large biological datasets. Commercially available methods for normalization of NanoString nCounter RNA expression data are suboptimal in fully addressing unwanted technical variation. First, we develop a more comprehensive quality control, normalization, and validation framework for nCounter data, benchmark it against existing normalization methods for nCounter, and show its advantages on four datasets of differing sample sizes. We then develop race-specific and genetic ancestry-adjusted tumor transcriptomic prediction models from germline genetics in the Carolina Breast Cancer Study (CBCS) and study the performance of these models across ancestral groups and molecular subtypes. These models are employed in a transcriptome-wide association study (TWAS) to identify four novel genetic loci associated with breast cancer-specific survival. Next, we extend TWAS to a novel suite of tools, MOSTWAS, to prioritize distal genetic variation in transcriptomic predictive models with two multi-omic approaches that draw from mediation analysis. We empirically show the utility of these extensions in simulation analyses, TCGA breast cancer data, and ROS/MAP brain tissue data. We develop a novel distal-SNPs added-last test, to be used with MOSTWAS models, to prioritize distal loci that give added information beyond the association in the local locus around a gene. Lastly, we develop DeCompress, a deconvolution method for gene expression from targeted RNA panels such as NanoString, which have a much smaller feature space than traditional RNA expression assays.
We propose an ensemble approach that leverages compressed sensing to expand the feature space and validate it on data from the CBCS. We conduct extensive benchmarking of existing deconvolution methods using simulated in-silico experiments, pseudo-targeted panels from published mixing experiments, and data from the CBCS to show the advantage of DeCompress over reference-free methods. We lastly show the utility of in-silico cell-type proportion estimation in outcome prediction and eQTL mapping. Doctor of Philosophy
STATISTICAL METHODS IMPROVING THE CLINICAL UTILITY OF OMICS DATA
Variants identified via genome-wide association studies (GWAS) have ushered in an era of deep interest in omics data. Early adopters have used GWAS discoveries to inform drug targets and establish causal relationships using genetic instruments, yet more research must be done to bring the initial boons of GWAS to clinical practice. My dissertation presents three novel statistical methods which could bridge this gap by correcting biases when analyzing omics data and addressing methodological disparities affecting non-European populations. In my first project, I present THUNDER, a novel deconvolution method tailored to the unique challenges of chromatin conformation capture. Prior to our research, differential analysis of chromatin organization was confounded by underlying cell type proportions. Therefore, analyzing across individuals for differential chromatin activity has been of limited utility. THUNDER accurately estimates cell type proportions, allowing for their inclusion as a confounder in future association studies of Hi-C phenotypes. In my second project, I present GAUDI, a fused lasso approach to estimate polygenic risk scores (PRS) in admixed individuals. Our method addresses the decreases in performance of PRS methods in non-European populations, in part due to previously unaccounted for patterns of genetic admixture. Finally, in my third project, I extend polygenic risk score estimation techniques to the variable copy number setting to identify carriers for Spinal Muscular Atrophy (SMA), for which no standard test to identify these carriers exists. Doctor of Philosophy
The GTEx Consortium atlas of genetic regulatory effects across human tissues
The Genotype-Tissue Expression (GTEx) project dissects how genetic variation affects gene expression and splicing. Some human genetic variants affect the amount of RNA produced and the splicing of gene transcripts, crucial steps in development and maintaining a healthy individual. However, some of these changes only occur in a small number of tissues within the body. The Genotype-Tissue Expression (GTEx) project has been expanded over time, and, looking at the final data in version 8, Aguet et al. present a deep characterization of genetic associations and gene expression and splicing in 838 individuals over 49 tissues (see the Perspective by Wilson). This large study was able to characterize the details underlying many aspects of gene expression and provides a resource with which to better understand the fundamental molecular mechanisms of how genetic variants affect gene regulation and complex traits in humans. Science, this issue p. 1318; see also p. 1298. The Genotype-Tissue Expression (GTEx) project was established to characterize genetic effects on the transcriptome across human tissues and to link these regulatory mechanisms to trait and disease associations. Here, we present analyses of the version 8 data, examining 15,201 RNA-sequencing samples from 49 tissues of 838 postmortem donors. We comprehensively characterize genetic associations for gene expression and splicing in cis and trans, showing that regulatory associations are found for almost all genes, and describe the underlying molecular mechanisms and their contribution to allelic heterogeneity and pleiotropy of complex traits.
Leveraging the large diversity of tissues, we provide insights into the tissue specificity of genetic effects and show that cell type composition is a key factor in understanding gene regulatory mechanisms in human tissues. We thank the donors and their families for their generous gifts of organ donation for transplantation and tissue donations for the GTEx research project; the Genomics Platform at the Broad Institute for data generation; J. Struewing for support and leadership of the GTEx project; M. Khan and C. Stolte for the illustrations in Fig. 1; and R. Do, D. Jordan, and M. Verbanck for providing GWAS pleiotropy scores. Funding: This work was supported by the Common Fund of the Office of the Director, U.S. National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, NIA, NIAID, and NINDS through NIH contracts HHSN261200800001E (Leidos Prime contract with NCI: A.M.S., D.E.T., N.V.R., J.A.M., L.S., M.E.B., L.Q., T.K., D.B., K.R., and A.U.), 10XS170 (NDRI: W.F.L., J.A.T., G.K., A.M., S.S., R.H., G.Wa., M.J., M.Wa., L.E.B., C.J., J.W., B.R., M.Hu., K.M., L.A.S., H.M.G., M.Mo., and L.K.B.), 10XS171 (Roswell Park Cancer Institute: B.A.F., M.T.M., E.K., B.M.G., K.D.R., and J.B.), 10X172 (Science Care Inc.), 12ST1039 (IDOX), 10ST1035 (Van Andel Institute: S.D.J., D.C.R., and D.R.V.), HHSN268201000029C (Broad Institute: F.A., G.G., K.G.A., A.V.S., X.Li., E.T., S.G., A.G., S.A., K.H.H., D.T.N., K.H., S.R.M., and J.L.N.), 5U41HG009494 (F.A., G.G., and K.G.A.), and through NIH grants R01 DA006227-17 (University of Miami Brain Bank: D.C.M. and D.A.D.), Supplement to University of Miami grant DA006227 (D.C.M. and D.A.D.), R01 MH090941 (University of Geneva), R01 MH090951 and R01 MH090937 (University of Chicago), R01 MH090936 (University of North Carolina–Chapel Hill), R01MH101814 (M.M.-A., V.W., S.B.M., R.G., E.T.D., D.G.-M., and A.V.), U01HG007593 (S.B.M.), R01MH101822 (C.D.B.), U01HG007598 (M.O.
and B.E.S.), U01MH104393 (A.P.F.), extension H002371 to 5U41HG002371 (W.J.K.), as well as other funding sources: R01MH106842 (T.L., P.M., E.F., and P.J.H.), R01HL142028 (T.L., Si.Ka., and P.J.H.), R01GM122924 (T.L. and S.E.C.), R01MH107666 (H.K.I.), P30DK020595 (H.K.I.), UM1HG008901 (T.L.), R01GM124486 (T.L.), R01HG010067 (Y.Pa.), R01HG002585 (G.Wa. and M.St.), Gordon and Betty Moore Foundation GBMF 4559 (G.Wa. and M.St.), 1K99HG009916-01 (S.E.C.), R01HG006855 (Se.Ka. and R.E.H.), BIO2015-70777-P, Ministerio de Economia y Competitividad and FEDER funds (M.M.-A., V.W., R.G., and D.G.-M.), la Caixa Foundation ID 100010434 under agreement LCF/BQ/SO15/52260001 (D.G.-M.), NIH CTSA grant UL1TR002550-01 (P.M.), Marie-Skłodowska Curie fellowship H2020 Grant 706636 (S.K.-H.), R35HG010718 (E.R.G.), FPU15/03635, Ministerio de Educación, Cultura y Deporte (M.M.-A.), R01MH109905, 1R01HG010480 (A.Ba.), Searle Scholar Program (A.Ba.), R01HG008150 (S.B.M.), 5T32HG000044-22, NHGRI Institutional Training Grant in Genome Science (N.R.G.), EU IMI program (UE7-DIRECT-115317-1) (E.T.D. and A.V.), FNS funded project RNA1 (31003A_149984) (E.T.D. and A.V.), DK110919 (F.H.), F32HG009987 (F.H.), Massachusetts Lions Eye Research Fund Grant (A.R.H.), Wellcome grant WT108749/Z/15/Z (P.F.), and European Molecular Biology Laboratory (P.F. and D.Z.). Peer Reviewed. "Article signed by 1 BSC author who is a member of THE GTEX CONSORTIUM: Marta Mele Messeguer". Postprint (author's final draft)