Populations in statistical genetic modelling and inference
What is a population? This review considers how a population may be defined
in terms of understanding the structure of the underlying genetics of the
individuals involved. The main approach is to consider statistically
identifiable groups of randomly mating individuals, which is well defined in
theory for any type of (sexual) organism. We discuss generative models using
drift, admixture and spatial structure, and the ancestral recombination graph.
These are contrasted with statistical models for inference, principal component
analysis and other 'non-parametric' methods. The relationships between these
approaches are explored with both simulated and real-data examples. The
state-of-the-art practical software tools are discussed and contrasted. We
conclude that populations are a useful theoretical construct that can be well
defined in theory and often approximately exist in practice.
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.
Structural basis for DNMT3A-mediated de novo DNA methylation.
DNA methylation by de novo DNA methyltransferases 3A (DNMT3A) and 3B (DNMT3B) at cytosines is essential for genome regulation and development. Dysregulation of this process is implicated in various diseases, notably cancer. However, the mechanisms underlying DNMT3 substrate recognition and enzymatic specificity remain elusive. Here we report a 2.65-ångström crystal structure of the DNMT3A-DNMT3L-DNA complex in which two DNMT3A monomers simultaneously attack two cytosine-phosphate-guanine (CpG) dinucleotides, with the target sites separated by 14 base pairs within the same DNA duplex. The DNMT3A-DNA interaction involves a target recognition domain, a catalytic loop, and the DNMT3A homodimeric interface. Arg836 of the target recognition domain makes crucial contacts with CpG, ensuring DNMT3A enzymatic preference towards CpG sites in cells. Haematological cancer-associated somatic mutations of the substrate-binding residues decrease DNMT3A activity, induce CpG hypomethylation, and promote transformation of haematopoietic cells. Together, our study reveals the mechanistic basis for DNMT3A-mediated DNA methylation and establishes its aetiological link to human disease.