33 research outputs found

    Random forests on Hadoop for genome-wide association studies of multivariate neuroimaging phenotypes

    10.1186/1471-2105-14-S16-S6 (BMC Bioinformatics, 14, Suppl. 16)

    Reaching the End-Game for GWAS: Machine Learning Approaches for the Prioritization of Complex Disease Loci.

    Genome-wide association studies (GWAS) have revealed thousands of genetic loci that underpin the complex biology of many human traits. However, the strength of GWAS - the ability to detect genetic association by linkage disequilibrium (LD) - is also its limitation. Whilst ever-increasing study sizes and improved designs have augmented the power of GWAS to detect effects, differentiating causal variants or genes from other highly correlated genes associated by LD remains the real challenge. This has severely hindered the biological insights and clinical translation of GWAS findings. Although thousands of disease susceptibility loci have been reported, the causal genes at these loci remain elusive. Machine learning (ML) techniques offer an opportunity to dissect the heterogeneity of variant and gene signals in the post-GWAS analysis phase. ML models for GWAS prioritization vary greatly in their complexity, ranging from relatively simple logistic regression approaches to more complex ensemble models such as random forests and gradient boosting, as well as deep learning models such as neural networks. Paired with functional validation, these methods show important promise for clinical translation, providing a strong evidence-based approach to direct post-GWAS research. However, as ML approaches continue to evolve to meet the challenge of causal gene identification, a critical assessment of the underlying methodologies and their applicability to the GWAS prioritization problem is needed. This review investigates the landscape of ML applications in three parts: selected models, input features, and output model performance, with a focus on the prioritization of complex disease-associated loci. Overall, we explore the contributions ML has made towards reaching the GWAS end-game, with consequent wide-ranging translational impact.
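    The prioritization task the abstract describes can be pictured as a supervised ranking problem. The sketch below is illustrative only: the feature set, labels, and data are simulated stand-ins, not from any real study, and a random forest is just one of the model families the review covers.

    ```python
    # Hedged sketch of ML-based GWAS locus prioritization: train a random
    # forest on per-gene annotation features, then rank candidate genes by
    # predicted probability of being causal. Everything here is simulated.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)

    # Hypothetical annotation features for 200 candidate genes, e.g.
    # distance to the lead SNP, eQTL colocalization score, coding burden.
    X = rng.normal(size=(200, 3))
    # Hypothetical labels: 1 = gene with functional evidence of causality.
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

    model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

    # Prioritization = ranking candidates by predicted causal probability.
    scores = model.predict_proba(X)[:, 1]
    ranking = np.argsort(scores)[::-1]  # highest-scoring candidates first
    ```

    In practice the same pattern applies to the other model families named above (logistic regression, gradient boosting, neural networks); only the estimator changes.
    
    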

    Methodological challenges and analytic opportunities for modeling and interpreting Big Healthcare Data

    Abstract: Managing, processing and understanding big healthcare data is challenging, costly and demanding. Without a robust fundamental theory for representation, analysis and inference, a roadmap for uniform handling and analysis of such complex data remains elusive. In this article, we outline various big data challenges, opportunities, modeling methods and software techniques for blending complex healthcare data, advanced analytic tools, and distributed scientific computing. Using imaging, genetic and healthcare data, we provide examples of processing heterogeneous datasets using distributed cloud services, automated and semi-automated classification techniques, and open-science protocols. Despite substantial advances, new innovative technologies need to be developed that enhance, scale and optimize the management and processing of large, complex and heterogeneous data. Stakeholder investments in data acquisition, research and development, computational infrastructure and education will be critical to realize the huge potential of big data, to reap the expected information benefits and to build lasting knowledge assets. Multi-faceted proprietary, open-source, and community developments will be essential to enable broad, reliable, sustainable and efficient data-driven discovery and analytics. Big data will affect every sector of the economy, and its hallmark will be 'team science'.
    http://deepblue.lib.umich.edu/bitstream/2027.42/134522/1/13742_2016_Article_117.pd

    metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis

    Motivation: A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, the modest sample sizes of individual cohorts and the restricted availability of individual-level genotype-phenotype data across cohorts limit the conduct of multivariate tests. Results: We introduce metaCCA, a computational framework for summary statistics-based analysis of a single study or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows excellent agreement with the pooled individual-level analysis of the original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies.
    Peer reviewed
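    The key idea metaCCA builds on is that canonical correlations can be computed from covariance blocks alone, so individual-level records are never needed. The sketch below shows that core computation in plain NumPy under simulated data; it omits metaCCA's covariance shrinkage step and the reconstruction of the blocks from univariate summary statistics.

    ```python
    # Canonical correlations from covariance blocks only: the singular
    # values of the whitened cross-covariance Sxx^{-1/2} Sxy Syy^{-1/2}.
    import numpy as np

    def canonical_correlations(Sxx, Syy, Sxy):
        """Canonical correlations between two variable sets, given their
        within-set (Sxx, Syy) and cross-set (Sxy) covariance blocks."""
        def inv_sqrt(S):
            w, V = np.linalg.eigh(S)          # S is symmetric positive-definite
            return V @ np.diag(1.0 / np.sqrt(w)) @ V.T
        K = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
        return np.clip(np.linalg.svd(K, compute_uv=False), 0.0, 1.0)

    # Simulated stand-in: "genotypes" X and "phenotypes" Y that share one
    # underlying variable, so the top canonical correlation should be ~1.
    rng = np.random.default_rng(1)
    Z = rng.normal(size=(500, 5))
    X, Y = Z[:, :3], Z[:, 2:]                 # column 2 appears in both sets
    S = np.cov(np.hstack([X, Y]), rowvar=False)
    r = canonical_correlations(S[:3, :3], S[3:, 3:], S[:3, 3:])
    ```

    In the summary-statistics setting, the same function would be fed blocks estimated from published univariate results (e.g. SNPTEST output) rather than from pooled raw data.
    
    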

    Insight from a Containerized Kubernetes Workload Introspection

    Developments in virtual containers, especially in cloud infrastructure, have led to a diversification of the jobs containers are being used to support, particularly in the big data and machine learning spaces. This diversification has been powered by the adoption of orchestration systems that marshal fleets of containers to accomplish complex programming tasks. The additional components in the vertical technology stack, plus continued horizontal scaling, have led to questions regarding how to forensically analyze complicated technology stacks. This paper proposes a solution through the use of introspection. An exploratory case study was conducted on a bare-metal cloud that utilizes Kubernetes, the introspection tool Prometheus, and Apache Spark. The contribution of this research is two-fold. First, it provides empirical support that introspection tools can acquire forensically viable data from different levels of a technology stack. Second, it provides the groundwork for comparisons between different virtual container platforms.
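    Prometheus exposes the metrics it scrapes through an HTTP API, which is how introspection data of the kind the study describes would be collected. The sketch below parses a hand-written payload in Prometheus's documented instant-query response shape; the pod names and values are invented, and in a live setup the JSON would come from a GET to `/api/v1/query` on the Prometheus server.

    ```python
    # Hedged sketch: extracting per-pod metric samples from a Prometheus
    # instant-query response. The payload is a hand-written example in the
    # documented format, not output from a real cluster.
    import json

    payload = json.loads("""
    {
      "status": "success",
      "data": {
        "resultType": "vector",
        "result": [
          {"metric": {"pod": "spark-worker-0"}, "value": [1700000000, "0.42"]},
          {"metric": {"pod": "spark-worker-1"}, "value": [1700000000, "0.37"]}
        ]
      }
    }
    """)

    def samples_by_pod(resp):
        """Map pod name -> sampled value from an instant-query vector.
        Prometheus encodes sample values as strings, so convert to float."""
        assert resp["status"] == "success"
        return {r["metric"]["pod"]: float(r["value"][1])
                for r in resp["data"]["result"]}

    usage = samples_by_pod(payload)
    ```

    Persisting such snapshots over time is what makes the acquired data forensically useful: each sample carries a timestamp alongside its value.
    
    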

    Machine learning approaches to genome-wide association studies

    Genome-wide Association Studies (GWAS) are conducted to identify single nucleotide polymorphisms (variants) associated with a phenotype within a specific population. The variants associated with diseases have a complex molecular aetiology through which they produce the disease phenotype. The genotyping data generated from study subjects is high-dimensional: the dataset has a large number of features and a comparatively small sample size, which poses an analytical challenge. Statistical testing is the standard approach applied to identify the variants that influence the phenotype of interest, but the wide applicability of Machine Learning (ML) algorithms promises a better understanding of the effects of these variants. The aim of this work is to discuss the applications and future trends of ML algorithms in GWAS towards understanding the effects of population genetic variants. Algorithms such as classification, regression, ensemble, and neural network models have been applied to GWAS, and this work discusses them comprehensively, including their application areas. ML algorithms have been applied to the identification of significant single nucleotide polymorphisms (SNPs), disease risk assessment and prediction, and detection of epistatic (non-linear) interactions, and have been integrated with other omics data sets. This comprehensive review highlights these areas of application and sheds light on the promise of incorporating machine learning algorithms into the computational and statistical pipeline of genome-wide association studies. This will be beneficial for a better understanding of how variants interact with disease biology and how the same variants can influence the risk of developing a particular phenotype.
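    The "many features, few samples" difficulty the abstract raises is concrete: with thousands of SNPs and a few hundred subjects, an unpenalised model is hopelessly underdetermined. One standard ML response, sketched below on simulated genotypes, is a sparsity-inducing L1-penalised logistic regression that selects a small subset of SNPs; the data and effect sizes are invented for illustration.

    ```python
    # Hedged sketch of the p >> n GWAS setting: 2000 simulated SNPs,
    # 200 samples, with L1-penalised logistic regression selecting a
    # sparse set of candidate variants. Nothing here is from a real cohort.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n_samples, n_snps = 200, 2000          # far more features than samples

    # Genotypes coded 0/1/2 (minor-allele counts); by construction only
    # the first 5 SNPs influence the phenotype.
    X = rng.integers(0, 3, size=(n_samples, n_snps)).astype(float)
    logit = X[:, :5].sum(axis=1) - 5.0
    y = (rng.random(n_samples) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

    # L1 penalty drives most coefficients exactly to zero.
    model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    model.fit(X, y)

    selected = np.flatnonzero(model.coef_[0])  # SNPs kept by the model
    ```

    The selected SNPs would then be candidates for the downstream steps the review names, such as risk prediction or integration with other omics layers.
    
    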

    Cardiovascular Imaging and Intervention Through the Lens of Artificial Intelligence

    Artificial Intelligence (AI) is the simulation of human intelligence in machines so that they can perform various tasks and make decisions. Machine learning (ML), a branch of AI, can analyse information from data and discover novel patterns. AI and ML are rapidly gaining prominence in healthcare as data become increasingly complex. These algorithms can enhance the role of cardiovascular imaging by automating many tasks or calculations, finding new patterns or phenotypes in data, and providing alternative diagnoses. In interventional cardiology, AI can assist in intraprocedural guidance and intravascular imaging, and provide additional information to the operator. AI is slowly expanding its boundaries into interventional cardiology and could fundamentally alter the field. In this review, the authors discuss how AI can enhance the role of cardiovascular imaging, and of imaging in interventional cardiology.