
    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to perform integrative analysis of biomedical data acquired from diverse modalities effectively and efficiently. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability.

    Metabolic characteristics and genomic epidemiology of Escherichia coli serogroup O145: a thesis presented in partial fulfilment of the requirements for the degree of Master of Science in Microbiology at Massey University, Palmerston North, New Zealand

    Shiga toxin-producing Escherichia coli (STEC) are a global public health concern and can cause severe human disease. Ruminants are asymptomatic reservoirs of STEC, shedding this pathogen via their faeces. There is ‘zero tolerance’ for the Top 7 STEC serogroups (O26, O45, O103, O111, O121, O145 and O157) in ground beef products exported to the USA. STEC may contaminate carcasses during processing and are therefore a major regulatory concern for New Zealand’s meat industry. A previous study investigating the prevalence of STEC in young calves (n=1508) throughout New Zealand identified STEC O145 as the most prevalent serogroup (43%) at the dairy farm level compared to the other Top 7 serogroups. This high prevalence highlights STEC O145 as a public health concern and an issue for the meat industry. Current culture-based methods for STEC detection are not fully discriminatory because there are no consistent differential characteristics between STEC and non-pathogenic E. coli. This study aims to (i) investigate metabolic characteristics of E. coli O145 to facilitate the differential culture of this serogroup and (ii) understand the genomic epidemiology of E. coli O145 using whole genome sequencing (WGS). The E. coli O145 strains examined in this study were genetically diverse and, based on carbon utilisation, metabolically diverse. The metabolic and genomic analyses were unable to differentiate between stx-positive and stx-negative O145 strains, and there was no association with isolation source. However, O145 strains clustered according to multi-locus sequence type and at the level of eae subtype, eae being the gene encoding intimin, a protein involved in bacterial attachment to intestinal epithelial cells. Carbon substrates such as D-serine and D-malic acid were identified as candidate metabolites to differentiate defined O145 sequence types and may assist with identification in conjunction with currently available molecular methods.
    This research has demonstrated the genetic heterogeneity of serogroup O145 and has made significant progress in identifying metabolites that may prove beneficial in developing a differential medium for certain subsets of serogroup O145. Such a medium would be a valuable tool for maintaining and monitoring public health and for providing food quality and safety assurances that New Zealand meat for export is free of this pathogen.

    Detection of regulator genes and eQTLs in gene networks

    Genetic differences between individuals that are associated with quantitative phenotypic traits, including disease states, are usually found in non-coding genomic regions. These genetic variants are often also associated with differences in the expression levels of nearby genes (they are "expression quantitative trait loci", or eQTLs for short) and presumably play a gene regulatory role, affecting the status of molecular networks of interacting genes, proteins and metabolites. Computational systems biology approaches to reconstruct causal gene networks from large-scale omics data have therefore become essential to understand the structure of networks controlled by eQTLs together with other regulatory genes, and to generate detailed hypotheses about the molecular mechanisms that lead from genotype to phenotype. Here we review the main analytical methods and software used to identify eQTLs and their associated genes, to reconstruct co-expression networks and modules, to reconstruct causal Bayesian gene and module networks, and to validate predicted networks in silico. Comment: minor revision with typos corrected; review article; 24 pages, 2 figures