11 research outputs found

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.

    Discovering Cancer Subtypes via an Accurate Fusion Strategy on Multiple Profile Data

    Discovering cancer subtypes is useful for guiding clinical treatment of multiple cancers. Advances in tissue profiling technologies have accumulated diverse types of data. Based on these types of expression data, various computational methods have been proposed to predict cancer subtypes. It is crucial to study how to better integrate these multiple profiles of data. In this paper, we collect multiple profiles of data for five cancers from The Cancer Genome Atlas (TCGA). Then, we construct three similarity kernels for all patients of the same cancer from gene expression, miRNA expression and isoform expression data. We also propose a novel unsupervised multiple kernel fusion method, Similarity Kernel Fusion (SKF), to integrate the three similarity kernels into one combined kernel. Finally, we apply spectral clustering to the integrated kernel to predict cancer subtypes. In the experimental results, the P-values from the Cox regression model and survival curve analysis are used to evaluate the performance of predicted subtypes on three datasets. Our kernel fusion method, SKF, has outstanding performance compared with single kernel and other multiple kernel fusion strategies. This demonstrates that our method can identify more accurate subtypes on various kinds of cancers. Our cancer subtype prediction method can identify essential genes and biomarkers for disease diagnosis and prognosis, and we also discuss the possible side effects of therapies and treatment.
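
The pipeline described in this abstract (one similarity kernel per data type, fusion into a single kernel, spectral clustering on the result) can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are synthetic, the RBF kernel and its `gamma` default are assumed choices, the fusion step is a plain unweighted average standing in for SKF (whose details are not given in the abstract), and the clustering is a simple two-way spectral split rather than a full spectral clustering routine. All function names are hypothetical.

```python
import numpy as np

def rbf_kernel(X, gamma=None):
    """Patient-by-patient RBF similarity kernel from a samples x features matrix."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    if gamma is None:
        gamma = 1.0 / X.shape[1]  # assumed default, not taken from the paper
    return np.exp(-gamma * d2)

def fuse_kernels(kernels):
    """Placeholder fusion: unweighted mean of the kernels (NOT the SKF algorithm)."""
    return np.mean(kernels, axis=0)

def spectral_bipartition(K):
    """Split samples into two groups using the second eigenvector
    of the symmetric normalized graph Laplacian built from K."""
    d_inv_sqrt = 1.0 / np.sqrt(K.sum(axis=1))
    L = np.eye(len(K)) - d_inv_sqrt[:, None] * K * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues returned in ascending order
    return (vecs[:, 1] > 0).astype(int)

# Synthetic stand-ins for three expression profiles of the same 20 patients:
# patients 0-9 and 10-19 form two hypothetical subtypes.
rng = np.random.default_rng(0)
centers = np.repeat([0.0, 3.0], 10)[:, None]
views = [centers + 0.5 * rng.standard_normal((20, 5)) for _ in range(3)]

fused = fuse_kernels([rbf_kernel(X) for X in views])
labels = spectral_bipartition(fused)
print(labels)
```

On real data, the predicted groups would then be assessed with survival analysis, as the abstract describes; that step is omitted here.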

    Identifying Network Biomarkers for Each Breast Cancer Subtypes Along with Their Effective Single and Paired Repurposed Drugs Using Network-Based Machine Learning Techniques

    Breast cancer is a complex disease that can be classified into at least 10 different molecular subtypes. Appropriate diagnosis of specific subtypes is critical for ensuring the best possible patient treatment and response to therapy. Current computational methods for determining the subtypes are based on identifying differentially expressed genes (i.e., biomarkers) that can best discriminate the subtypes. Such approaches, however, are known to be unreliable since they yield different biomarker sets when applied to data sets from different studies. Gathering knowledge about the functional relationships among genes will identify “network biomarkers” that will enrich the criteria for biomarker selection. Cancer network biomarkers are subnetworks of functionally related genes that “work in concert” to perform functions associated with tumorigenesis. We propose a machine learning framework that can be used to identify network biomarkers and driver genes for each specific breast cancer subtype. Our results show that the resulting network biomarkers can separate one subtype from the others with very high accuracy. We also propose an integrated approach that can best capture knowledge (and complex relationships) contained within and between drugs, genes and disease data. A network-based machine learning approach is then applied to the extracted knowledge and relationships in order to identify single and paired approved or experimental drugs with potential therapeutic effects on different breast cancer subtypes.

    Formal Concept Analysis Applications in Bioinformatics

    Bioinformatics is an important field that seeks to solve biological problems with the help of computation. One specific field in bioinformatics is that of genomics, the study of genes and their functions. Genomics can provide valuable insight into how genes interact with their environment. One way to measure this interaction is through gene expression data, which indicates whether (and how much) a certain gene activates in a given situation. Analyzing this data can be critical for predicting diseases or other biological reactions. One method used for analysis is Formal Concept Analysis (FCA), a computing technique based on partial orders that allows the user to examine the structural properties of binary data based on which subsets of the data set depend on each other. This thesis surveys, in breadth and depth, the current literature related to the use of FCA for bioinformatics, with particular focus on gene expression data. This includes descriptions of current data management techniques specific to FCA, such as lattice reduction, discretization, and variations of FCA to account for different data types. Advantages and shortcomings of using FCA for genomic investigations, as well as the feasibility of using FCA for this application, are addressed. Finally, several areas for future doctoral research are proposed. Adviser: Jitender S. Deogu
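
To make the FCA idea in this abstract concrete, here is a minimal sketch that enumerates the formal concepts of a tiny binary context. The gene and condition names are hypothetical, the context is assumed to be already discretized (a gene either responds to a condition or does not), and the brute-force enumeration over attribute subsets is exponential, so it is suitable only for toy examples and is not one of the algorithms surveyed in the thesis.

```python
from itertools import combinations

# Hypothetical binary context: gene -> set of conditions in which it is expressed.
context = {
    "geneA": {"heat", "cold"},
    "geneB": {"heat"},
    "geneC": {"cold", "drought"},
}
attributes = set().union(*context.values())

def extent(attrs):
    """Objects (genes) that have every attribute in attrs."""
    return frozenset(g for g, a in context.items() if attrs <= a)

def intent(objs):
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    shared = set(attributes)
    for g in objs:
        shared &= context[g]
    return frozenset(shared)

def formal_concepts():
    """Enumerate all (extent, intent) pairs by closing every attribute subset."""
    concepts = set()
    for r in range(len(attributes) + 1):
        for subset in combinations(sorted(attributes), r):
            e = extent(set(subset))
            concepts.add((e, intent(e)))
    return concepts

concepts = formal_concepts()
# List concepts from the largest extent to the smallest.
for e, i in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(sorted(e), sorted(i))
```

Each printed pair is a maximal set of genes together with the maximal set of conditions they all share; ordering these pairs by extent inclusion gives the concept lattice, the partial order the abstract refers to.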

    Exploitation of underused Streptomyces through a combined metabolomics-genomics workflow to enhance natural product diversity.

    The genus Streptomyces is the source of approximately two-thirds of all clinically-used antibiotics. Despite being the source of so many specialised metabolites, genomic analysis indicates that most Streptomyces strains have the potential to produce around twenty-five bioactive metabolites, some of which may be the basis of novel therapies. This makes culture collections of Streptomyces spp. an easily accessible (but under-used) resource to mine for genomic and metabolomic variety. Therefore, the main aim of this project was to initiate exploitation of the culture collection at NCIMB Ltd., by expanding the available chemical space from under-utilised Streptomyces for the production of novel antibiotics. This primarily used a mixture of metabolomic and genomic methods. A high-throughput culture parameter screen was designed around multiple carbon sources, nitrogen sources and extraction sample times. This was tested on the model species S. coelicolor A3(2) to compare differences in the production of known specialised metabolites, using UPLC-MS to analyse crude extracts from growth on agar. Data was analysed using MZmine and putative metabolites were identified using freely-available MS/MS databases - primarily GNPS. This showed clear variation in production of nine identified metabolites - including deferoxamines, germicidins, undecylprodigiosin and coelichelin - as a result of different culture parameters. Therefore, the screen successfully expanded the available chemical space, so was applied to non-model Streptomyces strains. The screen was used to compare the total metabolomic variety produced by three Streptomyces, isolated from different environments, in order to select a strain for further investigation. Comparing metabolomic features using principal component analysis showed the Costa Rican soil isolate S. costaricanus to produce the most variety versus the other two Streptomyces strains. 
The metabolite family most responsible for principal component separation was identified as the actinomycins. Scale-up of both agar and broth culture was used for metabolite dereplication and bioassays against multidrug resistant Acinetobacter baumannii, which is one of the bacteria on the World Health Organisation's list of pathogens that most urgently require new therapies. Fractions were derived from broth culture supernatant and agar crude extract by flash chromatography, resulting in semi-purified fractions. The predominant metabolite families in fractions were actinomycins and deferoxamines, which were further split by polarity into separate fractions. This resulted in rapid purification of metabolites, with one fraction comprising 80% deferoxamine B by weight. Fractions were tested against A. baumannii using the 2,3-bis (2-methoxy-4-nitro-5-sulfophenyl)-5-[(phenylamino)carbonyl]-2H-tetrazolium hydroxide (XTT) assay, which showed partial inhibition of growth at 50 µg/ml. Examining the bioactive fractions showed potentially novel minor peaks that could be responsible for bioactivity. A high-quality full genome of S. costaricanus was obtained using a combination of MiSeq and MinION sequences. This was analysed with RAST and antiSMASH to determine the specialised metabolite potential of S. costaricanus. AntiSMASH detected thirty-three biosynthetic gene clusters (BGCs), above the mean for Streptomyces. Thus, the confirmed genomic potential also suggested a wider metabolite variety, as indicated by the metabolomic screen. Some of the thirty-three BGC products had been previously detected by UPLC-MS, like actinomycin D and deferoxamine B. Other BGCs had 0% homology to known BGCs, including a terpene BGC which only showed core gene homology to two other Streptomyces. One of these strains shared all of the BGCs with S. costaricanus, including their sequential order and closely approximated genomic locations. 
Comparison of marker genes with autoMLST gave preliminary evidence for the taxonomic reclassification of S. costaricanus as a strain of S. griseofuscus. Starting from a large collection of unexploited Streptomyces, this project catalogued the metabolomic and genomic diversity of a single strain and its bioactive potential. Together, the project stages formed a workflow for further exploitation of NCIMB Streptomyces and other microbes.