7 research outputs found

    Predicting pathway membership via domain signatures

    Motivation: Functional characterization of genes is of great importance for understanding complex cellular processes. Valuable information for this purpose can be obtained from pathway databases such as KEGG. So far, however, only a small fraction of genes has been annotated with pathway information. In contrast, information on the protein domains a gene contains is available for a significantly larger number of genes, e.g. from the InterPro database.

    Multi-label multi-instance transfer learning for simultaneous reconstruction and cross-talk modeling of multiple human signaling pathways

    Text file contains the predicted cross-talk signaling components between human signaling pathways (homolog instance). (ZIP 36 KB)

    Inferring functional modules of protein families with probabilistic topic models

    Background: Genome and metagenome studies have identified thousands of protein families whose functions are poorly understood and for which techniques for functional characterization provide only partial information. For such proteins, the genome context can give further information about their functional context. Results: We describe a Bayesian method, based on a probabilistic topic model, which directly identifies functional modules of protein families. The method explores the co-occurrence patterns of protein families across a collection of sequence samples to infer a probabilistic model of arbitrarily-sized functional modules. Conclusions: We show that our method identifies protein modules - some of which correspond to well-known biological processes - that are tightly interconnected with known functional interactions and are different from the interactions identified by pairwise co-occurrence. The modules are not specific to any given organism and may combine different realizations of a protein complex or pathway within different taxa.

    Development of new knowledge discovery tools to explore biomedical datasets in breast cancer

    The explorative power of high throughput technologies in cancer research has become well established in recent years, exemplified by diverse gene microarray studies. However, development of the necessary biomedical data analysis tools has historically been confined to a commercial environment, while comprehensive, user-friendly analysis approaches are still needed. The availability of free software, notably the 'R' project statistical programming language, allowed development of a user-friendly multivariate statistics application - Informatics Tenovus (I-10) - in this project. I-10 provides a platform through which powerful existing and future 'R' project statistical analysis methodologies can be applied without prior programming knowledge. The new system was tested in the context of exploring antihormone resistance in breast cancer, analysing microarray datasets from in vitro models of acquired Tamoxifen resistance (TAMR) or Faslodex resistance (FASR) versus endocrine responsive MCF-7 cells. The analysis revealed not only known de-regulated genes but also further potential markers/targets for endocrine response/resistance. The advantages of the 'R' programming environment, together with Microsoft Visual Basic .NET technology for producing user-friendly biomedical analysis tools, facilitated subsequent development of a tool which could explore SEER cancer patient datasets. This new cancer query survival tool - Superstes - allows detailed statistical modelling of the impact that multiple patient attributes (in this instance derived from the SEER breast and colorectal cancer datasets) have on patient survival. The versatility of 'R' was additionally demonstrated in further exploring classifiers, where it was able to interface with the sophisticated, freely available machine learning application 'Weka'.
    Using 'R' and Weka, breast cancer patient survival was modelled using patient attributes equivalent to those of the Nottingham Prognostic Index and a 10 year survival subset of the SEER breast cancer dataset. Several machine learning methodologies were compared for their ability to accurately model survival, and their value in routine clinical use for prediction of patient survival was then critically evaluated. (EThOS - Electronic Theses Online Service, United Kingdom)
