156 research outputs found

    GARNET – gene set analysis with exploration of annotation relations

    Abstract. Background: Gene set analysis is a powerful method of deducing biological meaning for an a priori defined set of genes. Numerous tools have been developed to test for statistical enrichment or depletion in specific pathways or Gene Ontology (GO) terms. The major difficulties in biological interpretation are integrating diverse types of annotation categories and exploring the relationships between annotation terms that carry similar information. Results: GARNET (Gene Annotation Relationship NEtwork Tools) is an integrative platform for gene set analysis with many novel features. It includes tools for retrieving genes from annotation databases, for statistical analysis and visualization of annotation relationships, and for managing gene sets. To provide access to the full spectrum of accumulated biological knowledge, we have integrated a variety of annotation data, including GO, domain, disease, drug, chromosomal location and custom-defined annotations. Diverse types of molecular networks (pathways, transcriptional and microRNA regulation, protein-protein interactions) are also included. The pairwise relationship between annotation gene sets was calculated using kappa statistics. GARNET consists of three modules (gene set manager, gene set analysis and gene set retrieval), which are tightly integrated to provide virtually automatic analysis of gene sets. A dedicated annotation network viewer has been developed to facilitate exploration of related annotations. Conclusions: GARNET is an integrative platform for diverse types of gene set analysis, in which complex relationships among gene annotations can be easily explored with an intuitive network visualization tool (http://garnet.isysbio.org/ or http://ercsb.ewha.ac.kr/garnet/).
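The GARNET abstract says that pairwise relationships between annotation gene sets were computed with kappa statistics, but it does not give the formula. The sketch below makes one assumption: that the statistic is Cohen's kappa computed over a 2x2 table of gene membership (genes in both sets, in one set only, or in neither) relative to a shared background universe. The function name and the example genes are hypothetical.

```python
def gene_set_kappa(set_a: set[str], set_b: set[str], universe: set[str]) -> float:
    """Cohen's kappa for the agreement of two gene sets' membership over a background universe."""
    n = len(universe)
    if n == 0:
        raise ValueError("universe must not be empty")
    a = set_a & universe  # restrict both sets to the shared background
    b = set_b & universe
    both = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    neither = n - both - only_a - only_b
    observed = (both + neither) / n
    # Chance agreement from the row and column totals of the 2x2 membership table.
    expected = ((both + only_a) * (both + only_b) + (only_b + neither) * (only_a + neither)) / (n * n)
    if expected == 1.0:
        return 1.0  # convention: both sets are empty or both equal the universe
    return (observed - expected) / (1.0 - expected)


universe = {f"g{i}" for i in range(1, 11)}
term_x = {"g1", "g2", "g3", "g4"}  # hypothetical annotation gene sets
term_y = {"g1", "g2", "g3", "g5"}
print(round(gene_set_kappa(term_x, term_y, universe), 4))  # 0.5833
```

Kappa corrects raw overlap for the agreement expected by chance from set sizes alone. Two large, unrelated annotation terms therefore do not score as closely related simply because they share many genes.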

    Current State-of-the-Art Bioinformatics Methods in Alzheimer's Disease Studies

    Alzheimer's disease is the most common form of dementia and occurs in older people worldwide. Research focuses on identifying its causes and finding treatments. The methods discussed are based on gene expression data: differentially expressed genes are identified and then used in subsequent analyses. This bachelor's thesis provides an overview of the state-of-the-art bioinformatic methods currently used in Alzheimer's disease research. Because of the wide variety of methods, the analysis describes each approach briefly and gives examples from a selection of articles. The first section contains the background information needed to understand the subsequent analysis section. It is divided into two parts, describing the main biological and bioinformatic concepts and methods. The second section analyses a selected subset of articles and presents a case study of a single chosen article. The analysis is organized according to the studies conducted and compares the methods they describe. The resulting overview of the articles can serve as a short introduction to the current state of research aimed at better understanding this neurodegenerative disease.

    Functional coherence and annotation agreement metrics for enzyme families

    PhD thesis, Informatics (Bioinformatics), Universidade de Lisboa, Faculdade de Ciências, 2015. A range of methodologies is used to create sequence annotations, from manual curation by specialized curators to various automatic procedures. The multitude of existing annotation methods produces heterogeneity in annotation coverage and specificity across the biological sequence space. When groups of similar sequences (such as protein families) are compared, this heterogeneity can complicate the interpretation of their actual functional similarity and overall functional coherence. A direct way to mitigate these issues is to extend annotations within the protein families under analysis. This thesis postulates that protein families can serve as knowledge bases for their own annotation extension, given a proper functional coherence analysis. A modular framework for functional coherence analysis and annotation extension in protein families was therefore proposed. The framework includes a functional coherence analysis module that relies on graph visualization, term enrichment and other statistics. In this work, the module was implemented and made publicly available as a web application, GRYFUN, at http://xldb.di.fc.ul.pt/gryfun/. In addition, four metrics were developed to assess distinct aspects of the coherence and completeness of annotation in protein families, for use in conjunction with existing metrics. Used by curators, the complete framework can therefore act as a semi-automatic approach to annotation that assists with protein annotation extension.
    Fundação para a Ciência e a Tecnologia (FCT), SFRH/BD/48035/200
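GRYFUN's coherence module relies on term enrichment, but the abstract does not name the statistical test. The sketch below assumes a one-sided hypergeometric test. It asks whether a GO term annotates more members of a protein family than would be expected from a background set of proteins. The function name and the counts are illustrative only.

```python
from math import comb


def hypergeometric_enrichment_p(population: int, annotated: int, sample: int, hits: int) -> float:
    """One-sided P(X >= hits) for X ~ Hypergeometric(population, annotated, sample).

    population: proteins in the background set
    annotated:  background proteins carrying the GO term
    sample:     proteins in the family under analysis
    hits:       family members carrying the GO term
    """
    if not (0 <= annotated <= population and 0 <= sample <= population):
        raise ValueError("annotated and sample must lie between 0 and population")
    if not 0 <= hits <= min(annotated, sample):
        raise ValueError("hits must lie between 0 and min(annotated, sample)")
    # Sum the probabilities of drawing `hits` or more annotated proteins by chance.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, min(annotated, sample) + 1)
    )
    return tail / comb(population, sample)


# A 20-member family in which 8 members carry a term that annotates 50 of 1,000 background proteins.
print(hypergeometric_enrichment_p(population=1000, annotated=50, sample=20, hits=8))
```

Exact integer arithmetic via `math.comb` avoids underflow in the binomial coefficients. For many terms per family, the resulting p-values would still need a multiple-testing correction.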

    Heterologous oligonucleotide microarrays for transcriptomics in a non-model species; a proof-of-concept study of drought stress in Musa

    Abstract. Background: 'Systems-wide' approaches such as microarray RNA profiling are ideally suited to studying the complex, overlapping responses of plants to biotic and abiotic stresses. However, commercial microarrays are available for only a limited number of plant species, and development costs are prohibitive for most research groups. Here we evaluate the use of cross-hybridisation to Affymetrix oligonucleotide GeneChip® microarrays to profile the response of the banana (Musa spp.) leaf transcriptome to drought stress. We use a genomic DNA (gDNA)-based probe-selection strategy to improve the efficiency of detecting differentially expressed Musa transcripts. Results: Following cross-hybridisation of Musa gDNA to the Rice GeneChip® Genome Array, ~33,700 gene-specific probe-sets had a sufficiently high degree of homology to be retained for transcriptomic analyses. In a proof-of-concept approach, pooled RNA representing a single biological replicate of control and drought-stressed leaves of the Musa cultivar 'Cachaco' was hybridised to the Affymetrix Rice Genome Array. A total of 2,910 Musa gene homologues with a >2-fold difference in expression level were subsequently identified. These drought-responsive transcripts included many functional classes associated with plant biotic and abiotic stress responses. They also included a range of regulatory genes known to coordinate abiotic stress responses, among them members of the ERF, DREB, MYB, bZIP and bHLH transcription factor families. Fifty-two of these drought-sensitive Musa transcripts were homologous to genes underlying QTLs for drought and cold tolerance in rice, including 2 instances of QTLs associated with a single underlying gene.
The list of drought-responsive transcripts also included genes identified in publicly available comparative transcriptomics experiments. Conclusion: Musa has little available nucleotide sequence data and is only distantly related to rice. Even so, our results demonstrate that gDNA probe-based cross-hybridisation to the Rice GeneChip® is a highly promising strategy for studying complex biological responses. They also illustrate the potential of such strategies for gene discovery in non-model species.
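The 2,910 drought-responsive homologues were selected by a >2-fold difference in expression between pooled control and drought RNA. The sketch below shows a plain ratio filter of that kind. It assumes per-probe-set expression values that are already summarised and normalised. The intensity floor, which guards against ratios of near-zero signals, and all names are hypothetical.

```python
import math


def fold_change_filter(
    control: dict[str, float],
    treated: dict[str, float],
    min_fold: float = 2.0,
    floor: float = 1.0,
) -> dict[str, float]:
    """Return log2(treated / control) for probe sets changing by more than min_fold in either direction.

    `floor` is a hypothetical lower bound on intensities so that near-zero
    signals do not produce huge or undefined ratios.
    """
    cutoff = math.log2(min_fold)
    selected = {}
    for probe_set in sorted(control.keys() & treated.keys()):
        ratio = max(treated[probe_set], floor) / max(control[probe_set], floor)
        log_fc = math.log2(ratio)
        if abs(log_fc) > cutoff:  # strictly greater than min_fold, i.e. ">2-fold"
            selected[probe_set] = log_fc
    return selected


control = {"ps1": 100.0, "ps2": 100.0, "ps3": 100.0, "ps4": 100.0}
drought = {"ps1": 400.0, "ps2": 150.0, "ps3": 25.0, "ps4": 200.0}
print(fold_change_filter(control, drought))  # {'ps1': 2.0, 'ps3': -2.0}
```

Working on the log2 scale makes up- and down-regulation symmetric: a 4-fold increase and a 4-fold decrease both have magnitude 2. A probe set at exactly 2-fold (ps4) is excluded by the strict inequality.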

    Integrative System Biology Studies on High Throughput Genomics and Proteomics Dataset

    Indiana University-Purdue University Indianapolis (IUPUI). The post-genomic era has brought us to the view that biological systems are complex networks of interacting genes, proteins and small molecules that give rise to biological form and function. The past decade has seen the advent of a number of new technologies designed to study biological systems on a genome-wide scale. These technologies offer insight into the activity of thousands of genes and proteins in a cell, and have thereby changed the conventional reductionist view of such systems. However, the deluge of data exceeds the analytical and critical capacity of researchers and therefore demands the development of new computational methods. The challenge no longer lies in acquiring expression profiles, but in interpreting the results to gain insight into biological mechanisms. In three case studies, we applied various systems biology techniques to publicly available and in-house genomics and proteomics datasets to identify sub-network signatures. In the first study, we integrated prior knowledge from gene signatures, GSEA and gene/protein network modeling to identify pathways involved in colorectal cancer. In the second, we identified plasma-based network signatures for Alzheimer's disease by combining various feature selection and classification approaches. In the final study, we performed an integrated miRNA-mRNA analysis to identify the role of Myeloid-Derived Suppressor Cells (MDSCs) in T-cell suppression.
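The first IUPUI case study draws on GSEA. The sketch below shows the core quantity that GSEA computes: the weighted running-sum enrichment score from the original GSEA method, with weight exponent p = 1 by default. It omits permutation-based significance testing and score normalisation. It is a generic illustration, not the thesis's own pipeline, and the gene names are hypothetical.

```python
def enrichment_score(ranked_genes: list[str], scores: list[float], gene_set: set[str], p: float = 1.0) -> float:
    """GSEA-style weighted running-sum enrichment score.

    ranked_genes must be ordered by decreasing score (e.g. correlation with a
    phenotype), and scores must be aligned with ranked_genes.
    """
    if len(ranked_genes) != len(scores):
        raise ValueError("ranked_genes and scores must have the same length")
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_misses = len(ranked_genes) - n_hits
    if n_hits == 0 or n_misses == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    # Hits step up in proportion to |score|**p; misses step down by a constant amount.
    hit_weights = [abs(s) ** p if h else 0.0 for s, h in zip(scores, hits)]
    total_weight = sum(hit_weights)
    if total_weight == 0.0:
        raise ValueError("all gene-set members have zero score")
    running, best = 0.0, 0.0
    for is_hit, weight in zip(hits, hit_weights):
        running += weight / total_weight if is_hit else -1.0 / n_misses
        if abs(running) > abs(best):
            best = running  # keep the signed maximum deviation from zero
    return best


genes = ["a", "b", "c", "d", "e", "f"]
scores = [3.0, 2.5, 1.0, -0.5, -1.5, -2.0]
print(enrichment_score(genes, scores, {"a", "b"}))  # close to +1: set concentrated at the top
print(enrichment_score(genes, scores, {"e", "f"}))  # close to -1: set concentrated at the bottom
```

The running sum returns to zero at the end of the list by construction. The score is the extreme value reached along the way, and its sign shows whether the set sits at the top or the bottom of the ranking.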

    2020 Student Symposium Research and Creative Activity Book of Abstracts

    The UMaine Student Symposium (UMSS) is an annual event that celebrates undergraduate and graduate student research and creative work. Students from a variety of disciplines present their achievements in video presentations. It is the ideal occasion for the community to see how UMaine students' work has an impact locally and beyond. The 2020 Student Symposium Research and Creative Activity Book of Abstracts includes a complete list of student presenters as well as abstracts of their work.

    Systems approaches to drug repositioning

    PhD thesis. Drug discovery has overall become less fruitful and more costly, despite vastly increased biomedical knowledge and evolving approaches to Research and Development (R&D). One complementary approach to drug discovery is drug repositioning, which focusses on identifying novel uses for existing drugs. By focussing on existing drugs that have already reached the market, drug repositioning has the potential to reduce both the time and the cost of getting a disease treatment to those who need it. Many marketed repositioned drugs were found through serendipitous or rational observations, which highlights the need for more systematic methodologies. Systems approaches could enable novel methods for understanding the action of therapeutic compounds, but they require an integrative approach to biological data. Integrated networks can facilitate systems-level analyses by combining multiple sources of evidence to provide a rich description of drugs, their targets and their interactions. Classically, such networks can be mined manually: a skilled person identifies portions of the graph that indicate relationships between drugs and highlights possible repositioning opportunities. However, this approach does not scale. Automated procedures are required to mine integrated networks systematically for these subgraphs and bring them to the user's attention. The aim of this project was to develop novel computational methods that use data integration to identify new therapeutic uses for existing drugs, with a particular focus on active small molecules. As part of this work, a framework for integrating disparate data relevant to drug repositioning, the Drug Repositioning Network Integration Framework (DReNInF), was developed.
This framework includes a high-level ontology, the Drug Repositioning Network Integration Ontology (DReNInO), to aid integration and subsequent mining; a suite of parsers; and a generic semantic graph integration platform. The framework enables the production of integrated networks that maintain strict semantics, which are important in, but not exclusive to, drug repositioning. DReNInF was then used to create Drug Repositioning Network Integration (DReNIn), a semantically rich Resource Description Framework (RDF) dataset. A Web-based front end was developed, including a SPARQL Protocol and RDF Query Language (SPARQL) endpoint for querying the dataset. To automate the mining of drug repositioning datasets, a formal framework for defining semantic subgraphs was established and a method for Drug Repositioning Semantic Mining (DReSMin) was developed. DReSMin is an algorithm that mines semantically rich networks for occurrences of a given semantic subgraph. It can identify instances of complex semantic subgraphs that contain data about putative drug repositioning opportunities in a computationally tractable fashion, scaling close to linearly with the size of the network data. The ability of DReSMin to identify novel Drug-Target (D-T) associations was investigated. A total of 9,643,061 putative D-T interactions were identified and ranked, and a strong correlation was observed between highly scored associations and those supported by the literature. The 20 top-ranked associations were analysed in more detail: 14 were found to be novel and six to be supported by the literature. It was also shown that this approach prioritises known D-T interactions better than other state-of-the-art methodologies. The ability of DReSMin to identify novel Drug-Disease (Dr-D) indications was also investigated. Because target-based approaches are used heavily in drug discovery, a systematic method for ranking Gene-Disease (G-D) associations is necessary.
Although methods already exist to collect, integrate and score these associations, the scores are often not a reliable reflection of expert knowledge. Therefore, an integrated, data-driven approach to drug repositioning was developed using Bayesian statistics and applied to rank 309,885 G-D associations using existing knowledge. The ranked associations were then integrated with other biological data to produce a semantically rich drug discovery network. Using this network, it was shown that diseases of the central nervous system (CNS) are an area of particular interest. The network was then systematically mined for semantic subgraphs that capture novel Dr-D relations. A total of 275,934 Dr-D associations were identified and ranked, and those more likely to be side effects were filtered out. The work presented here includes novel tools and algorithms that enable research within the field of drug repositioning. DReNIn, for example, includes data that previous comparable drug repositioning datasets have neglected, such as clinical trial data and drug indications. Furthermore, DReNInF makes it easy to extend the dataset with future data as and when it becomes available, such as G-D association directionality (i.e. whether a mutation is loss-of-function or gain-of-function). Unlike other algorithms and approaches developed for drug repositioning, DReSMin can be used to infer any type of association captured in the target semantic network. Moreover, the approaches presented here should be applicable more generally to other fields that require algorithms for integrating and mining semantically rich networks.
Engineering and Physical Sciences Research Council (EPSRC) and GS
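DReSMin mines semantically rich networks for occurrences of a given semantic subgraph. The toy sketch below is not DReSMin. It is a plain backtracking matcher for a typed path pattern, assuming a hypothetical Drug -targets-> Protein -associated_with-> Disease pattern. It proposes drug-disease pairs that lack a known "indicated_for" edge. The class, function, node and edge labels are all hypothetical.

```python
from collections import defaultdict
from typing import Iterator


class SemanticGraph:
    """Toy typed, directed graph: every node has a type and every edge a label."""

    def __init__(self) -> None:
        self.node_type: dict[str, str] = {}
        self.out_edges: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add_node(self, node: str, node_type: str) -> None:
        self.node_type[node] = node_type

    def add_edge(self, source: str, label: str, target: str) -> None:
        self.out_edges[source].append((label, target))

    def has_edge(self, source: str, label: str, target: str) -> bool:
        return (label, target) in self.out_edges.get(source, [])


def match_path(graph: SemanticGraph, node_types: list[str], edge_labels: list[str]) -> Iterator[tuple[str, ...]]:
    """Yield node tuples matching node_types[0] -edge_labels[0]-> node_types[1] -> ... without revisiting nodes."""
    if len(node_types) != len(edge_labels) + 1:
        raise ValueError("need exactly one more node type than edge labels")

    def extend(path: list[str]) -> Iterator[tuple[str, ...]]:
        step = len(path) - 1
        if step == len(edge_labels):
            yield tuple(path)
            return
        for label, nxt in graph.out_edges.get(path[-1], []):
            if label == edge_labels[step] and graph.node_type.get(nxt) == node_types[step + 1] and nxt not in path:
                path.append(nxt)
                yield from extend(path)
                path.pop()  # backtrack and try the next outgoing edge

    for node, node_type in list(graph.node_type.items()):
        if node_type == node_types[0]:
            yield from extend([node])


def candidate_indications(graph: SemanticGraph) -> set[tuple[str, str]]:
    """Drug-disease pairs linked through a targeted protein but lacking an 'indicated_for' edge."""
    pattern = match_path(graph, ["Drug", "Protein", "Disease"], ["targets", "associated_with"])
    return {(drug, disease) for drug, _, disease in pattern if not graph.has_edge(drug, "indicated_for", disease)}


g = SemanticGraph()
for node, kind in [("drugA", "Drug"), ("P1", "Protein"), ("diseaseX", "Disease"), ("diseaseY", "Disease")]:
    g.add_node(node, kind)
g.add_edge("drugA", "targets", "P1")
g.add_edge("P1", "associated_with", "diseaseX")
g.add_edge("P1", "associated_with", "diseaseY")
g.add_edge("drugA", "indicated_for", "diseaseY")
print(candidate_indications(g))  # {('drugA', 'diseaseX')}
```

Backtracking like this enumerates every path match and can grow quickly on dense graphs. The abstract reports that DReSMin scales close to linearly with network data, which this sketch makes no attempt to reproduce.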