5 research outputs found

    GeneBrowser 2: an application to explore and identify common biological traits in a set of genes

    Background: The development of high-throughput laboratory techniques has created a demand for computer-assisted result analysis tools. Many of these techniques return lists of genes whose interpretation requires finding relevant biological roles for the problem at hand. The required information is typically available in public databases and usually has to be retrieved manually to complement the analysis, a very time-consuming task that should be automated as much as possible. Results: GeneBrowser is a web-based tool that, for a given list of genes, combines data from several public databases with visualisation and analysis methods to help identify the most relevant and common biological characteristics. The functionalities provided include the following: a central point with the most relevant biological information for each inserted gene; a list of the most related papers in PubMed and gene expression studies in ArrayExpress; and an extended approach to functional analysis applied to Gene Ontology, homologies, gene chromosomal localisation and pathways. Conclusions: GeneBrowser provides a unique entry point to several visualisation and analysis methods, providing fast and easy analysis of a set of genes. It fills the gap between Web portals that analyse one gene at a time and functional analysis tools that are limited in scope and usually desktop-based.
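    The functional analysis GeneBrowser applies to Gene Ontology is, at its core, an over-representation test on the uploaded gene list. The following is a minimal sketch of that idea using a hypergeometric test; the annotation mapping, gene list and background size are hypothetical placeholders, not GeneBrowser's actual data or API.

        # Minimal sketch of a Gene Ontology over-representation test, the kind of
        # functional analysis GeneBrowser applies to an uploaded gene list.
        # The annotation mapping and gene list below are hypothetical examples.
        from scipy.stats import hypergeom

        # Hypothetical GO term -> annotated genes mapping (in practice this comes
        # from public databases such as the Gene Ontology).
        go_annotations = {
            "GO:0006915 apoptotic process": {"TP53", "BAX", "CASP3", "BCL2"},
            "GO:0006281 DNA repair": {"BRCA1", "BRCA2", "TP53", "RAD51"},
        }

        background_size = 20000                           # approximate number of human genes
        gene_list = {"TP53", "BRCA1", "BRCA2", "CASP3"}   # genes under study

        for term, annotated in go_annotations.items():
            overlap = gene_list & annotated
            # P(X >= |overlap|) under the hypergeometric null model
            p_value = hypergeom.sf(len(overlap) - 1, background_size,
                                   len(annotated), len(gene_list))
            print(f"{term}: {len(overlap)} genes, p = {p_value:.3g}")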

    geneCommittee: a web-based tool for extensively testing the discriminatory power of biologically relevant gene sets in microarray data classification

    BACKGROUND: The diagnosis and prognosis of several diseases can be shortened through the use of different large-scale genome experiments. In this context, microarrays can generate expression data for a huge set of genes. However, to obtain solid statistical evidence from the resulting data, it is necessary to train and to validate many classification techniques in order to find the best discriminative method. This is a time-consuming process that normally depends on intricate statistical tools. RESULTS: geneCommittee is a web-based interactive tool for routinely evaluating the discriminative classification power of custom hypotheses in the form of biologically relevant gene sets. While the user can work with different gene set collections and several microarray data files to configure specific classification experiments, the tool is able to run several tests in parallel. Provided with a straightforward and intuitive interface, geneCommittee is able to render valuable information for diagnostic analyses and clinical management decisions by systematically evaluating custom hypotheses over different data sets using complementary classifiers, a key aspect in clinical research. CONCLUSIONS: geneCommittee allows the enrichment of raw microarray data with gene functional annotations, producing integrated datasets that simplify the construction of better discriminative hypotheses, and allows the creation of a set of complementary classifiers. The trained committees can then be used for clinical research and diagnosis. Full documentation, including common use cases and guided analysis workflows, is freely available at http://sing.ei.uvigo.es/GC/
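    The core experiment geneCommittee runs, evaluating how well a biologically relevant gene set discriminates sample classes by training several complementary classifiers, can be sketched with scikit-learn. In the sketch below the expression matrix, class labels, gene names and classifier choices are hypothetical illustrations, not the tool's actual configuration or data.

        # Hedged sketch: cross-validating several classifiers restricted to the genes
        # of one candidate gene set, the kind of test geneCommittee runs in parallel
        # for many gene sets. Data and gene names are hypothetical.
        import numpy as np
        import pandas as pd
        from sklearn.model_selection import cross_val_score
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.naive_bayes import GaussianNB
        from sklearn.svm import SVC

        # Hypothetical expression matrix: samples x genes, plus class labels.
        rng = np.random.default_rng(0)
        genes = ["TP53", "BRCA1", "ESR1", "ERBB2", "MKI67"]
        X = pd.DataFrame(rng.normal(size=(40, len(genes))), columns=genes)
        y = np.array([0] * 20 + [1] * 20)       # e.g. relapse vs. no relapse

        gene_set = ["TP53", "BRCA1", "ERBB2"]   # candidate biologically relevant set
        classifiers = {
            "kNN": KNeighborsClassifier(3),
            "NaiveBayes": GaussianNB(),
            "SVM": SVC(kernel="linear"),
        }

        # Evaluate each committee member on the restricted feature space.
        for name, clf in classifiers.items():
            scores = cross_val_score(clf, X[gene_set], y, cv=5)
            print(f"{name}: mean accuracy {scores.mean():.2f}")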

    Composição de serviços para aplicações biomédicas (Service composition for biomedical applications)

    Doctoral thesis in Informatics Engineering. The demand for innovation in the biomedical software domain has been a driver of the evolution of information technologies over the last decades. The challenges associated with the effective management, integration, analysis and interpretation of the wealth of life sciences information stemming from modern hardware and software technologies require concerted efforts. From gene sequencing hardware to pharmacology research up to patient electronic health records, the ability to accurately explore data from these environments is vital to further improve our understanding of human health. This thesis encloses the discussion on building better informatics strategies to address these challenges, primarily in the context of service composition, including warehousing and federation strategies for resource integration, as well as web services and LinkedData for software interoperability.
    Service composition is introduced as a general principle, geared towards data integration and software interoperability. Concerning the latter, this research covers the service composition requirements within the pharmacovigilance field, namely in the European EU-ADR project. The contributions to this area, the definition of a new interoperability standard and the creation of a new workflow-wrapping engine, are behind the successful construction of the EU-ADR Web Platform, a workspace for delivering advanced pharmacovigilance studies. In the context of the European GEN2PHEN project, this research tackles the challenges associated with the integration of heterogeneous and distributed data in the human variome field. For this matter, a new lightweight solution was created: WAVe, Web Analysis of the Variome, which provides a rich collection of genetic variation data through an innovative portal and an advanced API. The development of the strategies underlying these products highlighted clear opportunities in the biomedical software field: enhancing the software implementation process with rapid application development approaches, and improving the quality and availability of data with the adoption of the Semantic Web paradigm. COEUS crosses the boundaries of integration and interoperability as it provides a framework for the flexible acquisition and translation of data into a semantic knowledge base, as well as a comprehensive set of interoperability services, from REST to LinkedData, to fully exploit the gathered data semantically. By combining the lightness of rapid application development strategies with the richness of its "Semantic Web in a box" approach, COEUS is a pioneering framework to enhance the development of the next generation of biomedical applications.
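    COEUS exposes its aggregated knowledge base through interoperability services such as REST and LinkedData. Below is a minimal sketch of how a client might query such a semantic layer over SPARQL; the endpoint URL, prefixes and predicate names are hypothetical placeholders, not COEUS's actual schema or service addresses.

        # Hedged sketch: querying a semantic knowledge base over SPARQL, the style of
        # LinkedData interoperability service a framework like COEUS provides.
        # The endpoint URL and the vocabulary in the query are hypothetical.
        from SPARQLWrapper import SPARQLWrapper, JSON

        endpoint = SPARQLWrapper("http://example.org/coeus/sparql")  # hypothetical endpoint
        endpoint.setReturnFormat(JSON)
        endpoint.setQuery("""
            PREFIX ex: <http://example.org/ontology#>
            SELECT ?gene ?variant WHERE {
                ?variant ex:locatedIn ?gene .      # hypothetical predicate
            } LIMIT 10
        """)

        results = endpoint.query().convert()
        for binding in results["results"]["bindings"]:
            print(binding["gene"]["value"], binding["variant"]["value"])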

    Using machine learning to predict pathogenicity of genomic variants throughout the human genome

    More than 6,000 diseases are estimated to be caused by genomic variants. This can happen in many possible ways: a variant may stop the translation of a protein, interfere with gene regulation, or alter splicing of the transcribed mRNA into an unwanted isoform. It is necessary to investigate all of these processes in order to evaluate which variant may be causal for the deleterious phenotype. Variant effect scores are a great help in this regard. Implemented as machine learning classifiers, they integrate annotations from different resources to rank genomic variants in terms of pathogenicity. Developing a variant effect score requires multiple steps: annotation of the training data, feature selection, model training, benchmarking, and finally deployment of the model. Here, I present a generalized workflow of this process. It makes it simple to configure how information is converted into model features, enabling the rapid exploration of different annotations. The workflow further implements hyperparameter optimization, model validation and, ultimately, deployment of a selected model via genome-wide scoring of genomic variants. The workflow is applied to train Combined Annotation Dependent Depletion (CADD), a variant effect model that scores SNVs and InDels genome-wide. I show that the workflow can be quickly adapted to novel annotations by porting CADD to the genome reference GRCh38.
    Further, I demonstrate the integration of deep neural network scores as features into a new CADD model, improving the annotation of RNA splicing events. Finally, I apply the workflow to train multiple variant effect models from training data based on variants selected by allele frequency. In conclusion, the developed workflow presents a flexible and scalable method to train variant effect scores. All software and developed scores are freely available from cadd.gs.washington.edu and cadd.bihealth.org.
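    The workflow described here, annotate training variants, derive features, optimise hyperparameters, validate and finally score variants genome-wide, can be illustrated with a small scikit-learn sketch. The feature names, training table and model choice below are hypothetical stand-ins and are far simpler than the actual CADD pipeline.

        # Hedged sketch of the core training step of a variant effect model:
        # annotated variants in, pathogenicity-like scores out. Features, data and
        # model are hypothetical stand-ins, much simpler than the real CADD pipeline.
        import numpy as np
        import pandas as pd
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import GridSearchCV

        # Hypothetical annotated training set: one row per variant.
        rng = np.random.default_rng(1)
        variants = pd.DataFrame({
            "conservation": rng.normal(size=200),   # e.g. a conservation score
            "splice_score": rng.normal(size=200),   # e.g. a deep-learning splice feature
            "allele_freq": rng.uniform(size=200),
            "label": rng.integers(0, 2, size=200),  # proxy benign (0) / deleterious (1)
        })

        features = ["conservation", "splice_score", "allele_freq"]

        # Hyperparameter optimisation with cross-validation, then refit on all data.
        search = GridSearchCV(
            LogisticRegression(max_iter=1000),
            param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
            cv=5,
        )
        search.fit(variants[features], variants["label"])

        # "Deployment": score further variants with the selected model.
        new_variants = variants[features].head(5)
        print(search.predict_proba(new_variants)[:, 1])  # pathogenicity-like scores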

    Do proteoma salivar ao oraloma (From the salivary proteome to the oralome)

    The human oral cavity is a complex ecosystem where host, microbial and environmental factors interact in a dynamic equilibrium that is reflected in the fluid that bathes it: saliva. The understanding of the biology of the oral cavity and of the disturbances affecting it (or of the systemic diseases mirrored in it) depends on bioinformatics tools to compile, integrate and apply the information generated by high-throughput techniques such as proteomics, which is devoted to the identification of expressed proteins. Several proteomes of the oral cavity have been published in the last decade, yet a tool to compile, integrate and interpret the data generated by these studies is still not available.
    The aim of this work is to contribute to the development of a bioinformatics tool that allows researchers to study the diversity and variability of the proteins in the oral proteome, defining and characterizing the oralome (the physiome of the oral cavity). In this study, the proteomes of the oral cavity published in the last decade were compiled and the information regarding the identified proteins was curated. This process generated a large quantity of information, leading to the creation of the OralOme database and the OralCard web portal. With OralCard, the user may search, integrate, interpret and visualize the data, extracting biological and clinical meaning. The study of the proteins in the OralOme contributes to the understanding of the molecular functions of the proteins produced by the various sub-compartments of the oral cavity and, therefore, to establishing their contribution to salivary functions. Furthermore, several functionalities of OralCard were tested in the analysis of proteomic data from saliva samples, with the objective of developing analysis methodologies that clarify the molecular mechanisms involved in several pathologies and of supporting the inclusion of functional improvements in future updates of the portal. The methodologies used allowed the study of the relationship between salivary proteins and oral and systemic health, thereby supporting the potential use of saliva as a diagnostic fluid. The development of bioinformatics tools such as OralCard is an important contribution to the understanding of oral biology and to the design of strategies that facilitate the identification of biomarkers from saliva samples, contributing to the development of effective diagnostic, prognostic and therapeutic methods.
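    Compiling protein identifications from many published oral proteomes into one integrated resource is, at its core, a data aggregation step. The sketch below illustrates that idea with pandas; the study tables, protein accessions and column names are hypothetical examples, not the actual OralOme schema.

        # Hedged sketch: merging protein identifications reported by several oral
        # proteome studies into one table, the kind of compilation behind OralOme.
        # Study names, accessions and columns are hypothetical examples.
        import pandas as pd

        study_a = pd.DataFrame({"protein": ["P04745", "P02788", "P01036"],
                                "source": "whole saliva"})
        study_b = pd.DataFrame({"protein": ["P04745", "P61626"],
                                "source": "parotid secretion"})

        # Aggregate: which studies/compartments report each protein, and how often.
        compiled = (pd.concat([study_a.assign(study="A"), study_b.assign(study="B")])
                      .groupby("protein")
                      .agg(n_studies=("study", "nunique"),
                           compartments=("source", lambda s: sorted(set(s))))
                      .reset_index())
        print(compiled)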