    Development of MOSGUITO: a user-friendly graphical interface for meta-omics data analyses

    Master's dissertation in Bioinformatics. Complex microbial communities are essential to all ecosystems, and by linking microbial identity to function, meta-omics technologies facilitate the interpretation of the processes catalyzed by microorganisms. MOSCA is a command-line pipeline that performs bioinformatics analyses of metagenomics, metatranscriptomics, and metaproteomics data. MOSGUITO is a web-based tool developed in React that allows users to configure MOSCA's workflow and visualize MOSCA's outputs. Although MOSCA's metadata and configuration options could easily be customized and downloaded through MOSGUITO, MOSGUITO was unable to interact with MOSCA automatically. In this thesis, a three-tier client-server architecture was developed, comprising the client (MOSGUITO), the server (MOSCA), and a database. As the client, MOSGUITO can retrieve, store, and delete data in the database and start analysis runs on MOSCA. As the server, MOSCA can receive files from the client and start an analysis run. The database stores MOSCA results, users' input files, and user information from their login sessions. A complete guide to using this new version of MOSGUITO is provided. Because the MOSGUITO client interacts with the MOSCA server through APIs built with the Flask framework, end users need neither command-line expertise nor the computing resources to download MOSCA. Users can therefore optimize the use and configuration of MOSCA and analyze data from omics experiments through simple interaction with MOSGUITO.
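    The database tier of the three-tier design can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions. The table names, columns, and helper functions (open_db, register_user, store_input, start_run, and so on) are hypothetical and not taken from MOSGUITO's code. The Flask API layer that would call these helpers on behalf of the React client is also omitted.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not MOSGUITO's actual database layout.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id INTEGER PRIMARY KEY,
    username TEXT UNIQUE NOT NULL
);
CREATE TABLE IF NOT EXISTS input_files (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    filename TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS runs (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    status TEXT NOT NULL DEFAULT 'queued',
    result_path TEXT
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def register_user(conn, username):
    """Store a user record created at login; returns the new user id."""
    cur = conn.execute("INSERT INTO users (username) VALUES (?)", (username,))
    conn.commit()
    return cur.lastrowid


def store_input(conn, user_id, filename):
    """Record an input file uploaded by the client."""
    conn.execute(
        "INSERT INTO input_files (user_id, filename) VALUES (?, ?)",
        (user_id, filename),
    )
    conn.commit()


def inputs_for_user(conn, user_id):
    """Retrieve the names of a user's uploaded input files."""
    rows = conn.execute(
        "SELECT filename FROM input_files WHERE user_id = ? ORDER BY id", (user_id,)
    ).fetchall()
    return [name for (name,) in rows]


def start_run(conn, user_id):
    """Create a queued run record (assumed: a server process would then launch MOSCA)."""
    cur = conn.execute("INSERT INTO runs (user_id) VALUES (?)", (user_id,))
    conn.commit()
    return cur.lastrowid


def finish_run(conn, run_id, result_path):
    """Mark a run as finished and store where its results are kept."""
    conn.execute(
        "UPDATE runs SET status = 'finished', result_path = ? WHERE id = ?",
        (result_path, run_id),
    )
    conn.commit()


def runs_for_user(conn, user_id):
    """Retrieve (id, status, result_path) for each of a user's runs."""
    return conn.execute(
        "SELECT id, status, result_path FROM runs WHERE user_id = ? ORDER BY id",
        (user_id,),
    ).fetchall()


def delete_run(conn, run_id):
    """Delete a run record at the client's request."""
    conn.execute("DELETE FROM runs WHERE id = ?", (run_id,))
    conn.commit()
```

    In one plausible design, each Flask endpoint would wrap one of these helpers. Keeping run status in the database would then let the client poll for results rather than hold a connection open while an analysis runs.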

    AI in Medical Imaging Informatics: Current Challenges and Future Directions

    This paper reviews state-of-the-art research solutions across the spectrum of medical imaging informatics, discusses clinical translation, and provides future directions for advancing clinical practice. More specifically, it summarizes advances in medical imaging acquisition technologies for different modalities, highlighting the need for efficient medical data management strategies in the context of AI in big healthcare data analytics. It then provides a synopsis of contemporary and emerging algorithmic methods for disease classification and organ/tissue segmentation, focusing on AI and deep learning architectures, which have already become the de facto approach. The clinical benefits of in-silico modelling advances linked with evolving 3D reconstruction and visualization applications are further documented. In conclusion, integrative analytics approaches driven by the associated research branches highlighted in this study promise to revolutionize imaging informatics as it is known today across the healthcare continuum, for both radiology and digital pathology applications. Such approaches are projected to enable informed, more accurate diagnosis, timely prognosis, and effective treatment planning, underpinning precision medicine.

    Transcriptome Sequencing for Precise and Accurate Measurement of Transcripts and Accessibility of TCGA for Cancer Datasets and Analysis

    Next-generation sequencing (NGS) technologies are now well established and have become a routine analysis tool owing to their depth, coverage, and cost. RNA sequencing (RNA-Seq) has readily replaced conventional array-based approaches and has become the method of choice for qualitative and quantitative transcriptome analysis, quantification of alternatively spliced isoforms, and identification of sequence variants, novel transcripts, and gene fusions, among many other applications. This chapter discusses the multi-step transcriptome data analysis process in detail, in the context of re-sequencing (where a reference genome is available). We discuss steps including quality control, read alignment, gene-level quantification from read-level data, visualization of data at different levels, and identification of differentially expressed genes and alternatively spliced transcripts. Because its data are freely available to the public, we also discuss The Cancer Genome Atlas (TCGA) as a resource of cancer RNA-Seq data for selection and analysis in specific experimental contexts. This chapter provides insights into the applicability, data availability, tools, and statistics to help beginners become familiar with RNA-Seq data analysis and TCGA.
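    The step from read-level data to gene-level comparison can be illustrated with counts-per-million (CPM) normalization followed by a log2 fold change. This is a simplified sketch, not the chapter's procedure. The function names and toy counts are hypothetical. A real differential expression analysis also needs replicates and a statistical model, such as the negative binomial models used by tools like DESeq2 or edgeR, to assign significance. The sketch computes effect sizes only.

```python
import math


def cpm(counts):
    """Counts per million for one sample (dict: gene -> aligned read count)."""
    total = sum(counts.values())
    return {gene: c / total * 1e6 for gene, c in counts.items()}


def log2_fold_change(counts_a, counts_b, pseudocount=1.0):
    """log2(CPM_b / CPM_a) per gene; the pseudocount avoids taking log2 of 0."""
    cpm_a, cpm_b = cpm(counts_a), cpm(counts_b)
    return {
        gene: math.log2((cpm_b[gene] + pseudocount) / (cpm_a[gene] + pseudocount))
        for gene in counts_a
    }


# Toy counts for two samples with equal library size (hypothetical data).
control = {"g1": 100, "g2": 300, "g3": 600}
treated = {"g1": 400, "g2": 300, "g3": 300}
lfc = log2_fold_change(control, treated)
```

    Normalizing to CPM first makes samples with different sequencing depths comparable. Here g1 comes out at roughly +2 (four-fold up), g2 at 0, and g3 at roughly -1.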

    Integrative bioinformatics applications for complex human disease contexts

    This thesis presents new methods for analyzing high-throughput data from modern sources in the context of complex human diseases, using a bioinformatics analysis workflow as an example. New measurement techniques improve the resolution with which cellular and molecular processes can be monitored. While RNA sequencing (RNA-seq) measures mRNA expression, single-cell RNA-seq (scRNA-seq) resolves this on a per-cell basis. Long-read sequencing is increasingly used in genomics. Imaging mass spectrometry (IMS) measures protein levels in tissues with spatial resolution. Each of these techniques poses specific challenges that must be addressed with new computational methods. Collecting knowledge with contextual annotations is important for integrative data analyses. Such knowledge is available in large literature repositories, from which information such as miRNA-gene interactions can be extracted using text-mining methods. Once this information is aggregated in new databases, specific questions can be answered with traceable evidence. Combining experimental data with these databases opens new possibilities for integrative methods and for answering questions relevant to complex human diseases. Several data sources are made available, such as literature for text mining miRNA-gene interactions (Chapter 2), next- and third-generation sequencing data for genomics and transcriptomics (Chapters 4.1, 5), and IMS for spatially resolved proteomics (Chapter 4.4). New information-extraction and pre-processing methods are developed for these data sources. For instance, third-generation sequencing runs can be monitored and evaluated using the poreSTAT and sequ-into methods. The integrative (downstream) analyses make use of these heterogeneous data sources. The cPred method (Chapter 4.2) for cell type prediction from scRNA-seq data was successfully applied in the context of the SARS-CoV-2 pandemic.
    The robust differential expression (DE) analysis pipeline RoDE (Chapter 6.1) contains a large set of methods for (differential) data analysis, reporting, and visualization of RNA-seq data. The accessibility of bioinformatics software is discussed alongside practical applications (Chapter 3). The miRNA-gene interaction database developed here gives valuable insights into atherosclerosis-relevant processes and serves as a regulatory network for predicting active miRNA regulators in RoDE (Chapter 6.1). cPred predictions, RoDE results, scRNA-seq data, and IMS data are unified as input for the 3D index Aorta3D (Chapter 6.2), which makes atherosclerosis-related datasets browsable. Finally, scRNA-seq analysis with subsequent cPred cell type prediction, together with robust analysis of bulk RNA-seq datasets, led to novel insights into COVID-19. Taken together, the methods discussed improve integrative analysis in complex human disease contexts at essential points.
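    The abstract does not say how RoDE uses the miRNA-gene network to predict active regulators. One common, generic way to use such a network is an over-representation test. It asks whether a miRNA's target genes appear among the differentially expressed genes more often than chance would predict. The sketch below uses a one-sided hypergeometric test, swapped in as an illustrative assumption rather than RoDE's documented method. The function names and toy gene sets are hypothetical, and a real analysis would also correct the p-values for multiple testing.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def rank_mirnas(targets, de_genes, universe):
    """Rank miRNAs by enrichment of their targets among DE genes.

    targets: dict miRNA -> set of target genes (e.g. from a miRNA-gene database)
    de_genes: set of differentially expressed genes
    universe: set of all genes measured in the experiment
    Returns (miRNA, overlap, p-value) tuples sorted by ascending p-value.
    """
    de = de_genes & universe
    N, n = len(universe), len(de)
    results = []
    for mirna, genes in targets.items():
        genes = genes & universe  # only count targets that were measured
        k = len(genes & de)
        results.append((mirna, k, hypergeom_sf(k, N, len(genes), n)))
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy data: 10 measured genes, 3 of them differentially expressed.
universe = {f"g{i}" for i in range(10)}
de_genes = {"g0", "g1", "g2"}
targets = {"miR-A": {"g0", "g1", "g2"}, "miR-B": {"g7", "g8", "g9"}}
ranked = rank_mirnas(targets, de_genes, universe)
```

    In this toy case, all three targets of miR-A are differentially expressed (p = 1/120), while none of miR-B's are (p = 1.0).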

    Challenges and perspectives in computational deconvolution in genomics data

    Deciphering cell type heterogeneity is crucial for systematically understanding tissue homeostasis and its dysregulation in disease. Computational deconvolution is an efficient approach to estimating cell type abundances from a variety of omics data. Despite significant methodological progress in computational deconvolution in recent years, challenges remain. Here we outline four significant challenges: availability of reference data, generation of simulated data, limitations of computational methodologies, and benchmarking design and implementation. Finally, we make recommendations on reference data generation, new directions for computational methodologies, and strategies to promote rigorous benchmarking.
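    A basic reference-based formulation of deconvolution models a bulk expression profile b as a mixture S x of cell-type signature profiles S. It estimates non-negative weights x and then normalizes them to proportions. The sketch below solves this non-negative least squares (NNLS) problem with plain projected gradient descent. It illustrates the general idea only and is not any specific method discussed in the paper. The function names and signature matrix are hypothetical, and in practice one would use an optimized solver such as scipy.optimize.nnls.

```python
def nnls_projected_gradient(S, b, iters=5000):
    """Minimise 0.5 * ||S x - b||^2 subject to x >= 0 by projected gradient descent.

    S: signature matrix as a list of rows (genes x cell types)
    b: bulk expression value for each gene
    """
    n_genes, n_types = len(S), len(S[0])
    # ||S||_F^2 upper-bounds the largest eigenvalue of S^T S, so this step is safe.
    step = 1.0 / sum(v * v for row in S for v in row)
    x = [0.0] * n_types
    for _ in range(iters):
        residual = [sum(S[g][t] * x[t] for t in range(n_types)) - b[g]
                    for g in range(n_genes)]
        grad = [sum(S[g][t] * residual[g] for g in range(n_genes))
                for t in range(n_types)]
        # Gradient step, then project back onto the non-negative orthant.
        x = [max(0.0, x[t] - step * grad[t]) for t in range(n_types)]
    return x


def deconvolve(S, b, iters=5000):
    """Return estimated cell-type proportions (non-negative, summing to 1)."""
    x = nnls_projected_gradient(S, b, iters)
    total = sum(x)
    return [v / total for v in x] if total > 0 else x


# Hypothetical signatures for 4 marker genes in 2 cell types, and a bulk
# sample mixed as 70% type 1 and 30% type 2.
S = [[10.0, 0.0], [0.0, 10.0], [5.0, 1.0], [1.0, 5.0]]
bulk = [7.0, 3.0, 3.8, 2.2]
proportions = deconvolve(S, bulk)
```

    On this noise-free toy mixture, the estimate recovers proportions of about 0.7 and 0.3. With real data, the challenges listed above, such as the quality of the reference signatures, determine how far such estimates can be trusted.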

    A Hitchhiker's guide through the bio-image analysis software universe

    Modern research in the life sciences is unthinkable without computational methods for extracting, quantifying and visualising information derived from microscopy imaging data of biological samples. In the past decade, we have observed a dramatic increase in available software packages for these purposes. As it is increasingly difficult to keep track of the growing number of image analysis platforms, tool collections, components and emerging technologies, we provide a conservative overview of the software we use in our daily routine and give insights into emerging tools. We give guidance on which aspects to consider when choosing the platform that best suits the user's needs, including image data type, the team's skills, infrastructure and community at the institute, and availability of time and budget.
    Peer reviewed