
    Client applications and Server Side docker for management of RNASeq and/or VariantSeq workflows and pipelines of the GPRO Suite

    The GPRO suite is an in-progress bioinformatic project for -omic data analyses. As part of the continued growth of this project, we introduce a client-side and server-side solution for comparative transcriptomics and analysis of variants. The client side consists of two Java applications called "RNASeq" and "VariantSeq" that manage workflows for RNA-seq and Variant-seq analysis, respectively, based on the most common command line interface tools for each topic. Both applications are coupled with a Linux server infrastructure (named GPRO Server Side) that hosts all dependencies of each application (scripts, databases, and command line interface tools). Implementation of the server side requires a Linux operating system, PHP, SQL, Python, bash scripting, and third-party software. The GPRO Server Side can be deployed via a Docker container, either on the user's PC under any operating system or on remote servers as a cloud solution. The two applications are available as desktop and cloud applications and provide two execution modes: a Step-by-Step mode, which enables each step of a workflow to be executed independently, and a Pipeline mode, which runs all steps sequentially. The two applications also feature an experimental support system called GENIE that consists of a virtual chatbot/assistant and a pipeline jobs panel coupled with an expert system. The chatbot can troubleshoot issues with the usage of each tool, the pipeline jobs panel provides information about the status of each task executed in the GPRO Server Side, and the expert system provides the user with a potential recommendation to identify or fix failed analyses. The two applications and the GPRO Server Side combine the user-friendliness and security of client software with the efficiency of front-end and back-end solutions to manage command line interface software for RNA-seq and variant-seq analysis via interface environments.
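
The difference between the two execution modes can be illustrated with a small Python sketch. The `Workflow` class, its method names, and the step names are hypothetical and are not taken from the RNASeq or VariantSeq code; in the real applications each step would launch a command line tool on the GPRO Server Side rather than call a Python function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Workflow:
    """Hypothetical model of a workflow as an ordered list of named steps."""
    steps: List[str] = field(default_factory=list)
    actions: Dict[str, Callable[[], str]] = field(default_factory=dict)

    def add_step(self, name: str, action: Callable[[], str]) -> None:
        self.steps.append(name)
        self.actions[name] = action

    def run_step(self, name: str) -> str:
        # Step-by-step mode: run one step independently of the others.
        return self.actions[name]()

    def run_pipeline(self) -> List[str]:
        # Pipeline mode: run all steps sequentially; an exception raised
        # by any step stops the run before the remaining steps.
        return [self.run_step(name) for name in self.steps]


# Hypothetical steps standing in for real command line tools.
wf = Workflow()
wf.add_step("quality_control", lambda: "qc done")
wf.add_step("alignment", lambda: "aligned")
wf.add_step("quantification", lambda: "counted")

print(wf.run_step("alignment"))  # aligned
print(wf.run_pipeline())         # ['qc done', 'aligned', 'counted']
```

Keeping both modes on one object means a step tested on its own in Step-by-Step mode is exactly the step that Pipeline mode later chains.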

    Client applications and server-side docker for management of RNASeq and/or VariantSeq workflows and pipelines of the GPRO suite

    The GPRO suite is an in-progress bioinformatic project for -omics data analysis. As part of the continued growth of this project, we introduce a client- and server-side solution for comparative transcriptomics and analysis of variants. The client side consists of two Java applications called “RNASeq” and “VariantSeq” that manage pipelines and workflows based on the most common command line interface tools for RNA-seq and Variant-seq analysis, respectively. Both applications are coupled with a Linux server infrastructure (named GPRO Server-Side) that hosts all dependencies of each application (scripts, databases, and command line interface software). Implementation of the Server-Side requires a Linux operating system, PHP, SQL, Python, bash scripting, and third-party software. The GPRO Server-Side can be installed via a Docker container on the user’s PC, under any operating system, or on remote servers as a cloud solution. “RNASeq” and “VariantSeq” are both available as desktop (RCP compilation) and web (RAP compilation) applications. Each application has two execution modes: a step-by-step mode, which enables each step of the workflow to be executed independently, and a pipeline mode, which runs all steps sequentially. “RNASeq” and “VariantSeq” also feature an experimental, online support system called GENIE that consists of a virtual (chatbot) assistant and a pipeline jobs panel coupled with an expert system. The chatbot can troubleshoot issues with the usage of each tool, the pipeline jobs panel provides information about the status of each computational job executed in the GPRO Server-Side, and the expert system provides the user with a potential recommendation to identify or fix failed analyses.
Our solution is a ready-to-use, topic-specific platform that combines the user-friendliness, robustness, and security of desktop software with the efficiency of cloud/web applications to manage pipelines and workflows based on command line interface software. This work was supported by the Marie Sklodowska-Curie OPATHY project grant agreement 642095, the pre-doctoral research fellowship from MINECO Industrial Doctorates (Grant 659 DI-17-09134); Grant TSI-100903-2019-11 from the Secretary of State for Digital Advancement from Ministry of Economic Affairs and Digital Transformation, Spain; the Expedient IDI-2021-158274-a from the Ministry of Science and Innovation, Spain; and the ThinkInAzul program supported by MCIN with funding from European Union NextGenerationEU (PRTR-C17.I1) and Generalitat Valenciana (THINKINAZUL/2021/024). Peer Reviewed. Article signed by 18 authors: Ahmed Ibrahem Hafez, Beatriz Soriano, Aya Allah Elsayed, Ricardo Futami, Raquel Ceprian, Ricardo Ramos-Ruiz, Genis Martinez, Francisco Jose Roig, Miguel Angel Torres-Font, Fernando Naya-Catala, Josep Alvar Calduch-Giner, Lucia Trilla-Fuertes, Angelo Gamez Pozo, Vicente Arnau, Jose Maria Sempere-Luna, Jaume Perez-Sanchez, Toni Gabaldon and Carlos Llorens. Postprint (published version).
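
The expert system's task of turning a failed analysis into a recommendation can be pictured as rule matching over job logs. The sketch below is a minimal illustration under that assumption: the rule patterns, advice strings, `FALLBACK` message, and the `recommend` function are all hypothetical, and the abstract does not describe how GENIE's expert system is actually implemented.

```python
import re
from typing import List, Tuple

# Hypothetical rule base: (regex searched in the log of a failed job, advice).
# These rules are illustrative and are not GENIE's actual knowledge base.
RULES: List[Tuple[str, str]] = [
    (r"(?i)no space left on device",
     "Free disk space on the server or choose another output directory."),
    (r"(?i)permission denied",
     "Check the read/write permissions of the input and output paths."),
    (r"(?i)no such file or directory",
     "Verify that the input file paths and names are correct."),
    (r"(?i)out of memory",
     "Reduce the number of threads or request more memory for the job."),
]

FALLBACK = "No rule matched; inspect the full job log."


def recommend(log_text: str) -> List[str]:
    """Return the advice of every rule whose pattern occurs in the log."""
    matches = [advice for pattern, advice in RULES if re.search(pattern, log_text)]
    return matches or [FALLBACK]


# Made-up log line from a failed job.
print(recommend("ERROR: input.bam: No such file or directory"))
```

Returning every matching rule, rather than only the first, lets a log with several problems produce several recommendations.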

    Designing a Robust and Portable Workflow for Detecting Genetic Variants Associated with Molecular Phenotypes Across Multiple Studies

    Quantitative trait locus (QTL) analysis links variation in molecular phenotypes, such as expression levels, to genotype variation, that is, it identifies genetic variants that are statistically associated with a molecular trait. This analysis has become standard practice for better understanding the molecular mechanisms underlying complex traits and diseases. A typical QTL analysis consists of many steps. Although a diverse set of tools is available for each of these individual analyses, the tools have so far not been integrated into a reproducible and scalable workflow that is easy to use across a wide range of computational environments. Our analysis workflow consists of three modules: it starts with (i) quantification of the phenotype of interest, proceeds with (ii) normalisation and quality control, and finishes with (iii) the QTL analysis. For the phenotype quantification and QTL mapping modules, we developed pipelines with the Nextflow workflow management system, following the best practices of the nf-core framework. The pipelines are containerized, open-source, and extensible, and can be executed in parallel in a variety of computational environments. For the quality control module, we developed a script that automatically computes quality measures and reports them to the user. As a proof of concept, we uniformly processed more than 40 context-specific groups from more than 15 studies and discovered at least one significant eQTL for more than 9000 genes. We believe that adopting our pipelines will increase the reproducibility, portability, and robustness of QTL analysis compared with existing approaches.
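
At its core, mapping a QTL tests whether a molecular phenotype changes with genotype. A minimal sketch of that test for a single variant/phenotype pair, assuming an additive linear model with no covariates, is shown below; the function name `qtl_test` and the toy data are hypothetical, and production pipelines typically add covariates, test many variants per phenotype, and correct for multiple testing.

```python
import math
from typing import Sequence, Tuple


def qtl_test(dosage: Sequence[float], phenotype: Sequence[float]) -> Tuple[float, float]:
    """Simple linear regression of a phenotype on genotype dosage.

    dosage: copies of the alternative allele per sample (0, 1 or 2).
    Returns (slope, t_statistic); the slope is the effect per allele copy.
    """
    n = len(dosage)
    if n != len(phenotype) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_g = sum(dosage) / n
    mean_p = sum(phenotype) / n
    sxx = sum((g - mean_g) ** 2 for g in dosage)
    if sxx == 0:
        raise ValueError("variant is monomorphic in these samples")
    sxy = sum((g - mean_g) * (p - mean_p) for g, p in zip(dosage, phenotype))
    slope = sxy / sxx
    intercept = mean_p - slope * mean_g
    # Residual sum of squares, with n - 2 degrees of freedom.
    rss = sum((p - (intercept + slope * g)) ** 2 for g, p in zip(dosage, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        # Perfect fit: t is infinite unless there is no effect at all.
        t_stat = 0.0 if slope == 0 else math.copysign(math.inf, slope)
    else:
        t_stat = slope / se
    return slope, t_stat


# Made-up toy data: expression rises by about 1 unit per allele copy.
slope, t = qtl_test([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.2, 3.0, 3.2])
print(f"slope={slope:.2f}, t={t:.1f}")
```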

    Characterizing Human Transfer RNAs by Hydro-tRNAseq and PAR-CLIP

    The participation of tRNAs in fundamental aspects of biology and disease necessitates an accurate, experimentally confirmed annotation of tRNA genes, and curation of precursor and mature tRNA sequences. This has been challenging, mainly because RNA secondary structure and nucleotide modifications, together with tRNA gene multiplicity, complicate sequencing and read mapping efforts. To address these issues, I developed hydro-tRNAseq, a method based on partial alkaline RNA hydrolysis that generates fragments amenable to sequencing. To identify transcribed tRNA genes, I further complemented this approach with Photoactivatable Ribonucleoside-Enhanced Crosslinking and Immunoprecipitation (PAR-CLIP) of SSB/La, a conserved protein involved in pre-tRNA processing. My results show that approximately half of all predicted tRNA genes are transcribed in human cells, suggesting that, as a result of expression regulation, the tRNA genomic space is more contracted than previously thought. I also report predominant nucleotide modification sites and their order of incorporation, and identify tRNA leader, trailer, and intron sequences. By using complementary sequencing-based methodologies, I present a human tRNA reference set and determine the expression levels of mature tRNAs and their processing intermediates in human cells. The technical advances provided by hydro-tRNAseq are applied to the molecular diagnosis of a genetic neurodevelopmental syndrome caused by mutations in the tRNA processing factor CLP1. Finally, I harness this novel experimental and computational expertise to identify the endonuclease complex C3PO as a novel processing factor of human tRNAs. I carry out a transcriptome-wide analysis of C3PO targets, identify its binding sites and motifs, and provide insights into its biochemical and biological functions.
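
Gene multiplicity makes read mapping ambiguous because many tRNA genes share identical sequences. One common way to reduce this ambiguity, shown here only as an illustration and not as the procedure used in this work, is to collapse identical sequences into a single reference entry before mapping. The function name `collapse_identical` and the toy sequences below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def collapse_identical(genes: Dict[str, str]) -> Dict[str, List[str]]:
    """Group gene names that share an identical (case-insensitive) sequence.

    Each unique sequence becomes a single reference entry listing every gene
    that encodes it, so a read matching that sequence is counted once for the
    entry instead of being split across many identical loci.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in sorted(genes):
        groups[genes[name].upper()].append(name)
    return dict(groups)


# Hypothetical toy sequences, far shorter than real tRNAs.
toy = {"tRNA-X-1": "GCATTG", "tRNA-X-2": "gcattg", "tRNA-Y-1": "TTCGAA"}
print(collapse_identical(toy))
```

The trade-off is resolution: reads assigned to a collapsed entry can no longer be attributed to an individual locus.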

    RNA‐seq: Applications and Best Practices

    RNA‐sequencing (RNA‐seq) is the state‐of‐the‐art technique for transcriptome analysis and takes advantage of high‐throughput next‐generation sequencing. Although it is a powerful approach, RNA‐seq poses major challenges, with numerous caveats at each of its steps. Many experimental options are currently available, and a thorough understanding of each step is critical to make the right decisions and avoid inconclusive results. A complete workflow consists of: (1) experimental design; (2) sample and library preparation; (3) sequencing; and (4) data analysis. RNA‐seq enables a wide range of applications, such as the discovery of novel genes, gene/transcript quantification, and differential expression and functional analysis. This chapter covers the main aspects from sample preparation to downstream data analysis. It discusses how to obtain high‐quality samples, the number of replicates, library preparation, sequencing platforms, and coverage, focusing on recommended best practices based on the specialized literature. Basic techniques and well‐known algorithms are presented and discussed, guiding both beginners and experienced users in the implementation of reliable experiments.
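
The data analysis step usually begins by normalising raw read counts so that samples sequenced at different depths can be compared. The sketch below shows counts per million (CPM) and a log2 fold change on made-up counts; the function names, the pseudocount of 1, and the sample data are assumptions for illustration, and this is not a substitute for dedicated differential expression methods, which model replicates and dispersion statistically.

```python
import math
from typing import Dict


def cpm(counts: Dict[str, int]) -> Dict[str, float]:
    """Counts per million: scale each gene's read count by the library size."""
    total = sum(counts.values())
    return {gene: count * 1e6 / total for gene, count in counts.items()}


def log2_fold_change(a: float, b: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of b over a; the pseudocount avoids division by zero and log(0)."""
    return math.log2((b + pseudocount) / (a + pseudocount))


# Made-up counts for two samples with different sequencing depths.
sample_a = {"geneA": 100, "geneB": 900}
sample_b = {"geneA": 600, "geneB": 1400}
norm_a, norm_b = cpm(sample_a), cpm(sample_b)
lfc = {g: log2_fold_change(norm_a[g], norm_b[g]) for g in sample_a}
print(lfc)
```

Without normalisation, geneB would look up-regulated in sample B simply because sample B was sequenced twice as deeply; after CPM scaling its fold change is negative.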

    Predicting MHC I restricted T cell epitopes in mice with NAP-CNB, a novel online tool

    The lack of a dedicated, integrated pipeline for neoantigen discovery in mice hinders cancer immunotherapy research. Novel sequential approaches based on recurrent neural networks can improve the accuracy of T-cell epitope binding affinity predictions in mice, and a simplified variant selection process can reduce operational requirements. We have developed a web server tool (NAP-CNB) that implements a complete, automatic pipeline based on recurrent neural networks to predict putative neoantigens from tumoral RNA sequencing reads. The software can estimate H-2 peptide ligands directly from tumor samples, with an AUC comparable or superior to that of state-of-the-art methods. As a proof of concept, we used the B16 melanoma model to test the system's predictive capabilities, and we report its putative neoantigens. The NAP-CNB web server is freely available at http://biocomp.cnb.csic.es/NeoantigensApp/ with scripts and datasets accessible through the download section.
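
The AUC quoted for NAP-CNB summarises how well predicted scores separate true peptide ligands from non-binders. A minimal sketch of that metric follows; the `roc_auc` function name, the scores, and the labels are made up, and the quadratic pair-counting loop is chosen for clarity rather than speed (a rank-based formulation scales better on large test sets).

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win
    (the Mann-Whitney formulation of the AUC).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted binding scores: 1 = known ligand, 0 = non-binder.
print(roc_auc([0.9, 0.2, 0.5, 0.1], [1, 1, 0, 0]))  # 0.75
```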

    Experiences with workflows for automating data-intensive bioinformatics

    High-throughput technologies, such as next-generation sequencing, have turned molecular biology into a data-intensive discipline, requiring bioinformaticians to use high-performance computing resources and to carry out data management and analysis tasks at large scale. Workflow systems can be useful to simplify the construction of analysis pipelines that automate tasks, support reproducibility, and provide measures for fault tolerance. However, workflow systems can incur significant development and administration overhead, so bioinformatics pipelines are often still built without them. We present experiences with workflows and workflow systems from the bioinformatics community that participated in a series of hackathons and workshops of the EU COST action SeqAhead. The organizations are working on similar problems but have addressed them with different strategies and solutions. This fragmentation of efforts is inefficient and leads to redundant and incompatible solutions. Based on our experiences, we define a set of recommendations for future systems to enable efficient yet simple construction and execution of bioinformatics workflows. Published.
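
The fault-tolerance and resumability that workflow systems provide can be illustrated with a tiny task runner. This is a sketch under simple assumptions (tasks are Python callables, completion state is kept in memory rather than on disk, and failures are retried a fixed number of times); the `run_workflow` function and the task names are hypothetical and are not modelled on any specific workflow system discussed in the paper. It uses `graphlib`, available from Python 3.9.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, Set


def run_workflow(
    tasks: Dict[str, Callable[[], None]],
    deps: Dict[str, Set[str]],
    done: Set[str],
    max_retries: int = 2,
) -> Set[str]:
    """Run tasks in dependency order, skipping finished ones and retrying failures.

    deps maps each task to the set of tasks it depends on. `done` is updated
    in place, so calling again after a crash resumes from the first unfinished task.
    """
    for name in TopologicalSorter(deps).static_order():
        if name in done:
            continue  # completed in an earlier run
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
            except Exception:
                if attempt == max_retries:
                    raise  # give up after the last retry
            else:
                done.add(name)
                break
    return done


# Hypothetical three-step pipeline whose alignment step fails once.
log = []
attempts = {"align": 0}


def align() -> None:
    attempts["align"] += 1
    if attempts["align"] == 1:
        raise RuntimeError("transient failure")
    log.append("align")


tasks = {"qc": lambda: log.append("qc"), "align": align, "count": lambda: log.append("count")}
deps = {"qc": set(), "align": {"qc"}, "count": {"align"}}
completed = run_workflow(tasks, deps, set())
print(log)  # ['qc', 'align', 'count']
```

Rerunning with the same `completed` set executes nothing, which is the in-memory analogue of a workflow system resuming from cached results.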