Search CORE


Bioinformatic pipelines in Python with Leaf

Publication venue: BioMed Central
Publication date: 21/06/2013

Client applications and Server Side docker for management of RNASeq and/or VariantSeq workflows and pipelines of the GPRO Suite

Author: Arnau Vicente
Calduch-Giner Josep Alvar
Ceprian Raquel
Elsayed Aya A.
Futami Ricardo
Gabaldón Toni
Gamez-Pozo Angelo
Hafez Ahmed
Llorens Carlos
Martinez Genis
Naya-Català Fernando
Perez-Sánchez Jaume
Ramos-Ruiz Ricardo
Roig Francisco J.
Sempere Jose M.
Soriano Beatriz
Torres-Font Miguel A.
Trilla-Fuertes Lucia
Publication date: 19/11/2022

The GPRO suite is an in-progress bioinformatic project for -omic data analyses. As part of the continued growth of this project, we introduce a client-side and server-side solution for comparative transcriptomics and analysis of variants. The client side consists of two Java applications called "RNASeq" and "VariantSeq" that manage workflows for RNA-seq and Variant-seq analysis, respectively, based on the most common command line interface tools for each topic. Both applications are coupled with a Linux server infrastructure (named GPRO Server Side) that hosts all dependencies of each application (scripts, databases, and command line interface tools). Implementation of the server side requires a Linux operating system, PHP, SQL, Python, bash scripting, and third-party software. The GPRO Server Side can be deployed via a Docker container, either on the user's PC under any operating system or on remote servers as a cloud solution. The two applications are available as desktop and cloud applications and provide two execution modes: a Step-by-Step mode that executes each step of a workflow independently, and a Pipeline mode that runs all steps sequentially. The two applications also feature an experimental support system called GENIE, which consists of a virtual chatbot/assistant and a pipeline jobs panel coupled with an expert system. The chatbot can troubleshoot issues with the usage of each tool, the pipeline jobs panel provides information about the status of each task executed in the GPRO Server Side, and the expert system provides the user with a potential recommendation to identify or fix failed analyses. The two applications and the GPRO Server Side combine the user-friendliness and security of client software with the efficiency of front-end and back-end solutions to manage command line interface software for RNA-seq and variant-seq analysis through interface environments.
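The difference between the two execution modes can be illustrated with a small Python model. This is a hypothetical sketch, not the GPRO implementation: the `Workflow` class, its `run_step`/`run_pipeline` methods, the shared context dictionary, and the step names used in the example ("qc", "align") are all assumptions made for illustration. Step-by-step mode runs one named step on its own; pipeline mode runs every step in order and stops at the first failure, leaving a per-step status loosely analogous to a jobs panel.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical step signature: takes the shared context and returns an updated copy.
Context = Dict[str, object]
Step = Callable[[Context], Context]


@dataclass
class Workflow:
    """Illustrative workflow model; not the GPRO implementation."""
    steps: List[Tuple[str, Step]] = field(default_factory=list)
    # Per-step status, loosely mirroring a jobs panel: pending / done / failed.
    status: Dict[str, str] = field(default_factory=dict)

    def add(self, name: str, step: Step) -> "Workflow":
        self.steps.append((name, step))
        self.status[name] = "pending"
        return self

    def run_step(self, name: str, ctx: Context) -> Context:
        """Step-by-step mode: execute one named step on its own."""
        step = dict(self.steps)[name]  # raises KeyError for unknown step names
        try:
            new_ctx = step(ctx)
        except Exception:
            self.status[name] = "failed"
            raise
        self.status[name] = "done"
        return new_ctx

    def run_pipeline(self, ctx: Context) -> Context:
        """Pipeline mode: run every step in order, stopping at the first failure."""
        for name, _ in self.steps:
            try:
                ctx = self.run_step(name, ctx)
            except Exception:
                break  # later steps remain "pending"
        return ctx
```

For example, `Workflow().add("qc", f).add("align", g).run_pipeline({})` runs both hypothetical steps in sequence, while `run_step("align", ctx)` reruns only the alignment step, which is the kind of independence the Step-by-Step mode is described as offering.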

arXiv.org e-Print Archive

Directory of Open Access Journals

Repositori d'Objectes Digitals per a l'Ensenyament la Recerca i la Cultura

UPF Digital Repository

Digital.CSIC

nsroot: Minimalist Process Isolation Tool Implemented With Linux Namespaces

Author: Bongo Lars Ailo
Fjukstad Bjørn
Raknes Inge Alexander
Publication date: 13/09/2016

Data analyses in the life sciences are moving from tools run on a personal computer to services run on large computing platforms. This creates a need to package tools and dependencies for easy installation, configuration and deployment on distributed platforms. In addition, for secure execution there is a need for process isolation on a shared platform. Existing virtual machine and container technologies are often more complex than traditional Unix utilities, like chroot, and often require root privileges in order to set up or use. This is especially challenging on HPC systems where users typically do not have root access. We therefore present nsroot, a lightweight Linux namespaces based process isolation tool. It allows restricting the runtime environment of data analysis tools that may not have been designed with security as a top priority, in order to reduce the risk and consequences of security breaches, without requiring any special privileges. The codebase of nsroot is small, and it provides a command line interface similar to chroot. It can be used on all Linux kernels that implement user namespaces. In addition, we propose combining nsroot with the AppImage format for secure execution of packaged applications. nsroot is open sourced and available at: https://github.com/uit-no/nsroo

arXiv.org e-Print Archive

BIBSYS: Open Journals Systems

Munin - Open Research Archive

NORA - Norwegian Open Research Archives

Client applications and server-side docker for management of RNASeq and/or VariantSeq workflows and pipelines of the GPRO suite

Author: Ceprian Raquel
Elsayed Aya Allah
Futami Ricardo
Gabaldón Toni
Hafez Ahmed Ibrahem
Soriano Beatriz
Publication venue: MDPI
Publication date: 01/01/2023

The GPRO suite is an in-progress bioinformatic project for -omics data analysis. As part of the continued growth of this project, we introduce a client- and server-side solution for comparative transcriptomics and analysis of variants. The client side consists of two Java applications called “RNASeq” and “VariantSeq” to manage pipelines and workflows based on the most common command line interface tools for RNA-seq and Variant-seq analysis, respectively. Both applications are coupled with a Linux server infrastructure (named GPRO Server-Side) that hosts all dependencies of each application (scripts, databases, and command line interface software). Implementation of the Server-Side requires a Linux operating system, PHP, SQL, Python, bash scripting, and third-party software. The GPRO Server-Side can be installed, via a Docker container, on the user’s PC under any operating system or on remote servers, as a cloud solution. “RNASeq” and “VariantSeq” are both available as desktop (RCP compilation) and web (RAP compilation) applications. Each application has two execution modes: a step-by-step mode enables each step of the workflow to be executed independently, and a pipeline mode allows all steps to be run sequentially. “RNASeq” and “VariantSeq” also feature an experimental, online support system called GENIE that consists of a virtual (chatbot) assistant and a pipeline jobs panel coupled with an expert system. The chatbot can troubleshoot issues with the usage of each tool, the pipeline jobs panel provides information about the status of each computational job executed in the GPRO Server-Side, and the expert system provides the user with a potential recommendation to identify or fix failed analyses.
Our solution is a ready-to-use, topic-specific platform that combines the user-friendliness, robustness, and security of desktop software with the efficiency of cloud/web applications to manage pipelines and workflows based on command line interface software.

This work was supported by the Marie Sklodowska-Curie OPATHY project grant agreement 642095, the pre-doctoral research fellowship from MINECO Industrial Doctorates (Grant 659 DI-17-09134); Grant TSI-100903-2019-11 from the Secretary of State for Digital Advancement from Ministry of Economic Affairs and Digital Transformation, Spain; the Expedient IDI-2021-158274-a from the Ministry of Science and Innovation, Spain; and the ThinkInAzul program supported by MCIN with funding from European Union NextGenerationEU (PRTR-C17.I1) and Generalitat Valenciana (THINKINAZUL/2021/024).

Peer Reviewed

"Article signed by 18 authors: Ahmed Ibrahem Hafez, Beatriz Soriano, Aya Allah Elsayed, Ricardo Futami, Raquel Ceprian, Ricardo Ramos-Ruiz, Genis Martinez, Francisco Jose Roig, Miguel Angel Torres-Font, Fernando Naya-Catala, Josep Alvar Calduch-Giner, Lucia Trilla-Fuertes, Angelo Gamez Pozo, Vicente Arnau, Jose Maria Sempere-Luna, Jaume Perez-Sanchez, Toni Gabaldon and Carlos Llorens"

Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC

Client Applications and Server-Side Docker for Management of RNASeq and/or VariantSeq Workflows and Pipelines of the GPRO Suite

Author: Arnau Llombart Arnau
Calduch-Giner Josep Alvar
Ceprián Raquel
Elsayed Aya Allah
Futami Ricardo
Gabaldón Toni
Gámez-Pozo Ángelo
Hafez Ahmed Ibrahem
Llorens Carlos
Martínez Genís
Naya Catala Fernando
Pérez-Sánchez Jaume
Ramos-Ruiz Ricardo
Roig Francisco Jose
Sempere-Luna Jose Maria
Soriano Beatriz
Torres-Font Miguel Ángel
Trilla-Fuertes Lucia
Publication venue: MDPI AG
Publication date: 01/01/2023

Repositori d'Objectes Digitals per a l'Ensenyament la Recerca i la Cultura

UPF Digital Repository

Digital.CSIC

On-premise containerized, light-weight software solutions for Biomedicine

Author: Le Duc Huy
Publication date: 01/01/2023

Bioinformatics software systems are critical tools for analysing large-scale biological data, but their design and implementation can be challenging due to the need for reliability, scalability, and performance. This thesis investigates the impact of several software approaches on the design and implementation of bioinformatics software systems. These approaches include software patterns, microservices, distributed computing, containerisation and container orchestration. The research focuses on understanding how these techniques affect bioinformatics software systems’ reliability, scalability, performance, and efficiency. Furthermore, this research highlights the challenges and considerations involved in their implementation. This study also examines potential solutions for implementing container orchestration in bioinformatics research teams with limited resources and the challenges of using container orchestration. Additionally, the thesis considers microservices and distributed computing and how these can be optimised in the design and implementation process to enhance the productivity and performance of bioinformatics software systems. The research was conducted using a combination of software development, experimentation, and evaluation. The results show that implementing software patterns can significantly improve the code accessibility and structure of bioinformatics software systems. Specifically, microservices and containerisation also enhanced system reliability, scalability, and performance. Additionally, the study indicates that adopting advanced software engineering practices, such as model-driven design and container orchestration, can facilitate efficient and productive deployment and management of bioinformatics software systems, even for researchers with limited resources. Overall, we develop a software system integrating all our findings. Our proposed system demonstrated the ability to address challenges in bioinformatics. 
The thesis makes several key contributions in addressing the research questions surrounding the design, implementation, and optimisation of bioinformatics software systems using software patterns, microservices, containerisation, and advanced software engineering principles and practices. Our findings suggest that incorporating these technologies can significantly improve bioinformatics software systems’ reliability, scalability, performance, efficiency, and productivity.

(Translated from the German abstract:) Bioinformatics software systems are important tools for analysing large-scale biological data. However, their development and implementation can be challenging because of the reliability, scalability, and performance they require. The aim of this work is to investigate the impact of software patterns, microservices, distributed systems, containerisation, and container orchestration on the architecture and implementation of bioinformatics software systems. The research focuses on understanding how these techniques affect the reliability, scalability, performance, and efficiency of bioinformatics software systems, and which challenges are associated with their conceptualisation and implementation. This work also investigates potential solutions for implementing container orchestration in bioinformatics research teams with limited resources, and the limitations of using it in this context. Furthermore, it examines the key factors that influence the success of bioinformatics software systems built with containerisation, microservices, and distributed computing, and how these can be optimised in the design and implementation process to increase the productivity and performance of bioinformatics software systems. The work was carried out using a combination of software development, experiments, and evaluation.
The results show that implementing software patterns can substantially improve the reliability and scalability of bioinformatics software systems. The use of microservices and containerisation also contributed to increasing the reliability, scalability, and performance of the system. In addition, the work shows that applying software engineering practices, such as model-driven design and container orchestration, can facilitate the efficient and productive deployment and management of bioinformatics software systems. The implementation of this software system also addresses challenges faced by research groups with limited resources. Overall, the system has shown that it can address challenges in bioinformatics and thus represents a valuable tool for researchers in this field. This work makes several important contributions to answering research questions related to the design, implementation, and optimisation of software systems for bioinformatics using software engineering principles and practices. Our results suggest that incorporating these technologies can substantially improve the reliability, scalability, performance, efficiency, and productivity of bioinformatics software systems.

Institutional Repository of the Freie Universität Berlin

On Designing Multicore-aware Simulators for Biological Systems

Author: Aldinucci Marco
Coppo Mario
Damiani Ferruccio
Drocco Maurizio
Torquati Massimo
Troina Angelo
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 13/10/2010

The stochastic simulation of biological systems is an increasingly popular technique in bioinformatics. It is often an enlightening technique, but it can be computationally expensive. We discuss the main opportunities to speed it up on multi-core platforms, which pose new challenges for parallelisation techniques. These opportunities are developed in two general families of solutions, involving both the single simulation and a bulk of independent simulations (either replicas or runs derived from a parameter sweep). The proposed solutions are tested on the parallelisation of the CWC (Calculus of Wrapped Compartments) simulator, which is carried out with the FastFlow programming framework, enabling fast development and efficient execution on multi-cores. Comment: 19 pages + cover page
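The second family of solutions, a bulk of independent simulations, is embarrassingly parallel: each replica can run on its own core as long as it has its own random stream. The sketch below illustrates that idea in Python under clearly stated assumptions: it uses a toy immigration–death model (0 → X at rate `k`, X → 0 at rate `g*X`) simulated with Gillespie's direct method, and the standard library's `ProcessPoolExecutor` in place of FastFlow; it is not the CWC simulator, and `ssa_immigration_death` and `run_replicas` are hypothetical names.

```python
import random
from concurrent.futures import ProcessPoolExecutor
from functools import partial
from typing import List


def ssa_immigration_death(seed: int, k: float = 10.0, g: float = 1.0,
                          x0: int = 0, t_end: float = 10.0) -> int:
    """One Gillespie SSA run of a toy model (0 -> X at rate k, X -> 0 at rate g*X).

    Returns the copy number of X at time t_end. Each replica owns its own
    RNG, seeded explicitly, so results do not depend on worker scheduling.
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    while True:
        a_birth = k
        a_death = g * x
        a_total = a_birth + a_death
        if a_total == 0:
            break  # no reaction can fire any more
        t += rng.expovariate(a_total)  # exponential waiting time to next reaction
        if t > t_end:
            break
        # Pick which reaction fires, proportionally to its propensity.
        if rng.random() * a_total < a_birth:
            x += 1
        else:
            x -= 1
    return x


def run_replicas(n: int, base_seed: int = 0, workers: int = 1, **params) -> List[int]:
    """Run n independent replicas, optionally spread over worker processes."""
    seeds = [base_seed + i for i in range(n)]
    task = partial(ssa_immigration_death, **params)
    if workers == 1:
        return [task(s) for s in seeds]  # serial fallback
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so result i belongs to seeds[i].
        return list(pool.map(task, seeds))
```

A parameter sweep follows the same pattern by mapping over different `k` or `g` values instead of seeds. On platforms that start workers with the "spawn" method (for example macOS and Windows), call `run_replicas(..., workers=4)` from inside an `if __name__ == "__main__":` guard.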

arXiv.org e-Print Archive

CiteSeerX

Crossref

Archivio della Ricerca - Università di Pisa

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Institutional Research Information System University of Turin

Evaluating Bioinformatic Pipeline Performance for Forensic Microbiome Analysis

Author: Benbow Mark E.
Jordan Heather R.
Kaszubinski Sierra F.
Meek Mariah H.
Pechal Jennifer L.
Schmidt Carl J.
Publication venue: Wiley
Publication date: 01/03/2020

Microbial communities have potential evidential utility for forensic applications. However, bioinformatic analysis of high‐throughput sequencing data varies widely among laboratories, and these differences can affect microbial community composition and downstream analyses. To illustrate the importance of standardizing methodology, we compared analyses of postmortem microbiome samples using several bioinformatic pipelines, varying the minimum library size (minimum number of sequences per sample) and the sample size. Using the same input sequence data, we found that three open‐source bioinformatic pipelines, MG‐RAST, mothur, and QIIME2, produced significantly different relative abundance, alpha‐diversity, and beta‐diversity results. Increasing minimum library size and sample size increased the number of low‐abundance and infrequent taxa detected. Our results show that the choice of bioinformatic pipeline and parameters affects results in important ways. Given the growing potential application of forensic microbiology to the criminal justice system, continued research on standardizing computational methodology will be important for downstream applications.

Peer Reviewed

https://deepblue.lib.umich.edu/bitstream/2027.42/154468/1/jfo14213_am.pdf
https://deepblue.lib.umich.edu/bitstream/2027.42/154468/2/jfo14213.pd
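To make the "minimum library size" parameter and alpha-diversity concrete, the sketch below shows one generic way these steps are often chained: drop samples with too few reads, subsample (rarefy) the remaining samples to a common depth, and compute the Shannon index H = -Σ pᵢ ln pᵢ. This is an illustrative Python sketch under assumptions, not the procedure used by MG‐RAST, mothur, or QIIME2; the function names and the sample counts in the example are hypothetical.

```python
import math
import random
from typing import Dict, List


def filter_by_library_size(samples: Dict[str, List[int]], min_size: int) -> Dict[str, List[int]]:
    """Drop samples whose total read count is below min_size."""
    return {name: c for name, c in samples.items() if sum(c) >= min_size}


def rarefy(counts: List[int], depth: int, seed: int = 0) -> List[int]:
    """Subsample reads without replacement down to `depth` reads."""
    if depth > sum(counts):
        raise ValueError("depth exceeds library size")
    rng = random.Random(seed)
    # One entry per read, labelled with the index of its taxon.
    reads = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    out = [0] * len(counts)
    for taxon in rng.sample(reads, depth):
        out[taxon] += 1
    return out


def shannon(counts: List[int]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)
```

With hypothetical samples `{"A": [10, 10, 0], "B": [3, 1, 0], "C": [50, 25, 25]}` and a minimum library size of 20, sample B is discarded; raising that threshold discards more samples, while a deeper rarefaction depth keeps more reads per sample and so gives rare taxa more chances to appear, which is one way such parameter choices can shift diversity estimates.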

Crossref

Deep Blue Documents at the University of Michigan