Client applications and Server Side docker for management of RNASeq and/or VariantSeq workflows and pipelines of the GPRO Suite
The GPRO suite is an in-progress bioinformatic project for -omic data
analyses. As part of the continued growth of this project, we introduce a
client side & server side solution for comparative transcriptomics and analysis
of variants. The client side consists of two Java applications called "RNASeq"
and "VariantSeq" to manage workflows for RNA-seq and Variant-seq analysis,
respectively, based on the most common command line interface tools for each
topic. Both applications are coupled with a Linux server infrastructure (named
GPRO Server Side) that hosts all dependencies of each application (scripts,
databases, and command line interface tools). Implementation of the server side
requires a Linux operating system, PHP, SQL, Python, bash scripting, and
third-party software. The GPRO Server Side can be deployed via a Docker
container that can be installed on the user's PC under any operating system or
on remote servers as a cloud solution. The two applications are available as
desktop and cloud applications and provide two execution modes: a Step-by-Step
mode enables each step of a workflow to be executed independently and a
Pipeline mode allows all steps to be run sequentially. The two applications
also feature an experimental support system called GENIE that consists of a
virtual chatbot/assistant and a pipeline jobs panel coupled with an expert
system. The chatbot can troubleshoot issues with the usage of each tool, the
pipeline jobs panel reports the status of each task executed
on the GPRO Server Side, and the expert system offers the user a
recommendation for identifying or fixing failed analyses. The two applications and the
GPRO Server Side combine the user-friendliness and security of client software
with the efficiency of front-end and back-end solutions to manage command line
interface software for RNA-seq and variant-seq analysis via interface
environments.
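The two execution modes described in this abstract can be illustrated with a minimal sketch. It is not the GPRO implementation: the step names and functions below are hypothetical placeholders, whereas the real applications wrap command line tools hosted on the server side.

```python
from typing import Callable, Dict

# Hypothetical workflow steps (assumptions for illustration only).
# Each step takes the current analysis state and returns an updated copy.
Step = Callable[[dict], dict]


def quality_control(state: dict) -> dict:
    return {**state, "qc": "done"}


def mapping(state: dict) -> dict:
    return {**state, "mapped": "done"}


def quantification(state: dict) -> dict:
    return {**state, "counts": "done"}


# Dictionaries preserve insertion order, so this also fixes the step order.
WORKFLOW: Dict[str, Step] = {
    "quality_control": quality_control,
    "mapping": mapping,
    "quantification": quantification,
}


def run_step(name: str, state: dict) -> dict:
    """Step-by-step mode: execute one named step independently."""
    return WORKFLOW[name](state)


def run_pipeline(state: dict) -> dict:
    """Pipeline mode: execute every step sequentially in workflow order."""
    for name in WORKFLOW:
        state = run_step(name, state)
    return state
```

Calling `run_step("mapping", state)` runs a single stage, while `run_pipeline(state)` chains all stages, which mirrors the distinction the abstract draws between the two modes.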
nsroot: Minimalist Process Isolation Tool Implemented With Linux Namespaces
Data analyses in the life sciences are moving from tools run on a personal
computer to services run on large computing platforms. This creates a need to
package tools and dependencies for easy installation, configuration and
deployment on distributed platforms. In addition, for secure execution there is
a need for process isolation on a shared platform. Existing virtual machine and
container technologies are often more complex than traditional Unix utilities,
like chroot, and often require root privileges in order to set up or use. This
is especially challenging on HPC systems where users typically do not have root
access. We therefore present nsroot, a lightweight Linux namespaces based
process isolation tool. It allows restricting the runtime environment of data
analysis tools that may not have been designed with security as a top priority,
in order to reduce the risk and consequences of security breaches, without
requiring any special privileges. The codebase of nsroot is small, and it
provides a command line interface similar to chroot. It can be used on all
Linux kernels that implement user namespaces. In addition, we propose combining
nsroot with the AppImage format for secure execution of packaged applications.
nsroot is open sourced and available at: https://github.com/uit-no/nsroo
Client applications and server-side docker for management of RNASeq and/or VariantSeq workflows and pipelines of the GPRO suite
The GPRO suite is an in-progress bioinformatic project for -omics data analysis. As part of the continued growth of this project, we introduce a client- and server-side solution for comparative transcriptomics and analysis of variants. The client-side consists of two Java applications called “RNASeq” and “VariantSeq” to manage pipelines and workflows based on the most common command line interface tools for RNA-seq and Variant-seq analysis, respectively. As such, “RNASeq” and “VariantSeq” are coupled with a Linux server infrastructure (named GPRO Server-Side) that hosts all dependencies of each application (scripts, databases, and command line interface software). Implementation of the Server-Side requires a Linux operating system, PHP, SQL, Python, bash scripting, and third-party software. The GPRO Server-Side can be installed, via a Docker container, in the user’s PC under any operating system or on remote servers, as a cloud solution. “RNASeq” and “VariantSeq” are both available as desktop (RCP compilation) and web (RAP compilation) applications. Each application has two execution modes: a step-by-step mode enables each step of the workflow to be executed independently, and a pipeline mode allows all steps to be run sequentially. “RNASeq” and “VariantSeq” also feature an experimental, online support system called GENIE that consists of a virtual (chatbot) assistant and a pipeline jobs panel coupled with an expert system. The chatbot can troubleshoot issues with the usage of each tool, the pipeline jobs panel provides information about the status of each computational job executed in the GPRO Server-Side, while the expert system provides the user with a potential recommendation to identify or fix failed analyses. 
Our solution is a ready-to-use topic specific platform that combines the user-friendliness, robustness, and security of desktop software, with the efficiency of cloud/web applications to manage pipelines and workflows based on command line interface software.
This work was supported by the Marie Sklodowska-Curie OPATHY project grant agreement 642095, the pre-doctoral research fellowship from MINECO Industrial Doctorates (Grant 659 DI-17-09134); Grant TSI-100903-2019-11 from the Secretary of State for Digital Advancement from Ministry of Economic Affairs and Digital Transformation, Spain; the Expedient IDI-2021-158274-a from the Ministry of Science and Innovation, Spain; and the ThinkInAzul program supported by MCIN with funding from European Union NextGenerationEU (PRTR-C17.I1) and Generalitat Valenciana (THINKINAZUL/2021/024).
Peer Reviewed
Article signed by 18 authors: Ahmed Ibrahem Hafez, Beatriz Soriano, Aya Allah Elsayed, Ricardo Futami, Raquel Ceprian, Ricardo Ramos-Ruiz, Genis Martinez, Francisco Jose Roig, Miguel Angel Torres-Font, Fernando Naya-Catala, Josep Alvar Calduch-Giner, Lucia Trilla-Fuertes, Angelo Gamez Pozo, Vicente Arnau, Jose Maria Sempere-Luna, Jaume Perez-Sanchez, Toni Gabaldon and Carlos Llorens
Postprint (published version)
On-premise containerized, light-weight software solutions for Biomedicine
Bioinformatics software systems are critical tools for analysing large-scale biological
data, but their design and implementation can be challenging due to the need for reliability, scalability, and performance. This thesis investigates the impact of several
software approaches on the design and implementation of bioinformatics software
systems. These approaches include software patterns, microservices, distributed
computing, containerisation and container orchestration. The research focuses on
understanding how these techniques affect bioinformatics software systems’ reliability, scalability, performance, and efficiency. Furthermore, this research highlights
the challenges and considerations involved in their implementation. This study also
examines potential solutions for implementing container orchestration in bioinformatics research teams with limited resources and the challenges of using container
orchestration. Additionally, the thesis considers microservices and distributed computing and how these can be optimised in the design and implementation process to
enhance the productivity and performance of bioinformatics software systems. The
research was conducted using a combination of software development, experimentation, and evaluation. The results show that implementing software patterns can
significantly improve the code accessibility and structure of bioinformatics software
systems. Microservices and containerisation also enhanced system reliability, scalability, and performance. Additionally, the study indicates that adopting
advanced software engineering practices, such as model-driven design and container
orchestration, can facilitate efficient and productive deployment and management of
bioinformatics software systems, even for researchers with limited resources. Overall, we develop a software system integrating all our findings. Our proposed system
demonstrated the ability to address challenges in bioinformatics. The thesis makes
several key contributions in addressing the research questions surrounding the design,
implementation, and optimisation of bioinformatics software systems using software
patterns, microservices, containerisation, and advanced software engineering principles and practices. Our findings suggest that incorporating these technologies can
significantly improve bioinformatics software systems’ reliability, scalability, performance, efficiency, and productivity.
Bioinformatics software systems are important tools for analysing large-scale biological data. However, their development and implementation can be challenging because of the reliability, scalability, and performance they require. The aim of this thesis is to investigate the effects of software patterns, microservices, distributed systems, containerisation, and container orchestration on the architecture and implementation of bioinformatics software systems. The research focuses on understanding how these techniques affect the reliability, scalability, performance, and efficiency of bioinformatics software systems, and which challenges are associated with their conceptualisation and implementation. The thesis also examines potential solutions for implementing container orchestration in bioinformatics research teams with limited resources, and the limitations of using it in this context. It further examines the key factors that influence the success of bioinformatics software systems built with containerisation, microservices, and distributed computing, and how these can be optimised in the design and implementation process to increase the productivity and performance of bioinformatics software systems. The work was carried out using a combination of software development, experiments, and evaluation. The results show that implementing software patterns can considerably improve the reliability and scalability of bioinformatics software systems. The use of microservices and containerisation likewise increased the reliability, scalability, and performance of the system. In addition, the thesis shows that applying software engineering practices such as model-driven design and container orchestration can facilitate the efficient and productive deployment and management of bioinformatics software systems. Implementing this software system also resolves challenges for research groups with limited resources. Overall, the system has shown that it is able to address challenges in bioinformatics and is therefore a valuable tool for researchers in this field. The thesis makes several important contributions to answering research questions related to the design, implementation, and optimisation of software systems for bioinformatics using software engineering principles and practices. Our findings suggest that incorporating these technologies can considerably improve the reliability, scalability, performance, efficiency, and productivity of bioinformatics software systems.
On Designing Multicore-aware Simulators for Biological Systems
The stochastic simulation of biological systems is an increasingly popular
technique in bioinformatics. It is often an enlightening technique, but it can
be computationally expensive. We discuss the main
opportunities to speed it up on multi-core platforms, which pose new challenges
for parallelisation techniques. These opportunities are developed in two
general families of solutions involving both the single simulation and a bulk
of independent simulations (either replicas or derived from a parameter sweep).
The proposed solutions are tested on the parallelisation of the CWC simulator
(Calculus of Wrapped Compartments), carried out
by way of the FastFlow programming framework, which makes possible fast
development and efficient execution on multi-cores.
Comment: 19 pages + cover page
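The "bulk of independent simulations" idea can be sketched in a few lines. This is neither the CWC simulator nor FastFlow: it assumes a toy birth-death process simulated with Gillespie's direct method, and the function names, rate values, and worker count are illustrative assumptions. Each replica gets its own seed, so replicas are independent and can be distributed across cores.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def simulate(seed, birth=1.0, death=0.1, n0=0, t_end=50.0):
    """One stochastic trajectory of a toy birth-death process.

    Births occur at constant rate `birth`, deaths at rate `death * n`.
    Returns the population size at time `t_end`.
    """
    rng = random.Random(seed)  # private generator: replicas stay independent
    t, n = 0.0, n0
    while True:
        total = birth + death * n          # total propensity (always > 0)
        t += rng.expovariate(total)        # waiting time to the next event
        if t >= t_end:
            return n
        if rng.random() * total < birth:   # choose which reaction fires
            n += 1
        else:
            n -= 1


def run_replicas(seeds, workers=1):
    """Run one replica per seed, serially or on several processes."""
    if workers <= 1:
        return [simulate(s) for s in seeds]
    # On platforms that spawn new interpreters, call this from code guarded
    # by `if __name__ == "__main__":`.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate, seeds))
```

A parameter sweep follows the same pattern, with parameter sets distributed instead of seeds. Speeding up a single simulation, the other family of solutions mentioned in the abstract, needs finer-grained parallelism and is not covered by this sketch.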
Evaluating Bioinformatic Pipeline Performance for Forensic Microbiome Analysis
Microbial communities have potential evidential utility for forensic applications. However, bioinformatic analysis of high‐throughput sequencing data varies widely among laboratories. These differences can potentially affect microbial community composition and downstream analyses. To illustrate the importance of standardizing methodology, we compared analyses of postmortem microbiome samples using several bioinformatic pipelines, varying minimum library size (the minimum number of sequences per sample) and sample size. Using the same input sequence data, we found that three open‐source bioinformatic pipelines, MG‐RAST, mothur, and QIIME2, produced significantly different relative abundance, alpha‐diversity, and beta‐diversity. Increasing minimum library size and sample size increased the number of low‐abundant and infrequent taxa detected. Our results show that bioinformatic pipeline and parameter choice affect results in important ways. Given the growing potential application of forensic microbiology to the criminal justice system, continued research on standardizing computational methodology will be important for downstream applications.
Peer Reviewed
https://deepblue.lib.umich.edu/bitstream/2027.42/154468/1/jfo14213_am.pdf
https://deepblue.lib.umich.edu/bitstream/2027.42/154468/2/jfo14213.pd
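Two of the quantities discussed in this abstract, the minimum library size filter and alpha-diversity, can be illustrated with a short sketch. It does not reproduce MG-RAST, mothur, or QIIME2: the Shannon index is assumed here as one common alpha-diversity measure, and the function names and data layout (a mapping from sample name to per-taxon read counts) are hypothetical.

```python
import math


def filter_by_library_size(samples, min_reads):
    """Keep only samples whose total read count reaches `min_reads`."""
    return {
        name: counts
        for name, counts in samples.items()
        if sum(counts.values()) >= min_reads
    }


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts.values())
    return -sum(
        (c / total) * math.log(c / total)
        for c in counts.values()
        if c > 0
    )
```

Changing `min_reads` changes which samples survive the filter, which is one way a parameter choice can shift the diversity estimates compared between pipelines.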