10 research outputs found

    GO-Docker: Batch scheduling with containers

    Lightweight virtualization technologies have gained attention by offering performance and effective scalability across cloud and physical architectures. GO-Docker is a new open source batch scheduling tool that provides container support (Docker). It is based on proven technologies and tools to provide job isolation and custom images for user jobs. Its architecture scales to handle large configurations and gives end users easy access through a Web UI, CLI tools and an API for integration with external programs. Containers provide job isolation, preventing resource overlap, and easier management for cluster administrators. For the end user, it provides a choice of operating systems, pre-built configurations and possible root access to the container. Its plugin architecture eases the integration of new scheduling algorithms or other execution/control mechanisms. The software targets multi-user systems with central authentication (LDAP, ...) and shared storage (home directory, shared data, etc.) and manages Docker access for users, addressing security concerns with container access.

    Colib'read on galaxy : a tools suite dedicated to biological information extraction from raw NGS reads

    Background: With next-generation sequencing (NGS) technologies, the life sciences face a deluge of raw data. Classical analysis processes for such data often begin with an assembly step, which needs large amounts of computing resources and potentially removes or modifies parts of the biological information contained in the data. Our approach proposes to focus directly on biological questions, by considering raw unassembled NGS data, through a suite of six command-line tools. Findings: Dedicated to 'whole-genome assembly-free' treatments, the Colib'read tools suite uses optimized algorithms for various analyses of NGS datasets, such as variant calling or read set comparisons. Based on the use of a de Bruijn graph and a Bloom filter, such analyses can be performed in a few hours, using small amounts of memory. Applications using real data demonstrate the good accuracy of these tools compared to classical approaches. To facilitate data analysis and tool dissemination, we developed Galaxy tools and tool shed repositories. Conclusions: With the Colib'read Galaxy tools suite, we enable a broad range of life scientists to analyze raw NGS data. More importantly, our approach allows the maximum biological information to be retained in the data, and uses a very low memory footprint. Peer reviewed.

    Tutorial: Introduction to the Galaxy web platform (Initiation à la plateforme web Galaxy)

    No full text
    Training material presenting Galaxy, the web platform for NGS data analysis.

    Training: Integrating tools into the GALAXY web platform, 2015 version (Formation Intégration d'outils dans la plateforme web GALAXY version 2015)

    No full text
    Presentation: Biological data analysis increasingly relies on IT resources and environments that are often hard for biologists to understand. This training session presents the use of the Galaxy environment for programmers. Goals: Galaxy tool integration principles. Understand the Galaxy filesystem organization, write XML descriptors and Python wrappers, start a new tool shed instance and use it in the development process. Organisation: The training session consists of a short theoretical presentation..

    Integrating GALAXY workflows in a metadata management environment

    No full text
    The Galaxy platform offers repositories of user data and related analysis processes (data histories and workflows). These repositories enable traceability and reproducibility of the processes within the platform. At a larger scale, to answer questions like "What protocol was used to analyze my data?" or "How were these data generated?", we could consider any protocol as a metadata set that annotates inputs and results. We present a preliminary approach for integrating GALAXY workflows in an extensible metadata management environment. Using ISA-tools, we have developed a formalism to describe an abstraction of data processing workflows. This specification, in the ISA-TAB format, is named ISA-DATAFLOW. A conversion tool extracts a structured dataflow representation in GRAPHML, a generic XML graph format, from GALAXY workflows. This intermediary format can then be normalized using controlled vocabularies and converted into ISA-TAB following our ISA-DATAFLOW specification. We plan to integrate this work to propose advanced research functionalities within a virtual research environment (VRE) deployed on a geographically and thematically distributed infrastructure already using multiple Galaxy instances. Future developments will concern workflow meta-analysis and workflow composition assistance.

    Automatic update of reference data in Galaxy using BioMAJ.

    Many bioinformatic tools require the use of reference data like genome assemblies or sequence databanks. Galaxy offers multiple ways to give access to this data in its web interface: data libraries, *.loc files and, more recently, data managers. However, until now, the process of adding new reference data was essentially manual and time consuming, even more so when the data need to be indexed in a variety of formats (blast, bowtie, bwa, 2bit, ...). The recent release of data managers is a first step towards automating data download and indexing, but it still requires some manual intervention to launch the download and subsequent automatic indexing. Furthermore, it was designed with a Galaxy-centric view, not taking into account that reference data are often used outside Galaxy, for example from the command line or in alternative systems like Mobyle. BioMAJ is a widely used and stable software designed to automate the download and transformation of data from various sources. This data can be used directly from the command line, in more complex systems like Mobyle, or through a REST API. We have developed BioMAJ post-processes to automatically populate the Galaxy data libraries or data managers, avoiding duplication of data and transformations. In this talk we will give a brief overview of the different ways to manage reference data in Galaxy. We will then present the solution that was developed to fill the gap between BioMAJ and Galaxy, followed by some security considerations for reference data that must be available only to a group of users. Finally, ongoing developments and ideas will be discussed.

    BioShaDock: a community driven bioinformatics shared Docker-based tools registry [version 1; referees: 2 approved]

    Linux container technologies, as represented by Docker, provide an alternative to complex and time-consuming installation processes needed for scientific software. The ease of deployment and the process isolation they enable, as well as the reproducibility they permit across environments and versions, are among the qualities that make them interesting candidates for the construction of bioinformatic infrastructures, at any scale from single workstations to high throughput computing architectures. The Docker Hub is a public registry which can be used to distribute bioinformatic software as Docker images. However, its lack of curation and its genericity make it difficult for a bioinformatics user to find the most appropriate images needed. BioShaDock is a bioinformatics-focused Docker registry, which provides a local and fully controlled environment to build and publish bioinformatic software as portable Docker images. It provides a number of improvements over the base Docker registry on authentication and permissions management, that enable its integration in existing bioinformatic infrastructures such as computing platforms. The metadata associated with the registered images are domain-centric, including for instance concepts defined in the EDAM ontology, a shared and structured vocabulary of commonly used terms in bioinformatics. The registry also includes user defined tags to facilitate its discovery, as well as a link to the tool description in the ELIXIR registry if it already exists. If it does not, the BioShaDock registry will synchronize with the registry to create a new description in the ELIXIR registry, based on the BioShaDock entry metadata. This link will help users get more information on the tool such as its EDAM operations, input and output types. This allows integration with the ELIXIR Tools and Data Services Registry, thus providing the appropriate visibility of such images to the bioinformatics community.

    A curated Domain centric shared Docker registry linked to the Galaxy toolshed

    Nowadays, Docker containers are used to ease application deployment, from command-line tools to cluster management [1]. This technology has a strong impact in bioinformatics, where specialized software can often require multiple dependencies. It is a long-term preservation solution for legacy and unmaintained tools and it enables better process isolation in a multi-user environment. Docker is already used with Galaxy as a way to quickly integrate new tools. We have set up a functional prototype of a web registry of Docker images, BioShaDock [2], dedicated to bioinformatics tools and utilities. We created a set of tool descriptors based on Docker images available in our toolshed [3]. Even if a general purpose registry can be used to hold shared Docker containers, we think that a domain-centric registry, e.g. for the French life science community through a registry linked to the cloud of the French Institute of Bioinformatics (IFB [8]), would have a significant impact on bioinformatician productivity and help to spread best practices. With a clear open source and domain orientation, it could federate container providers [4,5] more easily. It would also be able to include validation and curation to eliminate redundant tools, organize versioning and standardize documentation. Future work will concern advanced searching capabilities and possible referencing within the ELIXIR Tools and Data Services Registry [6] and in the IFB one (as the ELIXIR French node). We also want to contribute to the standardization of containers [7] and evaluate whether benchmarks [5] could be produced from a metadata-enriched Docker registry.
    References:
    [1] Google Kubernetes, Docker container cluster management: kubernetes.io
    [2] BioShaDock, a Bioinformatics Shared Docker registry: http://docker-ui.genouest.org
    [3] GUGGO Galaxy Toolshed: http://toolshed.genouest.org
    [4] Hexabio Docker repository: http://biodocker.github.io
    [5] Nucleotid.es, continuous, objective and reproducible evaluation of genome assemblers using docker containers: http://nucleotid.es
    [6] ELIXIR Tools and Data Services Registry: https://elixir-registry.cbs.dtu.dk
    [7] Bioboxes, a standard for creating interchangeable bioinformatics software containers: http://bioboxes.org
    [8] IFB academic Cloud: http://www.france-bioinformatique.fr/?q=en/core/e-infrastructure-team/ifb-clou