
36 research outputs found

GO-Docker: Batch scheduling with containers

Author: Monjeaud Cyril
Sallou Olivier
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/09/2015

Lightweight virtualization technologies have gained attention by offering performance and effective scalability across cloud and physical architectures. GO-Docker is a new open source batch scheduling tool that provides container support (Docker). It is based on proven technologies and tools to provide job isolation and custom images for user jobs. Its architecture scales to handle large configurations and gives end users easy access through a Web UI, CLI tools and an API for integration with external programs. Containers provide job isolation, preventing resource overlap, and easier management for cluster administrators. For the end user, they provide a choice of operating systems, pre-built configurations and possible root access to the container. Its plugin architecture eases the integration of new scheduling algorithms or other execution/control mechanisms. The software targets multi-user systems with central authentication (LDAP, ...) and shared storage (home directory, shared data, etc.) and manages Docker access for users, addressing security concerns with container access.

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

HAL-Rennes 1

The ReproGenomics Viewer: an integrative cross-species toolbox for the reproductive science community.

Author: Becker Emmanuelle
Chalmel Frédéric
Collin Olivier
Darde Thomas A
Evrard Bertrand
Jégou Bernard
Le Bras Yvan
Monjeaud Cyril
Rolland Antoine D.
Sallou Olivier
Publication venue: 'Oxford University Press (OUP)'
Publication date: 15/04/2015

We report the development of the ReproGenomics Viewer (RGV), a multi- and cross-species working environment for the visualization, mining and comparison of published omics data sets for the reproductive science community. The system currently embeds 15 published data sets related to gametogenesis from nine model organisms. Data sets have been curated and conveniently organized into broad categories including biological topics, technologies, species and publications. RGV's modular design for both organisms and genomic tools enables users to upload and compare their data with that from the data sets embedded in the system in a cross-species manner. The RGV is freely available at http://rgv.genouest.org

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

Community-driven development for computational biology at Sprints, Hackathons and Codefests

Author: Afgan Enis
Banck Michael
Bonnal Raoul JP
Booth Timothy
Chapman Brad A
Chilton John
Cock Peter JA
Guimera Roman Valls
Gumbel Markus
Harris Nomi
Holland Richard
Kaján László
Kalaš Matúš
Katayama Toshiaki
Kibukawa Eri
Möller Steffen
Powel David R
Prins Pjotr
Quinn Jacqueline
Sallou Olivier
Seemann Torsten
Sloggett Clare
Soiland-Reyes Stian
Spooner William
Steinbiss Sascha
Strozzi Francesco
Tille Andreas
Travis Anthony J
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Background: Computational biology comprises a wide range of technologies and approaches. Multiple technologies can be combined to create more powerful workflows if the individuals contributing the data or providing tools for its interpretation can find mutual understanding and consensus. Much conversation and joint investigation are required in order to identify and implement the best approaches. Traditionally, scientific conferences feature talks presenting novel technologies or insights, followed up by informal discussions during coffee breaks. In multi-institution collaborations, in order to reach agreement on implementation details or to transfer deeper insights in a technology and practical skills, a representative of one group typically visits the other. However, this does not scale well when the number of technologies or research groups is large. Conferences have responded to this issue by introducing Birds-of-a-Feather (BoF) sessions, which offer an opportunity for individuals with common interests to intensify their interaction. However, parallel BoF sessions often make it hard for participants to join multiple BoFs and find common ground between the different technologies, and BoFs are generally too short to allow time for participants to program together. Results: This report summarises our experience with computational biology Codefests, Hackathons and Sprints, which are interactive developer meetings. They are structured to reduce the limitations of traditional scientific meetings described above by strengthening the interaction among peers and letting the participants determine the schedule and topics. These meetings are commonly run as loosely scheduled "unconferences" (self-organized identification of participants and topics for meetings) over at least two days, with early introductory talks to welcome and organize contributors, followed by intensive collaborative coding sessions. 
We summarise some prominent achievements of those meetings and describe differences in how these are organised, how their audience is addressed, and their outreach to their respective communities. Conclusions: Hackathons, Codefests and Sprints share a stimulating atmosphere that encourages participants to jointly brainstorm and tackle problems of shared interest in a self-driven proactive environment, as well as providing an opportunity for new participants to get involved in collaborative projects

Aberdeen University Research

University of Bergen

Crossref

Harvard University - DASH

Springer - Publisher Connector

PubMed Central

The University of Manchester - Institutional Repository

NORA - Norwegian Open Research Archives

University of Melbourne Institutional Repository

NERC Open Research Archive

The BioMart community portal: an innovative alternative to large, centralized data repositories.

Author: Allen James
Arnaiz Olivier
Awedh Mohammad
Baldock Richard
Barbiera Giulia
Bardou Philippe
Beck Tim
Blake Andrew
Bonierbale Meredith
Brookes Anthony
Bucci Gabrielle
Bueti Iwan
Burge Sarah
Cabua Cédric
Carlson Joseph
Chelala Claude
Chrysostomou Charalambos
Citaro Davide
Collin Olivier
Cordova Raul
Cutts Rosalind
Dassi Erik
Di Genova Alex
Djari Anis
Durinck Steffen
Esposito Anthony
Estrella Heather
Eyras Eduardo
Fernandez-Banet Julia
Forbes Simon
Free Robert
Fujisawa Takamoto
Gadaleta Emanuela
Garcia-Manteiga Jose
Goodstein David
Gray Kristian
Guerra-Assunção José
Haggarty Bernard
Haider Syed
Han Byung
Han Dong-Jin
Harris Todd
Harshbarger Jayson
Hastings Robert
Hayes Richard
Hoede Claire
Hu Shen
Hu Zhi-Liang
Hutchins Lucie
Kan Zhengyan
Kasprzyk Arek
Kawaji Hideya
Keliet Aminah
Kerhornou Arnaud
Kim Sunghoon
Kinsella Rhoda
Klopp Christophe
Kong Lei
Lawson Daniel
Lazarevic Dejan
Lee Ji-Hyun
Letellier Thomas
Li Chuan-Yun
Lio Pietro
Liu Chu-Jun
Luo Jie
Maass Alejandro
Mariette Jerome
Maurel Thomas
Merella Stefania
Mohamed Azza
Moreews Francois
Nabihoudine Ibounyamine
Ndegwa Nelson
Noirot Céline
Pandini Luca
Perez-Llamas Cristian
Primig Michael
Provero Paolo
Quattrone Alessandro
Quesneville Hasi
Rambaldi Davide
Reecy James
Riba Michela
Rosanoff Steven
Saddiq Amna
Salas Elise
Sallou Olivier
Shepherd Rebecca
Simon Reinhard
Smedley Damian
Sperling Linda
Spooner William
Staines Daniel
Steinbach Delphine
Stone Kevin
Stupka Elia
Teague Jon
Ullah Abu
Wang Jun
Ware Doreen
Wong-Erasmus Marie
Youens-Clark Ken
Zadissa Amonida
Zhang Shi-Jian
Publication venue: Nucleic Acids Res
Publication date: 01/01/2015

The BioMart Community Portal (www.biomart.org) is a community-driven effort to provide a unified interface to biomedical databases that are distributed worldwide. The portal provides access to numerous database projects supported by 30 scientific organizations. It includes over 800 different biological datasets spanning genomics, proteomics, model organisms, cancer data, ontology information and more. All resources available through the portal are independently administered and funded by their host organizations. The BioMart data federation technology provides a unified interface to all the available data. The latest version of the portal comes with many new databases that have been created by our ever-growing community. It also comes with better support and extensibility for data analysis and visualization tools. A new addition to our toolbox, the enrichment analysis tool, is now accessible through graphical and web service interfaces. The BioMart community portal averages over one million requests per day. Building on this level of service and the wealth of information that has become available, the BioMart Community Portal has introduced a new, more scalable and cheaper alternative to the large data stores maintained by specialized organizations.

HAL-CentraleSupelec

The Jackson Laboratory: The Mouseion at the JAXlibrary

Cold Spring Harbor Laboratory Institutional Repository

HAL Descartes

UPF Digital Repository

ProdInra

Hal-Diderot

Digital Repository @ Iowa State University (ISU)

Crossref

INRIA a CCSD electronic archive server

HAL-Inserm

PubMed Central

eScholarship - University of California

CGSpace

Apollo (Cambridge)

Institutional Research Information System University of Turin

HAL-Rennes 1

Leicester Research Archive

An application suite based on the IFB Container as a Service platform

Author: Collin Olivier
Moreews François
Sallou Olivier
Publication venue: HAL CCSD
Publication date: 03/09/2016

IFB, the French Elixir Node, is a national service infrastructure which provides services and resources in bioinformatics[1]. IFB’s goal is to offer scientific users and developers a scalable, flexible and user-friendly computation facility associated with a large storage capacity, as needed for current life science data processing. To analyze heterogeneous biological data, bioinformaticians require hundreds of different specialized software packages, including well-established tools as well as research prototypes. In addition, this software is used alone or in workflows, from GUIs or command lines, for production, testing or development. Thus, providing an updated and complete set of tools requires huge resources. To offer an efficient service for this expected diversity of usages, we propose a software architecture and a cloud model which bring solutions for tool packaging, rapid deployment and multiple-channel software distribution. We describe here the set of technical components that we built to enable a Container as a Service (CaaS) model adapted to a bioinformatics academic cloud facility.
BioShaDock: BioShaDock[2] is the community-based container registry for bioinformatics of the French Institute of Bioinformatics. It focuses on reproducibility of bioinformatics tools and pipelines using Docker containers. Containers are automatically built in the background, with security scans and metadata extraction. Metadata can include general information as well as ontology terms. The BioShaDock registry already provides a large catalog of tools contributed directly by users or by projects such as Bioconda or Debian. The registry is open source and can be used by anyone; it is accessible from any Docker or rkt client. Computer scientists and bioinformaticians can more easily disseminate their programs and find potential users through a dedicated domain-centric Docker registry. There is a wide range of possible uses for container registries in bioinformatics: repositories managed at a community level, based on tools embedded in containers, allow users to exchange and replicate data analyses.
GO-Docker: GO-Docker[3] is a batch computing/cluster management tool using Docker as an execution/isolation system. It is dedicated to containers and has both a command line client and a web front end. It uses Docker Swarm and Apache Mesos and is compatible with Google Kubernetes. A common concern regarding container solutions for cloud or HPC relates to potential security issues. First of all, recall that Docker uses the Linux kernel cgroups feature, which can be used to isolate resource usage by users. Furthermore, we implemented SSL certificate and LDAP authentication in the GO-Docker REST API to control access to the job scheduler that manages the nodes where containers can be run. In addition, depending on the facility audience and exposure, an even safer solution can be obtained by using virtualized computation nodes. Developers used to the command line can use the GO-Docker CLI, which emulates classical scheduler commands. GO-Docker has a rich REST API used by its clients. The clients (Python or Java) can be used in scripts or in a SaaS front end.
Galaxy to Docker: Galaxy is a widely adopted, user-friendly web front end for biological data processing. It provides powerful functionalities to enhance data analysis accessibility and reproducibility. It is currently well suited to the integration of existing command line tools and offers a large collection of bioinformatics software. However, the integration of each piece of software requires the manual, off-line creation of an XML descriptor and sometimes additional wrappers: it is still a technical and time-consuming task. We propose to bypass this limitation by enabling the direct execution of a command line within any ad hoc container from a trusted repository like BioShaDock, using the GO-Docker Python API. This Galaxy to Docker component makes it possible to create and use new “on demand tools” in a Galaxy instance without being an administrator and without any need for coding. Accordingly, advanced users can easily and quickly include custom developments in their data analysis pipelines. This results in a more flexible Galaxy environment.
D4WP: The D4 workflow portal (D4WP)[4] is an advanced, developer-oriented SaaS environment for rapid tool and workflow design. It allows online graphical workflow and component authoring. Any command line tool or script is quickly captured and integrated using a full WYSIWYG approach. All workflow component dependencies can be defined as containers using a URI syntax. In this way a re-executable and self-contained workflow specification can be produced. D4WP integrates a GO-Docker scheduler API. From a single specification, code generation can be used to target different languages to maximize potential workflow usage and dissemination. Current developments focus on Galaxy tool generation and Common Workflow Language export.
The presented software components allow the creation of reproducible and flexible data analysis environments for different audiences (end users and developers) and multiple purposes (production data analysis, benchmarks, workflows, tool and method development, dissemination, article publishing…). All tools embedded in containers, made available in BioShaDock and scheduled with GO-Docker are directly usable in Galaxy, D4WP and the command line. We think that such an architecture limits deployment overhead and software integration cost and therefore accelerates the transfer of bioinformatics research output to production computation facilities. In a context of massive biological data production, the CaaS model offers interesting prospects. Thus, when data movement is limited by network capacity, deploying the whole CaaS environment on data production nodes may be a pragmatic solution. Furthermore, the suite of software components presented here is developed to fit the long-term objective of creating a federation of interoperable clouds. Future work will include dissemination-related features and compatibility and standardization efforts.
References
1. IFB cloud: The academic cloud of the French Institute of Bioinformatics. http://www.france-bioinformatique.fr/
2. Moreews F, Sallou O, Ménager H et al. BioShaDock: a community driven bioinformatics shared Docker-based tools registry. F1000Research 2015.
3. Sallou O, Monjeaud C. GO-Docker: Batch scheduling with containers. IEEE Cluster 2015.
4. Moreews F. Design and share data analysis workflows. Application to bioinformatics intensive treatments. Thesis, université de Rennes 1. 2015. http://workflow.genouest.or

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

Hadoopizer : a cloud environment for bio-informatics data analysis

Author: Bretaudeau Anthony
Collin Olivier
Sallou Olivier
Publication venue: HAL CCSD
Publication date: 01/10/2012

Biology is evolving into a big data science, particularly with the new sequencing technologies that have emerged in recent years. Cloud computing appears to be one of the answers to the rapidly increasing volume of bioinformatics data. Here we present a private cloud environment deployed on the GenOuest bioinformatics platform. After an overview of the software publicly available for bioinformatics treatments in the cloud, we present a new framework (Hadoopizer), a generic tool for parallelising bioinformatics analyses in the cloud using the MapReduce paradigm. These developments are available online at this address: http://genocloud.genouest.or

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

Recherche d'instances de motifs expressifs avec Logol. Application à la modélisation d'événements de frameshift -1

Author: Belleannée Catherine
Nicolas Jacques
Sallou Olivier
Publication venue: HAL CCSD
Publication date: 03/07/2012

The current practice of pattern matching tools, and the gap that may be observed with the actual modelling needs of people analysing genome structures, clearly demonstrate the need for higher-level languages to describe and search for these structures in genomic sequences. It appears necessary to offer new tools for building more expressive models of families of biological sequences, on the basis of their content and structure. This article presents Logol, a new application designed to achieve pattern matching in possibly large sequences with realistic biological motifs. Logol consists of both a language for describing patterns and the associated parser for effectively scanning sequences (RNA, DNA or protein) with such motifs. The language, based on a high-level grammatical formalism, makes it possible to express flexible patterns (with mispairings and indels) composed of both sequential and structural elements (such as repeats or pseudoknots). A web page on the GenOuest BioInformatics Platform http://www.genouest.org/ gives access to the Logol application. It includes an interface for graphically drawing the motif model and an interface to display the resulting matches within the targeted pattern. Logol is presented through an illustrative application using a quite intricate motif model: the detection of -1 ribosomal frameshifting events in messenger RNA sequences.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL Descartes

HAL-Rennes 1

Seqcrawler: biological data indexing and browsing platform.

Author: Bretaudeau Anthony
Roult Aurelien
Sallou Olivier
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/07/2012

BACKGROUND: Seqcrawler takes its roots in software like SRS or Lucegene. It provides an indexing platform to ease the search of data and meta-data in biological banks, and it can scale to face the current flow of data. While many biological bank search tools are available on the Internet, mainly provided by large organizations to search in their own data, there is a lack of free and open source solutions for browsing one's own set of data with a flexible query system that can scale from a single computer to a cloud system. A personal index platform will help labs and bioinformaticians to search in their meta-data but also to build a larger information system with custom subsets of data. RESULTS: The software is scalable from a single computer to a cloud-based infrastructure. It has been successfully tested in a private cloud with 3 index shards (pieces of the index) hosting information on ~400 million sequences (whole GenBank, UniProt, PDB and others) for a total size of 600 GB in a fault-tolerant architecture (high availability). It has also been successfully integrated with software to add extra meta-data from BLAST results to enhance the user's result analysis. CONCLUSIONS: Seqcrawler provides a complete open source search and store solution for labs or platforms needing to manage large amounts of data/meta-data with a flexible and customizable web interface. All components (search engine, visualization and data storage), though independent, share a common and coherent data system that can be queried with a simple HTTP interface. The solution scales easily and can also provide a high-availability infrastructure.

HAL-CentraleSupelec

Crossref

Springer - Publisher Connector

INRIA a CCSD electronic archive server

Directory of Open Access Journals

PubMed Central

HAL-Rennes 1
