
    Automating Deployment of Several GBrowse Instances

    Background: As part of the fungal endophyte genomes project, we maintain genome browsers for several dozen strains of fungi from the Clavicipitaceae and related families. These genome browsers are based on the GBrowse software, with a large collection of in-house software for visualization, analysis, and searching of genome features. Although GBrowse supports serving multiple data sources, such as distinct genome assemblies, from a single GBrowse instance, there are advantages to maintaining separate instances for each genome. Besides permitting per-genome customizations of the software, page layout, and database schemas, our use of separate instances also allows us to maintain different security and password requirements for genomes in different stages of publication.

    Materials and methods: We have developed a suite of software for deploying and maintaining a large collection of GBrowse instances. This software, a combination of Perl, shell libraries, and scripts, automates the process of deploying the software, databases, and configuration required to make a new customized genome browser available online, and furthermore automates loading each instance's database with genome sequences, annotations, and other data. To maintain a mostly synchronized codebase while allowing distinct configuration, we record each instance's software and configuration as a branch in a Subversion version control repository. This use of version control ensures that bug fixes and software improvements are easily applied to each relevant instance, without losing customizations.

    Results: We describe the components of our genome browser instances, the design and implementation of our deployment software, and various challenges and practical considerations we have encountered while using this software to maintain genome browsers for nearly fifty organism strains and assembly versions.
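    The "mostly synchronized codebase, distinct configuration" idea above can be sketched as a configuration overlay: each instance inherits a shared base and overrides only what differs. The following Python sketch is illustrative only; the keys, the strain name, and the merge strategy are assumptions, not GBrowse's actual configuration format.

    ```python
    # Hypothetical sketch: overlay per-instance settings on a shared base
    # configuration. Instance-specific values win; shared defaults persist.
    def merge_config(base: dict, overrides: dict) -> dict:
        """Return a new config where instance overrides take precedence."""
        merged = dict(base)
        for key, value in overrides.items():
            if isinstance(value, dict) and isinstance(base.get(key), dict):
                merged[key] = merge_config(base[key], value)  # recurse into sections
            else:
                merged[key] = value
        return merged

    base = {
        "db": {"adaptor": "DBI::mysql", "host": "localhost"},
        "auth": {"password_required": False},
    }
    instance = {
        "db": {"name": "strain_x"},              # illustrative per-genome database
        "auth": {"password_required": True},     # e.g. an unpublished genome
    }

    cfg = merge_config(base, instance)
    ```

    A version-control branch per instance then carries exactly this kind of override on top of the shared trunk, which is why trunk bug fixes merge cleanly into every branch.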

    3rd EGEE User Forum

    We have organized this book as a sequence of chapters, each associated with an application or technical theme and introduced by an overview of its contents and a summary of the main conclusions from the Forum on that topic. The first chapter gathers all the plenary session keynote addresses; following this there is a sequence of chapters covering the application-flavoured sessions. These are followed by chapters with the flavour of Computer Science and Grid Technology. The final chapter covers the large number of practical demonstrations and posters exhibited at the Forum. Much of the work presented has a direct link to specific areas of Science, and so we have created a Science Index, presented below. In addition, at the end of this book, we provide a complete list of the institutes and countries involved in the User Forum.

    Distributed Management of Grid-based Scientific Workflows

    Grids and service-oriented technologies are emerging as dominant approaches for distributed systems. With the evolution of these technologies, scientific workflows have been introduced as a tool for scientists to assemble highly specialized applications and to exchange large heterogeneous datasets in order to automate and accelerate the accomplishment of complex scientific tasks. Several Scientific Workflow Management Systems (SWfMS) have already been designed to support the specification, execution, and monitoring of scientific workflows. Nevertheless, they still face key challenges from two different perspectives: system usability and system efficiency. From the system usability perspective, current SWfMS are not designed to be simple enough for scientists who have quite limited IT knowledge. Moreover, there is no easy mechanism by which scientists can share and re-use scientific experiments that have already been designed and proved by others. From the perspective of system efficiency, existing SWfMS coordinate and execute workflows in a centralized fashion using a single scheduler and/or workflow enactor. This creates a single point of failure, forms a scalability bottleneck, and enforces centralized fault handling. In addition, they do not consider load balancing while mapping abstract jobs onto several computational nodes. Another important challenge arises from the common nature of scientific workflow applications, which need to exchange a huge amount of data during execution. Some available SWfMS use a mediator-based approach for data transfer, where data must first be transferred to a centralized data manager, which is inefficient. Other SWfMS apply a peer-to-peer approach via data references. Even this approach is not sufficient for scientific workflows, as a single complex scientific activity can produce an extensive amount of data.
    In this thesis, we introduce the SWIMS (Scientific Workflow Integration and Management System) framework. It employs Web Services technology to build a distributed management system for data-intensive scientific workflows. The purpose of SWIMS is to overcome the previously mentioned challenges through a set of salient features: i) support for distributed execution and management of workflows, ii) diminution of communication traffic, iii) support for smart re-run, iv) distributed fault handling and load balancing, v) ease of use, and vi) extensive sharing of scientific workflows. We discuss the motivation, design, and implementation of the SWIMS framework, and then evaluate it through the Montage application from the astronomy domain.
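    The data-transfer distinction drawn above can be made concrete with a small sketch: instead of routing payloads through a central mediator, a producing node hands its successors a lightweight reference, and the consumer pulls the data directly when needed. All class and method names below are illustrative assumptions, not part of SWIMS.

    ```python
    # Minimal sketch of peer-to-peer data transfer by reference.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class DataRef:
        """Lightweight pointer to data held on the node that produced it."""
        node: str
        key: str

    class Node:
        def __init__(self, name: str):
            self.name = name
            self.store = {}                   # outputs kept local to this node

        def produce(self, key: str, payload: str) -> DataRef:
            self.store[key] = payload         # bulky data never leaves the node yet
            return DataRef(self.name, key)    # only the small reference circulates

    def fetch(nodes: dict, ref: DataRef) -> str:
        # Consumer pulls directly from the producing node, on demand,
        # instead of having a central mediator relay the payload.
        return nodes[ref.node].store[ref.key]

    nodes = {"a": Node("a"), "b": Node("b")}
    ref = nodes["a"].produce("mosaic", "large image bytes ...")
    data = fetch(nodes, ref)
    ```

    The thesis notes that even reference passing can be insufficient when a single activity produces very large outputs; the sketch shows only the baseline mechanism being compared.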

    A Heuristic Ontological Model of Protein Complexes: A Case Study Based on the E3 Ubiquitin Ligase Protein Complexes of Arabidopsis thaliana

    Ontology (with a capital O) is the philosophical study of the nature of existence, concerned with defining the relationships among entities that can be said to exist in nature. The concept of an ontology was later adopted by the biological sciences to formally represent knowledge within a biological domain in order to standardize the annotation of biological data and, further, to enable more efficient and easier data collection, sharing, and reuse across biological and model organism databases. The Protein Ontology (PRO) is a specific biological ontology developed to represent the relationships between proteins and protein complexes. This thesis presents a revised PRO framework, modelled around Arabidopsis thaliana and associated SCF ubiquitin ligase complexes, with the aim of more adequately representing what is known about the process and dynamics of protein complex formation, in order to better serve the broader scientific community.
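    The kind of relational knowledge an ontology like PRO encodes can be sketched as a small directed graph of typed relations with a transitive ancestor query. The terms and relations below are simplified stand-ins chosen for illustration, not actual PRO identifiers or its full relation semantics.

    ```python
    # Toy ontology fragment: each term maps to (relation, parent) pairs.
    relations = {
        "SCF complex": [("is_a", "E3 ubiquitin ligase complex")],
        "E3 ubiquitin ligase complex": [("is_a", "protein complex")],
        "SKP1": [("part_of", "SCF complex")],
    }

    def ancestors(term: str) -> set:
        """All terms reachable by following relations upward (transitively)."""
        found = set()
        stack = [term]
        while stack:
            for _, parent in relations.get(stack.pop(), []):
                if parent not in found:
                    found.add(parent)
                    stack.append(parent)
        return found
    ```

    Transitive queries like this are what make ontology-backed annotation useful: asserting that SKP1 is part of the SCF complex lets a database infer its membership in the broader class of protein complexes without storing that fact explicitly.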

    Work flows in life science

    The introduction of computer science technology in the life science domain has resulted in a new life science discipline called bioinformatics. Bioinformaticians are biologists who know how to apply computer science technology to perform computer-based experiments, also known as in-silico or dry-lab experiments. Various tools, such as databases, web applications, and scripting languages, are used to design and run in-silico experiments. As the size and complexity of these experiments grow, new types of tools are required to design and execute the experiments and to analyse the results. Workflow systems promise to fulfill this role. The bioinformatician composes an experiment by using tools and web services as building blocks and connecting them, often through a graphical user interface. Workflow systems, such as Taverna, provide access to up to a few thousand resources in a uniform way. Although workflow systems are intended to make the bioinformaticians' work easier, bioinformaticians experience difficulties in using them. This thesis is devoted to finding out which problems bioinformaticians experience when using workflow systems and to providing solutions for these problems.
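    The building-block composition described above, where each tool or service feeds its output to the next, can be sketched as plain function composition. The "services" below are toy stand-ins invented for illustration; real workflow systems such as Taverna wire remote services rather than local functions.

    ```python
    # Sketch: a workflow as an ordered composition of tool-like steps.
    from functools import reduce

    def compose(*steps):
        """Chain steps so each step's output becomes the next step's input."""
        return lambda data: reduce(lambda acc, step: step(acc), steps, data)

    CODONS = {"ATG": "M", "GCC": "A"}   # tiny toy codon table

    def fetch_sequence(acc_id):
        # stand-in for a sequence-database lookup service
        return {"id": acc_id, "seq": "ATGGCC"}

    def translate(rec):
        seq = rec["seq"]
        protein = "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq), 3))
        return {**rec, "protein": protein}

    def annotate(rec):
        # stand-in for an annotation service
        return {**rec, "note": "toy annotation"}

    workflow = compose(fetch_sequence, translate, annotate)
    result = workflow("X1")
    ```

    A graphical workflow editor presents essentially this composition as boxes and wires, which is why usability hinges on how easily the building blocks can be discovered, connected, and reused.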