10 research outputs found

    The Capabilities of Chaos and Complexity

    To what degree could chaos and complexity have organized a Peptide or RNA World of crude yet necessarily integrated protometabolism? How far could such protolife evolve in the absence of a heritable linear digital symbol system that could mutate, instruct, regulate, optimize and maintain metabolic homeostasis? To address these questions, chaos, complexity, self-ordered states, and organization must all be carefully defined and distinguished. In addition, their cause-and-effect relationships and mechanisms of action must be delineated. Are there any formal (non-physical, abstract, conceptual, algorithmic) components to chaos, complexity, self-ordering and organization, or are they entirely physicodynamic (physical, mass/energy interaction alone)? Chaos and complexity can produce some fascinating self-ordered phenomena. But can spontaneous chaos and complexity steer events and processes toward pragmatic benefit, select function over non-function, optimize algorithms, integrate circuits, produce computational halting, organize processes into formal systems, or control and regulate existing systems toward greater efficiency? The question is pursued of whether there might be some yet-to-be-discovered new law of biology that will elucidate the derivation of prescriptive information and control. “System” will be rigorously defined. Can a low-informational rapid succession of Prigogine’s dissipative structures self-order into bona fide organization?

    Modern Systems for Large-scale Genomics Data Analysis in the Cloud

    Genomics researchers increasingly turn to cloud computing as a means of accomplishing large-scale analyses efficiently and cost-effectively. Successful operation in the cloud requires careful instrumentation and management to avoid common pitfalls, such as resource bottlenecks and low utilisation, that can both drive up costs and extend the timeline of a scientific project. We developed the Butler framework for large-scale scientific workflow management in the cloud to meet these challenges. The cornerstones of Butler's design are the ability to support multiple clouds, declarative infrastructure configuration management, scalable and fault-tolerant operation, comprehensive resource monitoring, and automated error detection and recovery. Butler relies on industry-strength open-source components to deliver a framework that is robust and scalable to thousands of compute cores and millions of workflow executions. Butler's error detection and self-healing capabilities are unique among scientific workflow frameworks and ensure that analyses are carried out with minimal human intervention. Butler has been used to analyse over 725 TB of DNA sequencing data on the cloud, using 1,500 CPU cores and 6 TB of RAM, delivering results with 43% increased efficiency compared to other tools. The flexible design of this framework allows easy adoption within other fields of the life sciences and ensures that it will scale together with the demand for scientific analysis in the cloud for years to come.

    Because many bioinformatics tools have been developed in the context of small sample sizes, limitations in their design often leave them struggling to keep up with the large-scale data processing demands of modern research and clinical sequencing projects. The Rheos software system is designed specifically with these large data sets in mind. Utilising the elastic compute capacity of modern academic and commercial clouds, Rheos takes a service-oriented, containerised approach to the implementation of modern bioinformatics algorithms, which allows the software to achieve the scalability and ease of use required to succeed under the increased operational load of the massive data sets generated by projects such as International Cancer Genome Consortium (ICGC) ARGO and the All of Us initiative. Rheos algorithms are based on an innovative stream-based approach to processing genomic data, which enables Rheos to make faster decisions about the presence of genomic mutations that drive diseases such as cancer, thereby improving the tools' efficacy and relevance to clinical sequencing applications. Our testing of the novel germline Single Nucleotide Polymorphism (SNP) and deletion variant calling algorithms developed within Rheos indicates that Rheos achieves ~98% accuracy in SNP calling and ~85% accuracy in deletion calling, which is comparable with other leading tools such as the Genome Analysis Toolkit (GATK), freebayes, and Delly. The two frameworks we developed make important contributions toward meeting the ever-growing need for large-scale genomic data analysis in the cloud: Butler enables more effective use of existing tools, while Rheos provides a new, more dynamic, real-time approach to genomic analysis.
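
    The automated error detection and self-healing attributed to Butler can be pictured as a monitoring pass that inspects task states and resubmits failed work until a retry budget is exhausted. The sketch below is a hypothetical Python illustration of that idea only; the task model, state names, and retry policy are assumptions, not Butler's actual API.

```python
import logging
from dataclasses import dataclass

MAX_RETRIES = 3  # assumed retry budget; the real policy is framework-specific


@dataclass
class Task:
    """Minimal stand-in for a workflow task tracked by the framework."""
    name: str
    state: str = "PENDING"  # e.g. PENDING -> RUNNING -> SUCCEEDED / FAILED
    retries: int = 0


def resubmit(task: Task) -> None:
    """Placeholder for re-queuing a failed task on healthy infrastructure."""
    task.state = "PENDING"
    task.retries += 1


def heal_once(tasks: list[Task]) -> None:
    """One monitoring pass: retry failed tasks, escalate only when retries run out."""
    for task in tasks:
        if task.state == "FAILED":
            if task.retries < MAX_RETRIES:
                logging.info("retrying %s (attempt %d)", task.name, task.retries + 1)
                resubmit(task)
            else:
                task.state = "NEEDS_ATTENTION"  # hand off to a human operator
```

    A scheduler would run such a pass periodically alongside resource monitoring; that is the sense in which analyses can proceed with minimal human intervention.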
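
    The stream-based variant calling described for Rheos can likewise be illustrated as a generator that consumes per-position read support as it arrives and emits a candidate SNP the moment the evidence at a position crosses a threshold, rather than waiting for a complete file. The input shape, thresholds, and names below are assumptions for exposition, not the Rheos algorithm itself.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple

MIN_DEPTH = 10          # assumed minimum coverage before a call is attempted
MIN_ALT_FRACTION = 0.2  # assumed minimum alternate-allele fraction


class Pileup(NamedTuple):
    """Read support observed at one reference position (assumed input shape)."""
    chrom: str
    pos: int
    ref_base: str
    bases: list[str]  # one observed base per overlapping read


class SnpCall(NamedTuple):
    chrom: str
    pos: int
    ref_base: str
    alt_base: str
    alt_fraction: float


def call_snps(pileups: Iterable[Pileup]) -> Iterator[SnpCall]:
    """Emit candidate SNPs position by position as pileups stream in."""
    for p in pileups:
        depth = len(p.bases)
        if depth < MIN_DEPTH:
            continue
        alt_counts = Counter(b for b in p.bases if b != p.ref_base)
        if not alt_counts:
            continue
        alt_base, alt_count = alt_counts.most_common(1)[0]
        alt_fraction = alt_count / depth
        if alt_fraction >= MIN_ALT_FRACTION:
            # The decision is made as soon as this position has been seen,
            # which is what lets a streaming caller report results early.
            yield SnpCall(p.chrom, p.pos, p.ref_base, alt_base, alt_fraction)


# Example: only the second position has enough alternate-allele support.
stream = [
    Pileup("chr1", 100, "A", ["A"] * 12),
    Pileup("chr1", 101, "C", ["C"] * 8 + ["T"] * 4),
]
print(list(call_snps(stream)))  # one SnpCall at chr1:101 with alt_fraction ~0.33
```

    A production caller such as GATK weighs far richer evidence (base qualities, mapping quality, haplotypes), so this sketch conveys only the streaming control flow, not calling accuracy.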