    Master of Science

    Cloud infrastructures have massively increased access to latent compute resources, allowing computations that were previously out of reach to be performed efficiently and cheaply. Due to the multi-user nature of clouds, this wealth of resources has been "siloed" into discrete isolated segments to ensure privacy and control over the resources by their current owner. Modern clouds have evolved beyond basic resource sharing and have become platforms of modern development. Clouds are now home to rich ecosystems of services provided by third parties, or by the cloud itself. However, clouds employ a rigid access control model that limits how cloud users can access these third-party services. With XNet, we aim to make cloud access control systems more flexible and dynamic by modeling cloud access control as an object-based capability system. In this model, cloud users create and exchange "capabilities" to resources that permit them to use those resources as long as they continue to possess a capability to them. This model has collaborative policy definition at its core, allowing cloud users to more safely provide services to other users, and to use services provided to them. We have implemented our model and integrated it into the popular OpenStack cloud system. Further, we have modified the existing Galaxy scientific workflow system to support our model, greatly enhancing the security guaranteed to users of the Galaxy system.
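
    As an illustration of the object-capability idea described above, the following Python sketch models resources, unforgeable capability tokens, delegation, and revocation. All class and method names are invented for this example and do not reflect XNet's or OpenStack's actual APIs.

    # Minimal sketch of an object-capability access model in the spirit of XNet.
    # Names (Resource, CapabilityStore, grant, ...) are illustrative, not XNet's API.
    import secrets

    class Resource:
        def __init__(self, name):
            self.name = name
        def use(self):
            return f"using {self.name}"

    class CapabilityStore:
        """Maps unforgeable tokens to resources; holding a token is holding access."""
        def __init__(self):
            self._caps = {}
        def grant(self, resource):
            token = secrets.token_hex(16)      # unguessable capability token
            self._caps[token] = resource
            return token
        def delegate(self, token):
            # A holder can mint a new capability to the same resource for another user.
            return self.grant(self._caps[token])
        def revoke(self, token):
            self._caps.pop(token, None)
        def invoke(self, token):
            if token not in self._caps:
                raise PermissionError("no capability for this resource")
            return self._caps[token].use()

    store = CapabilityStore()
    cap = store.grant(Resource("vm-42"))
    shared = store.delegate(cap)           # hand a separate capability to a collaborator
    print(store.invoke(shared))            # "using vm-42"
    store.revoke(shared)                   # collaborator loses access; the owner keeps theirs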

    Distributed computing and data storage in proteomics: many hands make light work, and a stronger memory

    Modern-day proteomics generates ever more complex data, causing the requirements on the storage and processing of such data to outgrow the capacity of most desktop computers. To cope with the increased computational demands, distributed architectures have gained substantial popularity in recent years. In this review, we provide an overview of current techniques for distributed computing, along with examples of how these techniques are currently being employed in the field of proteomics. We thus underline the benefits of distributed computing in proteomics, while also pointing out the potential issues and pitfalls involved.
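
    The scatter/gather pattern underlying most of the distributed approaches surveyed here can be illustrated in a few lines of Python; the spectrum records and the peak-count "score" below are placeholders rather than a real identification engine.

    # Toy scatter/gather: split a large set of spectra across workers, process
    # them independently, then merge the results.
    from multiprocessing import Pool

    def score_spectrum(spectrum):
        # stand-in for an expensive identification step (e.g., a database search)
        return spectrum["id"], len(spectrum["peaks"])

    if __name__ == "__main__":
        spectra = [{"id": i, "peaks": list(range(i % 50))} for i in range(10_000)]
        with Pool(processes=4) as pool:              # scatter across local workers
            results = pool.map(score_spectrum, spectra)
        best = max(results, key=lambda r: r[1])      # gather and reduce
        print(f"best-scoring spectrum: {best}")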

    The Hierarchic treatment of marine ecological information from spatial networks of benthic platforms

    Measuring biodiversity simultaneously in different locations, at different temporal scales, and over wide spatial scales is of strategic importance for improving our understanding of the functioning of marine ecosystems and for the conservation of their biodiversity. Monitoring networks of cabled observatories, along with other docked autonomous systems (e.g., Remotely Operated Vehicles [ROVs], Autonomous Underwater Vehicles [AUVs], and crawlers), are being conceived and established at a spatial scale capable of tracking energy fluxes across benthic and pelagic compartments, as well as across geographic ecotones. At the same time, optoacoustic imaging is sustaining an unprecedented expansion in marine ecological monitoring, enabling the acquisition of new biological and environmental data at an appropriate spatiotemporal scale. At this stage, one of the main problems for an effective application of these technologies is the processing, storage, and treatment of the acquired complex ecological information. Here, we provide a conceptual overview of the technological developments in the multiparametric generation, storage, and automated hierarchic treatment of biological and environmental information required to capture the spatiotemporal complexity of a marine ecosystem. In doing so, we present a pipeline of ecological data acquisition and processing organized in discrete steps amenable to automation. We also give an example of computing population biomass, community richness, and biodiversity data (as indicators of ecosystem functionality) with an Internet Operated Vehicle (a mobile crawler). Finally, we discuss the software requirements for such automated data processing at the level of cyber-infrastructures, covering sensor calibration and control, data banking, and ingestion into large data portals.
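
    As an illustration of the final, automated step of such a pipeline, the short Python sketch below turns per-taxon counts (as might come from automated image classification on a crawler) into two common community indicators, species richness and the Shannon index; the counts themselves are invented.

    # Turn automated counts per taxon into community indicators.
    from math import log

    counts = {"sablefish": 34, "hagfish": 12, "crab": 57, "rockfish": 5}   # invented data

    total = sum(counts.values())
    richness = len([c for c in counts.values() if c > 0])          # species richness
    shannon = -sum((c / total) * log(c / total)
                   for c in counts.values() if c > 0)              # Shannon diversity H'

    print(f"richness = {richness}, Shannon H' = {shannon:.3f}")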

    Bioinspired Computing: Swarm Intelligence


    A Chemistry-Inspired Workflow Management System for a Decentralized Composite Service Execution

    With the recent widespread adoption of service-oriented architecture, the dynamic composition of such services is now a crucial issue in the area of distributed computing. The coordination and execution of composite Web services are today typically conducted by heavyweight centralized workflow engines, leading to an increasing probability of processing and communication bottlenecks and failures. In addition, centralization induces higher deployment costs, such as the computing infrastructure to support the workflow engine, which is not affordable for a large number of small businesses and end-users. Last but not least, central workflow engines lead to further undesirable consequences for privacy and energy consumption. In a world where platforms are increasingly dynamic and elastic, as promised by cloud computing, decentralized and dynamic interaction schemes are required. Addressing the characteristics of such platforms, nature-inspired analogies have recently regained attention as a way to provide autonomous service coordination on top of dynamic large-scale platforms. In this report, we propose a decentralized approach for the execution of composite Web services based on an unconventional programming paradigm that relies on the chemical metaphor. It provides a high-level execution model that allows composite services to be executed in a fully decentralized manner. Composed of services communicating through a persistent shared space containing the control and data flows between services, our architecture allows the composition to be distributed among nodes without the need for any centralized coordination. A proof of concept is given through the deployment of a software prototype implementing these concepts, showing the viability of an autonomic vision of service composition.
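
    The chemical execution model can be made concrete with a toy Python sketch: a shared "solution" holds molecules carrying data and control, and reaction rules rewrite the multiset until it is inert. The molecule and rule names below are illustrative and are not the notation used in the report.

    # Toy chemical coordination: molecules in a shared solution react until inert.
    solution = [("invoke", "geocode", "Paris"), ("invoke", "weather", None)]

    def geocode(city):
        return ("result", "weather", (48.85, 2.35))   # toy service call

    def weather(coords):
        return ("done", f"forecast for {coords}: sunny")

    rules = {"geocode": geocode, "weather": weather}

    changed = True
    while changed:                                    # react until no rule applies
        changed = False
        for mol in list(solution):
            if mol[0] == "invoke" and mol[2] is not None:
                solution.remove(mol)                  # consume the molecule, fire the service
                solution.append(rules[mol[1]](mol[2]))
                changed = True
            elif mol[0] == "result":
                # feed a result into the service waiting for it (data-flow dependency)
                _, target, value = mol
                waiting = next(m for m in solution if m[:2] == ("invoke", target))
                solution.remove(mol)
                solution.remove(waiting)
                solution.append(("invoke", target, value))
                changed = True

    print([m for m in solution if m[0] == "done"])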

    Efficient Implementation of Stochastic Inference on Heterogeneous Clusters and Spiking Neural Networks

    Neuromorphic computing refers to brain-inspired algorithms and architectures. This paradigm of computing can solve complex problems that were not tractable with traditional computing methods, because such implementations learn to identify the required features and classify them based on their training, akin to how brains function. This task involves performing computation on large quantities of data. With this inspiration, a comprehensive multi-pronged approach is employed to study and efficiently implement a neuromorphic inference model, both on heterogeneous clusters of traditional von Neumann architectures and by developing spiking neural networks (SNNs) for native, ultra-low-power implementation. In this regard, an extendable high-performance computing (HPC) framework and optimizations are proposed for heterogeneous clusters to modularize complex neuromorphic applications in a distributed manner. To achieve the best possible throughput and load balancing for such modularized architectures, a set of algorithms is proposed to suggest the optimal mapping of the different modules, as an asynchronous pipeline, onto the available cluster resources while considering the complex data dependencies between stages. On the other hand, SNNs are more biologically plausible and can achieve ultra-low-power implementation due to their sparse spike-based communication, which is possible with emerging non-von Neumann computing platforms. As significant progress in this direction, spiking neuron models capable of distributed online learning are proposed. A high-performance SNN simulator (SpNSim) is developed for the simulation of large-scale mixed neuron model networks. An accompanying digital hardware neuron RTL is also proposed for efficient real-time implementation of SNNs capable of online learning. Finally, a methodology for mapping a probabilistic graphical model onto an off-the-shelf neurosynaptic processor (IBM TrueNorth) as a stochastic SNN with ultra-low power consumption is presented.
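
    To make the spiking abstraction concrete, a generic leaky integrate-and-fire neuron is sketched below in Python; it is not one of the thesis's proposed neuron models, nor does it reflect SpNSim's implementation.

    # Generic leaky integrate-and-fire neuron: leak, integrate sparse input spikes,
    # fire when the membrane potential crosses a threshold, then reset.
    import random

    def simulate_lif(steps=200, tau=20.0, v_thresh=1.0, v_reset=0.0, input_rate=0.15):
        v, spikes = 0.0, []
        for t in range(steps):
            i_in = 0.3 if random.random() < input_rate else 0.0   # sparse input spikes
            v += (-v / tau) + i_in                                 # leak + integrate
            if v >= v_thresh:                                      # fire and reset
                spikes.append(t)
                v = v_reset
        return spikes

    random.seed(0)
    print("output spike times:", simulate_lif())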

    Scientific Workflows for Metabolic Flux Analysis

    Metabolic engineering is a highly interdisciplinary research domain that interfaces biology, mathematics, computer science, and engineering. Metabolic flux analysis with carbon tracer experiments (13C-MFA) is a particularly challenging metabolic engineering application that consists of several tightly interwoven building blocks such as modeling, simulation, and experimental design. While several general-purpose workflow solutions have emerged in recent years to support the realization of complex scientific applications, these approaches are only partially transferable to 13C-MFA workflows. While problems in other research fields (e.g., bioinformatics) are primarily centered around scientific data processing, 13C-MFA workflows have more in common with business workflows. For instance, many bioinformatics workflows are designed to identify, compare, and annotate genomic sequences by "pipelining" them through standard tools like BLAST; typically, the next workflow task in the pipeline can be determined automatically from the outcome of the previous step. Five computational challenges have been identified in the endeavor of conducting 13C-MFA studies: organization of heterogeneous data, standardization of processes and the unification of tools and data, interactive workflow steering, distributed computing, and service orientation. The outcome of this thesis is a scientific workflow framework (SWF) that is custom-tailored to the specific requirements of 13C-MFA applications. The proposed approach – namely, designing the SWF as a collection of loosely coupled modules that are glued together with web services – eases the realization of 13C-MFA workflows by offering several features. By design, existing tools are integrated into the SWF using web service interfaces and foreign programming language bindings (e.g., Java or Python). Although the attributes "easy-to-use" and "general-purpose" are rarely associated with distributed computing software, the presented use cases show that the proposed Hadoop MapReduce framework eases the deployment of computationally demanding simulations on cloud and cluster computing resources. An important building block for enabling interactive, researcher-driven workflows is the ability to track all data needed to understand and reproduce a workflow. The standardization of 13C-MFA studies using a folder structure template and the corresponding services and web interfaces improves the exchange of information within a group of researchers. Finally, several auxiliary tools are developed in the course of this work to complement the SWF modules, ranging from simple helper scripts to visualization and data conversion programs. This solution distinguishes itself from other scientific workflow approaches by offering a system of loosely coupled components that are flexibly arranged to match the typical requirements in the metabolic engineering domain. As the SWF is a modern, service-oriented software framework, new applications are easily composed by reusing existing components.
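
    The "loosely coupled modules glued together with web services" idea can be sketched as follows: a placeholder flux-simulation step is wrapped behind a minimal JSON-over-HTTP endpoint using only the Python standard library. The endpoint path and payload format are invented for illustration and are not the framework's actual interface.

    # Expose a dummy 13C-MFA step as a small web-service module.
    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    def simulate_fluxes(model):
        # placeholder for an external simulator invoked via a language binding
        return {"flux_" + r: 1.0 for r in model.get("reactions", [])}

    class ModuleHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/simulate":
                self.send_error(404)
                return
            length = int(self.headers["Content-Length"])
            model = json.loads(self.rfile.read(length))      # JSON model from another module
            body = json.dumps(simulate_fluxes(model)).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("localhost", 8080), ModuleHandler).serve_forever()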

    Data Models in Neuroinformatics

    Advancements in integrated neuroscience are often characterized by data-driven approaches to discovery; these progressions are the result of continuous efforts aimed at developing integrated frameworks for the investigation of neuronal dynamics at increasing resolution and at varying scales. Since insights from integrated neuronal models frequently rely on both experimental and computational approaches, simulations and data modeling have inimitable roles. Moreover, data sharing across the neuroscientific community has become an essential component of data-driven approaches to neuroscience, as is evident from the number and scale of ongoing national and multinational projects engaging scientists from diverse branches of knowledge. In this heterogeneous environment, the need to share neuroscientific data, as well as to utilize it across different simulation environments, drove the momentum for standardizing data models for neuronal morphologies, biophysical properties, and connectivity schemes. Here, I review existing data models in neuroinformatics, ranging from flat to hybrid object-hierarchical approaches, and suggest a framework with which these models can be linked to experimental data, as well as to established records from existing databases. Linking neuronal models and experimental results with data on relevant articles, genes, proteins, diseases, etc., might open a new dimension for data-driven neuroscience.
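
    A toy example of a hybrid object-hierarchical data model of the kind reviewed here is sketched below: a neuron owns sections, sections own segments, and the neuron carries cross-references to external records. The field names are illustrative only and do not follow any particular standard.

    # Hierarchical morphology model with cross-references to external records.
    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class Segment:
        x: float
        y: float
        z: float
        diameter: float

    @dataclass
    class Section:
        kind: str                                     # "soma", "axon", "dendrite"
        segments: List[Segment] = field(default_factory=list)

    @dataclass
    class Neuron:
        name: str
        sections: List[Section] = field(default_factory=list)
        links: Dict[str, str] = field(default_factory=dict)   # e.g. {"gene": "SCN1A"}

    cell = Neuron("L5_pyramidal",
                  sections=[Section("soma", [Segment(0, 0, 0, 20.0)])],
                  links={"gene": "SCN1A"})
    print(cell.links["gene"], len(cell.sections))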

    Enhancing Data Processing on Clouds with Hadoop/HBase

    In the current information age, large amounts of data are being generated and accumulated rapidly in various industrial and scientific domains. This imposes important demands on data processing capabilities that can extract sensible and valuable information from the large amounts of data in a timely manner. Hadoop, the open-source implementation of Google's data processing framework (MapReduce, the Google File System, and BigTable), is becoming increasingly popular and is being used to solve data processing problems in various application scenarios. However, having originally been designed for handling very large data sets that can easily be divided into parts to be processed independently with limited inter-task communication, Hadoop lacks applicability to a wider range of use cases. As a result, many projects are under way to enhance Hadoop for different application needs, such as data warehouse applications, machine learning, and data mining applications. This thesis is one such research effort in this direction. The goal of the thesis research is to design novel tools and techniques to extend and enhance the large-scale data processing capability of Hadoop/HBase on clouds, and to evaluate their effectiveness in performance tests on prototype implementations. Two main research contributions are described. The first contribution is a light-weight computational workflow system called "CloudWF" for Hadoop. The second contribution is a client library called "HBaseSI" supporting transactional snapshot isolation (SI) in HBase, Hadoop's database component. CloudWF addresses the problem of automating the execution of scientific workflows composed of both MapReduce and legacy applications on clouds with Hadoop/HBase. CloudWF is the first computational workflow system built directly on Hadoop/HBase. It uses novel methods for handling workflow directed acyclic graph decomposition, storing and querying dependencies in HBase sparse tables, transparent file staging, and decentralized workflow execution management that relies on the MapReduce framework for task scheduling and fault tolerance. HBaseSI addresses the problem of maintaining strong transactional data consistency in HBase tables and is the first SI mechanism developed for HBase. HBaseSI uses novel methods for handling distributed transactional management autonomously by individual clients. These methods greatly simplify the design of HBaseSI and can be generalized to other column-oriented stores with architectures similar to HBase. As a result of the simplicity of its design, HBaseSI adds low overhead to HBase performance and directly inherits many desirable properties of HBase. HBaseSI is non-intrusive to existing HBase installations and user data, and is designed to scale to large clouds in terms of both data size and the number of nodes in the cloud.
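
    The dependency bookkeeping that a workflow system such as CloudWF must perform can be illustrated with a short Python sketch that releases tasks as their stored dependencies complete (Kahn's algorithm); the in-memory dictionaries stand in for dependency tables such as those CloudWF keeps in HBase, and the task names are made up.

    # Release workflow tasks in dependency order (Kahn's algorithm).
    from collections import deque

    deps = {            # task -> tasks it depends on (made-up workflow)
        "stage_inputs": [],
        "mapreduce_align": ["stage_inputs"],
        "legacy_postprocess": ["mapreduce_align"],
        "report": ["mapreduce_align", "legacy_postprocess"],
    }

    remaining = {t: set(d) for t, d in deps.items()}
    ready = deque(t for t, d in remaining.items() if not d)
    order = []
    while ready:
        task = ready.popleft()                 # "run" the task
        order.append(task)
        for t, d in remaining.items():         # satisfy dependencies on it
            if task in d:
                d.remove(task)
                if not d:
                    ready.append(t)

    print("execution order:", order)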