    S-Store: Streaming Meets Transaction Processing

    Stream processing addresses the needs of real-time applications. Transaction processing addresses the coordination and safety of short atomic computations. Heretofore, these two modes of operation existed in separate, stove-piped systems. In this work, we attempt to fuse the two computational paradigms in a single system called S-Store. In this way, S-Store can simultaneously accommodate OLTP and streaming applications. We present a simple transaction model for streams that integrates seamlessly with a traditional OLTP system. We chose to build S-Store as an extension of H-Store, an open-source, in-memory, distributed OLTP database system. By implementing S-Store in this way, we can make use of the transaction processing facilities that H-Store already supports and concentrate on the additional implementation features needed to support streaming. Similar implementations could be built on other main-memory OLTP platforms. We show that S-Store can actually achieve higher throughput for streaming workloads than an equivalent deployment in H-Store alone, and that this requires only a modest amount of new functionality on top of H-Store. Furthermore, we compare S-Store to two state-of-the-art streaming systems, Spark Streaming and Storm, and show that S-Store matches and sometimes exceeds their performance while providing stronger transactional guarantees.
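    The general idea of a transaction model for streams can be illustrated with a small sketch, assuming that each batch of input tuples is applied to shared state as one atomic transaction, so that a failed batch leaves no partial updates behind. It uses Python's built-in `sqlite3` as a stand-in for an OLTP engine; the table `counts`, the validation rule, and the functions `make_store`, `process_batch`, and `snapshot` are hypothetical and do not reflect S-Store's or H-Store's actual APIs.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    # isolation_level=None puts sqlite3 in autocommit mode, so transaction
    # boundaries are controlled explicitly with BEGIN/COMMIT/ROLLBACK below.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE counts (key TEXT PRIMARY KEY, n INTEGER NOT NULL)")
    return conn


def process_batch(conn: sqlite3.Connection, batch: list[tuple[str, int]]) -> bool:
    """Apply one batch of (key, delta) stream tuples atomically.

    Returns True if the batch committed, False if it was rolled back.
    """
    conn.execute("BEGIN")
    try:
        for key, delta in batch:
            if delta < 0:
                # Hypothetical validation rule: one bad tuple aborts the whole batch.
                raise ValueError(f"negative delta for {key!r}")
            cur = conn.execute("UPDATE counts SET n = n + ? WHERE key = ?", (delta, key))
            if cur.rowcount == 0:
                # First time this key is seen: create its row.
                conn.execute("INSERT INTO counts (key, n) VALUES (?, ?)", (key, delta))
    except Exception:
        conn.execute("ROLLBACK")  # undo every update made by this batch
        return False
    conn.execute("COMMIT")
    return True


def snapshot(conn: sqlite3.Connection) -> dict[str, int]:
    # Current committed state as a plain dict.
    return dict(conn.execute("SELECT key, n FROM counts"))
```

    Feeding batches one at a time, in arrival order, through `process_batch` mirrors the notion of ordered, atomic stream transactions; a real system would also have to handle partitioning, ordering guarantees across partitions, and recovery, all of which this sketch ignores.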

    Distributed data mining in grid computing environments

    Computing-intensive data mining over inherently Internet-wide distributed data, referred to as Distributed Data Mining (DDM), calls for the support of a powerful Grid with an effective scheduling framework. DDM often shares the computing paradigm of local processing and global synthesizing. It involves every phase of the Data Mining (DM) process, which makes the DDM workflow very complex; such a workflow can be modelled only by a Directed Acyclic Graph (DAG) with multiple data entries. Motivated by the need for a practical solution to the Grid scheduling problem for the DDM workflow, this paper proposes a novel two-phase scheduling framework, comprising External Scheduling and Internal Scheduling, on a two-level Grid architecture (InterGrid, IntraGrid). A DM IntraGrid named DMGCE (Data Mining Grid Computing Environment) has been developed with a dynamic scheduling framework for competitive DAGs in a heterogeneous computing environment. The system is implemented in an established Multi-Agent System (MAS) environment, in which existing DM algorithms are reused by encapsulating them into agents. Practical classification problems from oil well logging analysis are used to measure system performance. The detailed experimental procedure and the analysis of results are also discussed.
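    Scheduling a workflow modelled as a DAG with multiple data entries onto heterogeneous nodes can be illustrated with a generic greedy list scheduler: tasks are visited in topological order, and each is placed on the node that would finish it earliest. This is a standard textbook heuristic, not the paper's two-phase External/Internal scheduling framework; the task names, work units, node speeds, and the omission of data-transfer costs are all assumptions for illustration, and `schedule` is a hypothetical helper.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def schedule(tasks: dict[str, float],
             deps: dict[str, set[str]],
             speeds: dict[str, float]) -> dict[str, tuple[str, float, float]]:
    """Greedy earliest-finish-time list scheduling of a DAG.

    tasks:  task -> amount of work (hypothetical units)
    deps:   task -> set of predecessor tasks (entry tasks have none)
    speeds: node -> work units processed per second
    Returns task -> (node, start_time, finish_time).
    Data-transfer costs between nodes are ignored in this sketch.
    """
    graph = {t: deps.get(t, set()) for t in tasks}
    node_free = {n: 0.0 for n in speeds}  # time at which each node becomes idle
    plan: dict[str, tuple[str, float, float]] = {}
    for t in TopologicalSorter(graph).static_order():
        # A task is ready once all of its predecessors have finished.
        ready = max((plan[p][2] for p in graph[t]), default=0.0)
        best = None
        for node, speed in speeds.items():
            # Tasks are appended after the node's last scheduled task.
            start = max(ready, node_free[node])
            finish = start + tasks[t] / speed
            if best is None or finish < best[2]:
                best = (node, start, finish)
        plan[t] = best
        node_free[best[0]] = best[2]
    return plan
```

    For example, two entry tasks that load separate data sources can both feed a training task followed by a report; the scheduler respects every precedence edge while favouring faster nodes. Grid-scale schedulers typically add communication costs, node availability, and rescheduling on failure, which this sketch leaves out.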

    CYCLONE Unified Deployment and Management of Federated, Multi-Cloud Applications

    Various Cloud layers have to work in concert to deploy and manage complex multi-cloud applications, executing sophisticated workflows for Cloud resource deployment, activation, adjustment, interaction, and monitoring. While there are ample solutions for managing individual Cloud aspects (e.g. network controllers, deployment tools, and application security software), there are no well-integrated suites for managing an entire multi-cloud environment with multiple providers and deployment models. This paper presents the CYCLONE architecture, which integrates a number of existing solutions to create an open, unified, holistic Cloud management platform for multi-cloud applications, tailored to the needs of research organizations and SMEs. It discusses major challenges in providing a network and security infrastructure for the Intercloud and concludes with a demonstration of how the architecture is implemented in a real-life bioinformatics use case.

    Designing Traceability into Big Data Systems

    Providing an appropriate level of accessibility and traceability to data or process elements (so-called Items) in large volumes of data, often Cloud-resident, is an essential requirement in the Big Data era. Enterprise-wide data systems need to be designed from the outset to support the usage of such Items across the spectrum of business use rather than from any specific application view. The design philosophy advocated in this paper is to drive the design process using a so-called description-driven approach, which enriches models with meta-data and description and focuses the design process on Item re-use, thereby promoting traceability. Details are given of the description-driven design of big data systems at CERN, in health informatics, and in business process management. Evidence is presented that the approach leads to design simplicity and consequent ease of management thanks to loose typing and the adoption of a unified approach to Item management and usage.
    Comment: 10 pages; 6 figures. In Proceedings of the 5th Annual International Conference on ICT: Big Data, Cloud and Security (ICT-BDCS 2015), Singapore, July 2015. arXiv admin note: text overlap with arXiv:1402.5764, arXiv:1402.575

    Personalizing Situated Workflows for Pervasive Healthcare Applications

    In this paper, we present an approach in which a workflow system is combined with a policy-based framework for the specification and enforcement of policies for healthcare applications. In our approach, workflows are used to capture entities' responsibilities and to assist entities in fulfilling them. The policy-based framework allows us to express authorisation policies, which define the rights that entities have in the system, and event-condition-action (ECA) policies, which are used to adapt the system to the actual situation. Authorisations will often depend on the context in which patients' care takes place, and our policies support predicates that reflect the environment. ECA policies capture events that reflect the current state of the environment and can perform actions to adapt the workflow execution accordingly. We show how the approach can be used for Edema treatment and how fine-grained authorisation and ECA policies are expressed and used.
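    The interplay of context-dependent authorisation policies and event-condition-action (ECA) policies can be sketched as follows, assuming policies are plain Python predicates and callbacks over an event payload and an environment context. The classes `EcaPolicy` and `Workflow`, the functions `is_authorised` and `dispatch`, and all event names, roles, and threshold values are hypothetical illustrations, not the paper's policy language and not clinical guidance.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Workflow:
    # Tasks the workflow still has to perform; ECA actions may add to it.
    pending: list[str] = field(default_factory=list)


@dataclass
class EcaPolicy:
    event: str                                # event type that triggers the policy
    condition: Callable[[dict, dict], bool]   # (event payload, context) -> bool
    action: Callable[[dict, Workflow], None]  # adapts the running workflow


# An authorisation policy: (role, operation, context) -> permitted?
AuthPolicy = Callable[[str, str, dict], bool]


def is_authorised(role: str, operation: str, context: dict,
                  policies: list[AuthPolicy]) -> bool:
    # Default deny: permitted only if some policy explicitly allows it.
    return any(p(role, operation, context) for p in policies)


def dispatch(event: str, payload: dict, context: dict,
             policies: list[EcaPolicy], wf: Workflow) -> int:
    """Run the action of every policy whose event matches and whose condition holds.

    Returns the number of policies that fired.
    """
    fired = 0
    for p in policies:
        if p.event == event and p.condition(payload, context):
            p.action(payload, wf)
            fired += 1
    return fired


# Hypothetical policies, for illustration only.
auth_policies: list[AuthPolicy] = [
    # A nurse may read a record only while on shift (a context predicate).
    lambda role, op, ctx: role == "nurse" and op == "read_record" and ctx.get("on_shift", False),
]
eca_policies = [
    EcaPolicy(
        event="weight_reading",
        condition=lambda e, ctx: e["gain_kg"] > ctx["alert_gain_kg"],
        action=lambda e, wf: wf.pending.append("schedule_clinician_review"),
    ),
]
```

    Keeping conditions as predicates over the context is what lets one policy set behave differently in different situations; a deployed system would more likely express these policies in a dedicated policy language than as inline lambdas.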