2 research outputs found

    dispel4py: A Python framework for data-intensive scientific computing

    This paper presents dispel4py, a new Python framework for describing abstract stream-based workflows for distributed data-intensive applications. These combine the familiarity of Python programming with the scalability of workflows. Data streaming is used to gain performance, rapid prototyping and applicability to live observations. dispel4py enables scientists to focus on their scientific goals, avoiding distracting details and retaining flexibility over the computing infrastructure they use. The implementation, therefore, has to map dispel4py abstract workflows optimally onto target platforms chosen dynamically. We present four dispel4py mappings: Apache Storm, message-passing interface (MPI), multi-threading and sequential, showing two major benefits: a) smooth transitions from local development on a laptop to scalable execution for production work, and b) scalable enactment on significantly different distributed computing infrastructures. Three application domains are reported and measurements on multiple infrastructures show the optimisations achieved; they have provided demanding real applications and helped us develop effective training. The dispel4py.org is an open-source project to which we invite participation. The effective mapping of dispel4py onto multiple target infrastructures demonstrates exploitation of data-intensive and high-performance computing (HPC) architectures and consistent scalability.

    AVENTIS - An architecture for event data analysis

    Time-stamped event data is being generated at an exponential rate from various sources (sensor networks, e-markets, etc.), stored in event logs and made available to researchers. Despite the data deluge and the evolution of a plethora of tools and technologies, the science behind exploratory analysis and knowledge discovery lags. There are several reasons for this. In conducting event data analysis, researchers typically detect a pattern or trend in the data by computing time-series measures and apply the computed measures to several mathematical models to glean information from the data. This is a complex and time-consuming process covering a range of activities, from data capture (from a broad array of data sources) to interpretation and dissemination of experimental results, that together form a pipeline. Further, data analysis is conducted by domain users, who are typically not IT experts, whereas data processing tools and applications are largely developed by application developers. End users not only lack the critical skills to build a structured analysis pipeline, but are also perplexed by the number of different ways available to derive the necessary information. Consequently, this thesis proposes AVENTIS (Architecture for eVENT Data analysIS), a novel framework that guides the design of analytic solutions for time-series analysis of event data and is tailored to the needs of domain users. The framework comprises three components: a knowledge base, a model-driven analytic methodology and an accompanying software architecture that provides the necessary technical and operational requirements. Specifically, the research contribution lies in the framework's ability to let domain users express analysis requirements at a level of abstraction consistent with their expertise and to make the information sought readily available without the users having to build the analysis process themselves.
Secondly, the framework also provides an abstract design space in which domain experts can build conceptual models of their experiment as a sequence of structured tasks in a technology-neutral manner and transparently translate these abstract process models into executable implementations. To evaluate the AVENTIS framework, a prototype is implemented and tested with case studies taken from the financial research domain.