
    Designing Traceability into Big Data Systems

    Providing an appropriate level of accessibility and traceability to data or process elements (so-called Items) in large volumes of data, often Cloud-resident, is an essential requirement in the Big Data era. Enterprise-wide data systems need to be designed from the outset to support usage of such Items across the spectrum of business use rather than from any specific application view. The design philosophy advocated in this paper is to drive the design process using a so-called description-driven approach which enriches models with meta-data and description and focuses the design process on Item re-use, thereby promoting traceability. Details are given of the description-driven design of big data systems at CERN, in health informatics and in business process management. Evidence is presented that the approach leads to design simplicity and consequent ease of management thanks to loose typing and the adoption of a unified approach to Item management and usage. Comment: 10 pages; 6 figures in Proceedings of the 5th Annual International Conference on ICT: Big Data, Cloud and Security (ICT-BDCS 2015), Singapore July 2015. arXiv admin note: text overlap with arXiv:1402.5764, arXiv:1402.575
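    As a rough illustration of the description-driven idea (models enriched with meta-data, loosely typed Items whose changes stay traceable), the following Python sketch shows one possible shape such an Item might take; the class and field names are assumptions for illustration, not the data model used at CERN or in the systems described above.

    # Hypothetical sketch: the names below are illustrative assumptions, not the
    # CERN data model. A description (meta-data) drives loosely typed Items and
    # records changes so that Items remain traceable and re-usable.
    from dataclasses import dataclass, field
    from typing import Any, Dict, List

    @dataclass
    class ItemDescription:
        """Meta-data that describes an Item independently of any one application."""
        name: str
        schema: Dict[str, str]                     # property name -> loose type hint
        provenance: List[str] = field(default_factory=list)

    @dataclass
    class Item:
        """A data or process element whose behaviour is driven by its description."""
        description: ItemDescription
        values: Dict[str, Any] = field(default_factory=dict)

        def set(self, key: str, value: Any) -> None:
            # Loose typing: accept any value, but record the change for traceability.
            self.values[key] = value
            self.description.provenance.append(f"set {key}")

    # Re-using one description across many Items promotes traceability and re-use.
    sample_desc = ItemDescription(name="Sample", schema={"temperature": "number"})
    item = Item(description=sample_desc)
    item.set("temperature", 42.0)
    print(item.description.provenance)             # ['set temperature']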

    Emerging from the MIST: A Connector Tool for Supporting Programming by Non-programmers

    Software development is an iterative process. As user requirements emerge, software applications must be extended to support the new requirements. Typically, a programmer will add new code to an existing code base of an application to provide a new functionality. Previous research has shown that such extensions are easier when application logic is clearly separated from the user interface logic. Assuming that a programmer is already familiar with the existing code base, the task of writing the new code can be considered to be split into two sub-tasks: writing code for the application logic, that is, the actual functionality of the application; and writing code for the user interface that will expose the functionality to the end user. The goal of this research is to reduce the effort required to create a user interface once the application logic has been created, toward supporting scientists with minimal programming knowledge to be able to create and modify programs. Using a Model View Controller based architecture, various model components which contain the application logic can be built and extended. The process of creating and extending the views (user interfaces) on these model components is simplified through the use of our Malleable Interactive Software Toolkit (MIST), a tool set and infrastructure intended to simplify the design and extension of dynamically reconfigurable interfaces. This paper focuses on one tool in the MIST suite, a connector tool that enables the programmer to evolve the user interface as the application logic evolves by connecting related pieces of code together, either through simple drag-and-drop interactions or through the authoring of Python code. The connector tool exemplifies the types of tools in the MIST suite, which we expect will encourage collaborative development of applications by allowing users to integrate various components and minimizing the cost of developing new user interfaces for the combined components.
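    Since the abstract does not show MIST's actual API, the following Python sketch only illustrates the general pattern of connecting application logic to a user-interface element in an MVC setting; the Connector, Model and View names here are assumptions, not MIST classes.

    from typing import Any, Callable, Dict

    class Model:
        """Application logic with no knowledge of any user interface."""
        def celsius_to_fahrenheit(self, celsius: float) -> float:
            return celsius * 9.0 / 5.0 + 32.0

    class View:
        """A minimal text 'view' standing in for a GUI widget."""
        def __init__(self) -> None:
            self.actions: Dict[str, Callable[..., Any]] = {}

        def bind(self, label: str, action: Callable[..., Any]) -> None:
            self.actions[label] = action

        def trigger(self, label: str, *args: Any) -> Any:
            return self.actions[label](*args)

    class Connector:
        """Wires a piece of application logic to a view element, MVC-style."""
        def connect(self, view: View, label: str, model_method: Callable[..., Any]) -> None:
            view.bind(label, model_method)

    model, view, connector = Model(), View(), Connector()
    connector.connect(view, "Convert", model.celsius_to_fahrenheit)
    print(view.trigger("Convert", 100.0))   # 212.0

    In a tool like the one described, this wiring step would be performed through drag-and-drop interactions or authored Python code rather than written out by hand.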

    Extensible Component Based Architecture for FLASH, A Massively Parallel, Multiphysics Simulation Code

    FLASH is a publicly available high performance application code which has evolved into a modular, extensible software system from a collection of unconnected legacy codes. FLASH has been successful because its capabilities have been driven by the needs of scientific applications, without compromising maintainability, performance, and usability. In its newest incarnation, FLASH3 consists of inter-operable modules that can be combined to generate different applications. The FLASH architecture allows arbitrarily many alternative implementations of its components to co-exist and interchange with each other, resulting in greater flexibility. Further, a simple and elegant mechanism exists for customization of code functionality without the need to modify the core implementation of the source. A built-in unit test framework providing verifiability, combined with a rigorous software maintenance process, allow the code to operate simultaneously in the dual mode of production and development. In this paper we describe the FLASH3 architecture, with emphasis on solutions to the more challenging conflicts arising from solver complexity, portable performance requirements, and legacy codes. We also include results from user surveys conducted in 2005 and 2007, which highlight the success of the code. Comment: 33 pages, 7 figures; revised paper submitted to Parallel Computing
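    As a rough, assumed analogue of the component architecture described above (arbitrarily many alternative implementations of a component co-existing and being selected when an application is assembled), here is a minimal Python sketch; FLASH3 itself is not a Python code, and the registry below is illustrative only, not its actual setup mechanism.

    from typing import Callable, Dict

    # component name -> {implementation name -> solver function}
    REGISTRY: Dict[str, Dict[str, Callable[[float], float]]] = {}

    def register(component: str, implementation: str):
        """Decorator that registers an alternative implementation of a component."""
        def wrap(fn: Callable[[float], float]) -> Callable[[float], float]:
            REGISTRY.setdefault(component, {})[implementation] = fn
            return fn
        return wrap

    @register("eos", "gamma_law")
    def eos_gamma_law(density: float) -> float:
        return 0.4 * density            # toy gamma-law pressure

    @register("eos", "tabulated")
    def eos_tabulated(density: float) -> float:
        return 0.39 * density           # stand-in for a table lookup

    def build_application(choices: Dict[str, str]) -> Dict[str, Callable[[float], float]]:
        """Assemble an application by choosing one implementation per component."""
        return {comp: REGISTRY[comp][impl] for comp, impl in choices.items()}

    app = build_application({"eos": "gamma_law"})
    print(app["eos"](1.0))              # 0.4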

    Towards Exascale Scientific Metadata Management

    Advances in technology and computing hardware are enabling scientists from all areas of science to produce massive amounts of data using large-scale simulations or observational facilities. In this era of data deluge, effective coordination between the data production and the analysis phases hinges on the availability of metadata that describe the scientific datasets. Existing workflow engines have been capturing a limited form of metadata to provide provenance information about the identity and lineage of the data. However, much of the data produced by simulations, experiments, and analyses still needs to be annotated manually in an ad hoc manner by domain scientists. Systematic and transparent acquisition of rich metadata becomes a crucial prerequisite to sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and domain-agnostic metadata management infrastructure that can meet the demands of extreme-scale science is notable by its absence. To address this gap in scientific data management research and practice, we present our vision for an integrated approach that (1) automatically captures and manipulates information-rich metadata while the data is being produced or analyzed and (2) stores metadata within each dataset to permeate metadata-oblivious processes and to query metadata through established and standardized data access interfaces. We motivate the need for the proposed integrated approach using applications from plasma physics, climate modeling and neuroscience, and then discuss research challenges and possible solutions.
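    One way to picture the proposal to store metadata within each dataset and query it through established data access interfaces is the attribute mechanism of HDF5; the sketch below (using h5py) is an assumed illustration of that general pattern, not the interface the authors propose.

    import h5py
    import numpy as np

    # Producer side: write the data and attach descriptive metadata to it.
    with h5py.File("simulation_output.h5", "w") as f:
        dset = f.create_dataset("temperature", data=np.linspace(270.0, 310.0, 100))
        dset.attrs["units"] = "kelvin"
        dset.attrs["run_id"] = "example-run"       # illustrative values only
        dset.attrs["grid_resolution"] = 100

    # Consumer side: a metadata-oblivious tool still reads the array as usual,
    # while a metadata-aware tool can query the attached attributes.
    with h5py.File("simulation_output.h5", "r") as f:
        temps = f["temperature"][...]
        print(dict(f["temperature"].attrs))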

    Improving the scalability of parallel N-body applications with an event driven constraint based execution model

    The scalability and efficiency of graph applications are significantly constrained by conventional systems and their supporting programming models. Technology trends like multicore, manycore, and heterogeneous system architectures are introducing further challenges and possibilities for emerging application domains such as graph applications. This paper explores the space of effective parallel execution of ephemeral graphs that are dynamically generated using the Barnes-Hut algorithm to exemplify dynamic workloads. The workloads are expressed using the semantics of an Exascale computing execution model called ParalleX. For comparison, results using conventional execution model semantics are also presented. Using the advanced semantics for Exascale computing, we find improved load balancing during runtime and automatic parallelism discovery, which improve efficiency. Comment: 11 figures
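    The following toy Python sketch only illustrates the kind of dynamic, tree-shaped parallelism that a Barnes-Hut style workload exposes; it uses standard-library futures as an assumed stand-in and does not reproduce ParalleX semantics or its runtime.

    from concurrent.futures import ThreadPoolExecutor
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Node:
        mass: float
        children: List["Node"] = field(default_factory=list)

    def subtree_mass(node: Node, pool: ThreadPoolExecutor) -> float:
        """Recursively sum masses, spawning a task per child subtree.

        The available work is only discovered while the ephemeral tree is
        traversed, which is what dynamic load balancing has to exploit.
        (Caveat: with a fixed-size pool, nested blocking waits can exhaust
        workers on deep trees; a real task runtime avoids this.)
        """
        futures = [pool.submit(subtree_mass, child, pool) for child in node.children]
        return node.mass + sum(f.result() for f in futures)

    root = Node(0.0, [Node(1.0, [Node(2.0)]), Node(3.0)])
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(subtree_mass(root, pool))   # 6.0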

    Supporting the Everyday Work of Scientists: Automating Scientific Workflows

    This paper describes an action research project that we undertook with National Research Council Canada (NRC) scientists. Based on discussions about their difficulties in using software to collect data and manage processes, we identified three requirements for increasing research productivity: ease of use for end-users; managing scientific workflows; and facilitating software interoperability. Based on these requirements, we developed a software framework, Sweet, to assist in the automation of scientific workflows. Throughout the iterative development process, and through a series of structured interviews, we evaluated how the framework was used in practice, and identified increases in productivity and effectiveness and their causes. While the framework provides resources for writing application wrappers, it was easier to code the applications' functionality directly into the framework using OSS components. Ease of use for the end-user and flexible and fully parameterized workflow representations were key elements of the framework's success.
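    Since the paper's abstract does not expose Sweet's API, the short Python sketch below is only an assumed illustration of what a fully parameterized scientific workflow can look like: each step carries its own parameters, and the framework runs the steps in order, passing one step's output to the next.

    from typing import Any, Callable, Dict, List

    class Step:
        """One workflow step: a callable plus its own (fully parameterized) settings."""
        def __init__(self, name: str, func: Callable[..., Any], **params: Any) -> None:
            self.name, self.func, self.params = name, func, params

        def run(self, data: Any) -> Any:
            return self.func(data, **self.params)

    class Workflow:
        """Runs steps in order, feeding each step's output to the next."""
        def __init__(self, steps: List[Step]) -> None:
            self.steps = steps

        def run(self, data: Any) -> Any:
            for step in self.steps:
                data = step.run(data)
            return data

    def acquire(_: Any, n: int) -> List[float]:
        return [float(i) for i in range(n)]        # stand-in for instrument I/O

    def calibrate(samples: List[float], offset: float) -> List[float]:
        return [s + offset for s in samples]

    wf = Workflow([Step("acquire", acquire, n=5), Step("calibrate", calibrate, offset=0.5)])
    print(wf.run(None))   # [0.5, 1.5, 2.5, 3.5, 4.5]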