Search CORE

483,138 research outputs found

Designing Traceability into Big Data Systems

Author: Branson Andrew
Consortium the CRISTAL-ISE
Kovacs Zsolt
McClatchey Richard
Shamdasani Jetendr
Publication venue
Publication date: 01/01/2015
Field of study

Providing an appropriate level of accessibility and traceability to data or process elements (so-called Items) in large volumes of data, often Cloud-resident, is an essential requirement in the Big Data era. Enterprise-wide data systems need to be designed from the outset to support usage of such Items across the spectrum of business use rather than from any specific application view. The design philosophy advocated in this paper is to drive the design process using a so-called description-driven approach which enriches models with meta-data and description and focuses the design process on Item re-use, thereby promoting traceability. Details are given of the description-driven design of big data systems at CERN, in health informatics and in business process management. Evidence is presented that the approach leads to design simplicity and consequent ease of management thanks to loose typing and the adoption of a unified approach to Item management and usage.Comment: 10 pages; 6 figures in Proceedings of the 5th Annual International Conference on ICT: Big Data, Cloud and Security (ICT-BDCS 2015), Singapore July 2015. arXiv admin note: text overlap with arXiv:1402.5764, arXiv:1402.575

arXiv.org e-Print Archive

CERN Document Server

Emerging from the MIST: A Connector Tool for Supporting Programming by Non-programmers

Author: Bhatia Saurabh
Kienzle Paul
Lilley Tripp
McCrickard D. Scott
Publication venue
Publication date: 01/04/2010
Field of study

Software development is an iterative process. As user re-quirements emerge software applications must be extended to support the new requirements. Typically, a programmer will add new code to an existing code base of an application to provide a new functionality. Previous research has shown that such extensions are easier when application logic is clearly separated from the user interface logic. Assuming that a programmer is already familiar with the existing code base, the task of writing the new code can be considered to be split into two sub-tasks: writing code for the application logic; that is, the actual functionality of the application; and writing code for the user interface that will expose the functionality to the end user. The goal of this research is to reduce the effort required to create a user interface once the application logic has been created, toward supporting scientists with minimal pro-gramming knowledge to be able to create and modify pro-grams. Using a Model View Controller based architecture, various model components which contain the application logic can be built and extended. The process of creating and extending the views (user interfaces) on these model components is simplified through the use of our Malleable Interactive Software Toolkit (MIST), a tool set an infrastructure intended to simplify the design and extension of dynamically reconfigurable interfaces. This paper focuses on one tool in the MIST suite, a connec-tor tool that enables the programmer to evolve the user interface as the application logic evolves by connecting related pieces of code together; either through simple drag-and-drop interactions or through the authoring of Python code. The connector tool exemplifies the types of tools in the MIST suite, which we expect will encourage collabora-tive development of applications by allowing users to inte-grate various components and minimizing the cost of de-veloping new user interfaces for the combined compo-nents

Computer Science Technical Reports @Virginia Tech

Extensible Component Based Architecture for FLASH, A Massively Parallel, Multiphysics Simulation Code

Author: Andrew Siegel
Anshu Dubey
Antypas
Armstrong
Calder
Dan Sheeler
Dubey
Fisher
Fryxell
Gardiner
Hornung
Hornung
Katherine Riley
Katie Antypas
Klaus Weide
Lynn B. Reid
Murali K. Ganapathy
Oldham
O’Shea
Reynders
Toth
Publication venue: 'Elsevier BV'
Publication date: 24/07/2009
Field of study

FLASH is a publicly available high performance application code which has evolved into a modular, extensible software system from a collection of unconnected legacy codes. FLASH has been successful because its capabilities have been driven by the needs of scientific applications, without compromising maintainability, performance, and usability. In its newest incarnation, FLASH3 consists of inter-operable modules that can be combined to generate different applications. The FLASH architecture allows arbitrarily many alternative implementations of its components to co-exist and interchange with each other, resulting in greater flexibility. Further, a simple and elegant mechanism exists for customization of code functionality without the need to modify the core implementation of the source. A built-in unit test framework providing verifiability, combined with a rigorous software maintenance process, allow the code to operate simultaneously in the dual mode of production and development. In this paper we describe the FLASH3 architecture, with emphasis on solutions to the more challenging conflicts arising from solver complexity, portable performance requirements, and legacy codes. We also include results from user surveys conducted in 2005 and 2007, which highlight the success of the code.Comment: 33 pages, 7 figures; revised paper submitted to Parallel Computin

arXiv.org e-Print Archive

Crossref

Towards Exascale Scientific Metadata Management

Author: Blanas Spyros
Byna Surendra
Publication venue
Publication date: 29/03/2015
Field of study

Advances in technology and computing hardware are enabling scientists from all areas of science to produce massive amounts of data using large-scale simulations or observational facilities. In this era of data deluge, effective coordination between the data production and the analysis phases hinges on the availability of metadata that describe the scientific datasets. Existing workflow engines have been capturing a limited form of metadata to provide provenance information about the identity and lineage of the data. However, much of the data produced by simulations, experiments, and analyses still need to be annotated manually in an ad hoc manner by domain scientists. Systematic and transparent acquisition of rich metadata becomes a crucial prerequisite to sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and domain-agnostic metadata management infrastructure that can meet the demands of extreme-scale science is notable by its absence. To address this gap in scientific data management research and practice, we present our vision for an integrated approach that (1) automatically captures and manipulates information-rich metadata while the data is being produced or analyzed and (2) stores metadata within each dataset to permeate metadata-oblivious processes and to query metadata through established and standardized data access interfaces. We motivate the need for the proposed integrated approach using applications from plasma physics, climate modeling and neuroscience, and then discuss research challenges and possible solutions

arXiv.org e-Print Archive

eScholarship - University of California

EGI user forum 2011 : book of abstracts

Author
Publication venue
Publication date: 01/01/2011
Field of study

Hochschulschriftenserver - Universität Frankfurt am Main

Improving the scalability of parallel N-body applications with an event driven constraint based execution model

Author: Aarseth SJ
Alfieri RA
Bonachea D
Chandra R
Dekate C
El-Ghazawi T
Hewitt C
Kale L
Message Passing Interface Forum
O’Shea BW
Salmon JK
Singh JP
Publication venue: 'SAGE Publications'
Publication date: 23/09/2011
Field of study

The scalability and efficiency of graph applications are significantly constrained by conventional systems and their supporting programming models. Technology trends like multicore, manycore, and heterogeneous system architectures are introducing further challenges and possibilities for emerging application domains such as graph applications. This paper explores the space of effective parallel execution of ephemeral graphs that are dynamically generated using the Barnes-Hut algorithm to exemplify dynamic workloads. The workloads are expressed using the semantics of an Exascale computing execution model called ParalleX. For comparison, results using conventional execution model semantics are also presented. We find improved load balancing during runtime and automatic parallelism discovery improving efficiency using the advanced semantics for Exascale computing.Comment: 11 figure

arXiv.org e-Print Archive

Crossref

Supporting the Everyday Work of Scientists: Automating Scientific Workflows

Author: Mews Keith
Singer Janice A.
Stewart Darlene
Vidger Mark
Vinson Norman G.
Publication venue
Publication date: 24/06/2008
Field of study

This paper describes an action research project that we undertook with National Research Council Canada (NRC) scientists. Based on discussions about their \ud difficulties in using software to collect data and manage processes, we identified three requirements for increasing research productivity: ease of use for end- \ud users; managing scientific workflows; and facilitating software interoperability. Based on these requirements, we developed a software framework, Sweet, to \ud assist in the automation of scientific workflows. \ud \ud Throughout the iterative development process, and through a series of structured interviews, we evaluated how the framework was used in practice, and identified \ud increases in productivity and effectiveness and their causes. While the framework provides resources for writing application wrappers, it was easier to code the applications’ functionality directly into the framework using OSS components. Ease of use for the end-user and flexible and fully parameterized workflow representations were key elements of the framework’s success. \u

NRC Publications Archive

CogPrints Cognitive Sciences Eprint Archive