483,138 research outputs found
Designing Traceability into Big Data Systems
Providing an appropriate level of accessibility and traceability to data or
process elements (so-called Items) in large volumes of data, often
Cloud-resident, is an essential requirement in the Big Data era.
Enterprise-wide data systems need to be designed from the outset to support
usage of such Items across the spectrum of business use rather than from any
specific application view. The design philosophy advocated in this paper is to
drive the design process using a so-called description-driven approach which
enriches models with meta-data and description and focuses the design process
on Item re-use, thereby promoting traceability. Details are given of the
description-driven design of big data systems at CERN, in health informatics
and in business process management. Evidence is presented that the approach
leads to design simplicity and consequent ease of management thanks to loose
typing and the adoption of a unified approach to Item management and usage.Comment: 10 pages; 6 figures in Proceedings of the 5th Annual International
Conference on ICT: Big Data, Cloud and Security (ICT-BDCS 2015), Singapore
July 2015. arXiv admin note: text overlap with arXiv:1402.5764,
arXiv:1402.575
Emerging from the MIST: A Connector Tool for Supporting Programming by Non-programmers
Software development is an iterative process. As user re-quirements emerge software applications must be extended to support the new requirements. Typically, a programmer will add new code to an existing code base of an application to provide a new functionality. Previous research has shown that such extensions are easier when application logic is clearly separated from the user interface logic. Assuming that a programmer is already familiar with the existing code base, the task of writing the new code can be considered to be split into two sub-tasks: writing code for the application logic; that is, the actual functionality of the application; and writing code for the user interface that will expose the functionality to the end user.
The goal of this research is to reduce the effort required to create a user interface once the application logic has been created, toward supporting scientists with minimal pro-gramming knowledge to be able to create and modify pro-grams. Using a Model View Controller based architecture, various model components which contain the application logic can be built and extended. The process of creating and extending the views (user interfaces) on these model components is simplified through the use of our Malleable Interactive Software Toolkit (MIST), a tool set an infrastructure intended to simplify the design and extension of dynamically reconfigurable interfaces.
This paper focuses on one tool in the MIST suite, a connec-tor tool that enables the programmer to evolve the user interface as the application logic evolves by connecting related pieces of code together; either through simple drag-and-drop interactions or through the authoring of Python code. The connector tool exemplifies the types of tools in the MIST suite, which we expect will encourage collabora-tive development of applications by allowing users to inte-grate various components and minimizing the cost of de-veloping new user interfaces for the combined compo-nents
Extensible Component Based Architecture for FLASH, A Massively Parallel, Multiphysics Simulation Code
FLASH is a publicly available high performance application code which has
evolved into a modular, extensible software system from a collection of
unconnected legacy codes. FLASH has been successful because its capabilities
have been driven by the needs of scientific applications, without compromising
maintainability, performance, and usability. In its newest incarnation, FLASH3
consists of inter-operable modules that can be combined to generate different
applications. The FLASH architecture allows arbitrarily many alternative
implementations of its components to co-exist and interchange with each other,
resulting in greater flexibility. Further, a simple and elegant mechanism
exists for customization of code functionality without the need to modify the
core implementation of the source. A built-in unit test framework providing
verifiability, combined with a rigorous software maintenance process, allow the
code to operate simultaneously in the dual mode of production and development.
In this paper we describe the FLASH3 architecture, with emphasis on solutions
to the more challenging conflicts arising from solver complexity, portable
performance requirements, and legacy codes. We also include results from user
surveys conducted in 2005 and 2007, which highlight the success of the code.Comment: 33 pages, 7 figures; revised paper submitted to Parallel Computin
Towards Exascale Scientific Metadata Management
Advances in technology and computing hardware are enabling scientists from
all areas of science to produce massive amounts of data using large-scale
simulations or observational facilities. In this era of data deluge, effective
coordination between the data production and the analysis phases hinges on the
availability of metadata that describe the scientific datasets. Existing
workflow engines have been capturing a limited form of metadata to provide
provenance information about the identity and lineage of the data. However,
much of the data produced by simulations, experiments, and analyses still need
to be annotated manually in an ad hoc manner by domain scientists. Systematic
and transparent acquisition of rich metadata becomes a crucial prerequisite to
sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and
domain-agnostic metadata management infrastructure that can meet the demands of
extreme-scale science is notable by its absence.
To address this gap in scientific data management research and practice, we
present our vision for an integrated approach that (1) automatically captures
and manipulates information-rich metadata while the data is being produced or
analyzed and (2) stores metadata within each dataset to permeate
metadata-oblivious processes and to query metadata through established and
standardized data access interfaces. We motivate the need for the proposed
integrated approach using applications from plasma physics, climate modeling
and neuroscience, and then discuss research challenges and possible solutions
Improving the scalability of parallel N-body applications with an event driven constraint based execution model
The scalability and efficiency of graph applications are significantly
constrained by conventional systems and their supporting programming models.
Technology trends like multicore, manycore, and heterogeneous system
architectures are introducing further challenges and possibilities for emerging
application domains such as graph applications. This paper explores the space
of effective parallel execution of ephemeral graphs that are dynamically
generated using the Barnes-Hut algorithm to exemplify dynamic workloads. The
workloads are expressed using the semantics of an Exascale computing execution
model called ParalleX. For comparison, results using conventional execution
model semantics are also presented. We find improved load balancing during
runtime and automatic parallelism discovery improving efficiency using the
advanced semantics for Exascale computing.Comment: 11 figure
Supporting the Everyday Work of Scientists: Automating Scientific Workflows
This paper describes an action research project that we undertook with National Research Council Canada (NRC) scientists. Based on discussions about their \ud
difficulties in using software to collect data and manage processes, we identified three requirements for increasing research productivity: ease of use for end- \ud
users; managing scientific workflows; and facilitating software interoperability. Based on these requirements, we developed a software framework, Sweet, to \ud
assist in the automation of scientific workflows. \ud
\ud
Throughout the iterative development process, and through a series of structured interviews, we evaluated how the framework was used in practice, and identified \ud
increases in productivity and effectiveness and their causes. While the framework provides resources for writing application wrappers, it was easier to code the applications’ functionality directly into the framework using OSS components. Ease of use for the end-user and flexible and fully parameterized workflow representations were key elements of the framework’s success. \u
- …