
    Building Near-Real-Time Processing Pipelines with the Spark-MPI Platform

    Advances in detectors and computational technologies provide new opportunities for applied research and the fundamental sciences. Concurrently, dramatic increases in the three Vs (Volume, Velocity, and Variety) of experimental data and in the scale of computational tasks have produced demand for new real-time processing systems at experimental facilities. Recently, this demand was addressed by the Spark-MPI approach, which connects the Spark data-intensive platform with the MPI high-performance framework. In contrast with existing data management and analytics systems, Spark introduced a new middleware based on resilient distributed datasets (RDDs), which decoupled various data sources from high-level processing algorithms. The RDD middleware significantly advanced the scope of data-intensive applications, spreading from SQL queries to machine learning to graph processing. Spark-MPI further extended the Spark ecosystem with MPI applications using the Process Management Interface. The paper explores this integrated platform within the context of online ptychographic and tomographic reconstruction pipelines.

    Comment: New York Scientific Data Summit, August 6-9, 201
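    The RDD idea described in the abstract, a lazily built dataset whose transformations are decoupled from the underlying data source, can be illustrated with a minimal plain-Python sketch. This is not the Spark API; the class and method names below are invented for illustration only:

    ```python
    # Minimal sketch of the RDD concept: the data source is a callable,
    # transformations are recorded lazily, and nothing runs until collect().
    class MiniRDD:
        def __init__(self, source, ops=None):
            self._source = source          # data source, decoupled from processing
            self._ops = ops or []          # recorded (lazy) transformations

        def map(self, fn):
            return MiniRDD(self._source, self._ops + [("map", fn)])

        def filter(self, pred):
            return MiniRDD(self._source, self._ops + [("filter", pred)])

        def collect(self):
            # Only here is the source actually read and the pipeline executed.
            records = list(self._source())
            for kind, fn in self._ops:
                if kind == "map":
                    records = [fn(r) for r in records]
                else:
                    records = [r for r in records if fn(r)]
            return records

    # Example: square each record, keep the even results.
    frames = MiniRDD(lambda: range(10))
    result = frames.map(lambda x: x * x).filter(lambda x: x % 2 == 0).collect()
    # result == [0, 4, 16, 36, 64]
    ```

    Because the source is swappable (a detector stream, a file reader, a SQL query) while the transformation chain stays the same, this mirrors how the RDD middleware lets one processing algorithm serve many data sources.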

    Experimental Data Curation at Large Instrument Facilities with Open Source Software

    The National Synchrotron Light Source II (NSLS-II), operating at Brookhaven National Laboratory since 2014 for the US Department of Energy, is one of the newest and brightest storage-ring synchrotron facilities in the world. NSLS-II, like other facilities, provides pre-processing of the raw data and some analysis capabilities to its users. We describe the research collaborations and open source infrastructure developed at large instrument facilities such as NSLS-II for the purpose of curating high-value scientific data along the early stages of the data lifecycle. Data acquisition and curation tasks include storing experiment configuration and detector metadata, and acquiring raw data with infrastructure that converts proprietary instrument formats to industry standards. In addition, we describe a specific effort for discovering sample information at NSLS-II and tracing the provenance of analysis performed on acquired images. We show that curation tasks must be embedded into software along the data lifecycle for effectiveness and ease of use, and that loosely defined collaborations evolve around shared open source tools. Finally, we discuss best practices for experimental metadata capture in such facilities, data access, and the new challenges of scale and complexity posed by AI-based discovery for the synthesis of new materials.
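    The curation tasks listed above (storing experiment configuration and detector metadata alongside the raw acquisition, then serializing to a standard format) can be sketched in a few lines of Python. The function name, beamline label, and field names below are hypothetical, and JSON stands in for whichever industry-standard container a given facility actually uses:

    ```python
    import json

    # Hypothetical sketch: bundle experiment configuration and detector
    # metadata with a raw acquisition into one curated record.
    def curate_run(raw_frames, config, detector_meta):
        return {
            "experiment_config": config,
            "detector_metadata": detector_meta,
            "n_frames": len(raw_frames),
        }

    record = curate_run(
        raw_frames=[[0, 1], [2, 3]],                      # stand-in raw data
        config={"beamline": "HXN", "sample_id": "S-001"}, # illustrative values
        detector_meta={"model": "example-detector", "pixel_size_um": 55},
    )

    # Serialize to a self-describing, vendor-neutral format and verify
    # the metadata survives a round trip.
    serialized = json.dumps(record, sort_keys=True)
    restored = json.loads(serialized)
    ```

    Embedding a step like this in the acquisition software, rather than leaving it as a manual afterthought, is the "curation along the data lifecycle" the abstract argues for.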

    Status Report of the DPHEP Study Group: Towards a Global Effort for Sustainable Data Preservation in High Energy Physics

    Data from high-energy physics (HEP) experiments are collected with significant financial and human effort and are mostly unique. An inter-experimental study group on HEP data preservation and long-term analysis was convened as a panel of the International Committee for Future Accelerators (ICFA). The group was formed by large collider-based experiments and investigated the technical and organisational aspects of HEP data preservation. An intermediate report was released in November 2009 addressing the general issues of data preservation in HEP. This paper includes and extends the intermediate report. It provides an analysis of the research case for data preservation and a detailed description of the various projects at experiment, laboratory and international levels. In addition, the paper provides a concrete proposal for an international organisation in charge of the data management and policies in high-energy physics.

    Data-intensive science

    Data-intensive science has the potential to transform scientific research and quickly translate scientific progress into complete solutions, policies, and economic success. But this collaborative science is still lacking the effective access and exchange of knowledge among scientists, researchers, and policy makers across a range of disciplines. Bringing together leaders from multiple scientific disciplines, Data-Intensive Science shows how a comprehensive integration of various techniques and technological advances can effectively harness the vast amount of data being generated and significa

    Enabling Technologies for Improved Data Management: Hardware

    The most valuable assets in every scientific community are the expert workforce and the research results and data produced. The last decade has seen new experimental and computational techniques developing at an ever-faster pace, encouraging the production of ever-larger quantities of data in ever-shorter time spans. Concurrently, the traditional scientific working environment has changed beyond recognition. Today scientists can use a wide spectrum of experimental, computational and analytical facilities, often widely distributed over the UK and Europe. In this environment new challenges are posed for the management of data every day, but are we ready to tackle them? Do we know exactly what the challenges are? Is the right technology available, and is it applied where necessary? This part of enabling technologies investigates current hardware techniques and their functionalities and provides a comparison between various products.