15,997 research outputs found

Enabling quantitative data analysis through e-infrastructures

Author: Bardsley N.
Birkin M.
Bosveld K.
Ermisch J.
Foster I.
Gregory A.
Grose D.
Guy Warner
Jesse Blum
Ken J. Turner
Kohler U.
Koon Leai Larry Tan
Lambert P.S.
National Science Foundation.
Paul S. Lambert
Procter R.
Research Councils UK e-Science Programme.
Richard O. Sinnott
Rose D.
Schneider S. L.
Simon B. Jones
Sinnott R.O.
Tan K.L.L.
Turi D.
UK Data Forum.
Vernon Gayle
Publication venue: 'SAGE Publications'
Publication date: 01/01/2009

This paper discusses how quantitative data analysis in the social sciences can engage with and exploit an e-Infrastructure. We highlight how a number of activities which are central to quantitative data analysis, referred to as ‘data management’, can benefit from e-infrastructure support. We conclude by discussing how these issues are relevant to the DAMES (Data Management through e-Social Science) research Node, an ongoing project that aims to develop e-Infrastructural resources for quantitative data analysis in the social sciences.

Crossref

Stirling Online Research Repository (RIOXX)

Edinburgh Research Explorer

Enlighten

Stirling Online Research Repository

University of Melbourne Institutional Repository

Knowledge-Intensive Processes: Characteristics, Requirements and Analysis of Contemporary Approaches

Author: Di Ciccio Claudio
Marrella Andrea
Russo Alessandro
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Engineering of knowledge-intensive processes (KiPs) is far from being mastered, since they are genuinely knowledge- and data-centric, and require substantial flexibility, at both design- and run-time. In this work, starting from a scientific literature analysis in the area of KiPs and from three real-world domains and application scenarios, we provide a precise characterization of KiPs. Furthermore, we devise some general requirements related to KiPs management and execution. Such requirements contribute to the definition of an evaluation framework to assess current system support for KiPs. To this end, we present a critical analysis of a number of existing process-oriented approaches by discussing their efficacy against the requirements.

Archivio della ricerca- Università di Roma La Sapienza

BOSS-LDG: A Novel Computational Framework that Brings Together Blue Waters, Open Science Grid, Shifter and the LIGO Data Grid to Accelerate Gravitational Wave Discovery

Author: Anderson Stuart
Bouvet Timothy
Couvares Peter
Enos Jeremy
Fajardo Edgar
Haas Roland
Huerta E. A.
Katz Daniel S.
Kramer William T. C.
Leong Hon Wai
Wheeler David
Willis Josh
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 25/09/2017

We present a novel computational framework that connects Blue Waters, the NSF-supported, leadership-class supercomputer operated by NCSA, to the Laser Interferometer Gravitational-Wave Observatory (LIGO) Data Grid via Open Science Grid technology. To enable this computational infrastructure, we configured, for the first time, a LIGO Data Grid Tier-1 Center that can submit heterogeneous LIGO workflows using Open Science Grid facilities. In order to enable a seamless connection between the LIGO Data Grid and Blue Waters via Open Science Grid, we utilize Shifter to containerize LIGO's workflow software. This work represents the first time Open Science Grid, Shifter, and Blue Waters are unified to tackle a scientific problem and, in particular, it is the first time a framework of this nature is used in the context of large scale gravitational wave data analysis. This new framework has been used in the last several weeks of LIGO's second discovery campaign to run the most computationally demanding gravitational wave search workflows on Blue Waters, and accelerate discovery in the emergent field of gravitational wave astrophysics. We discuss the implications of this novel framework for a wider ecosystem of Higher Performance Computing users. Comment: 10 pages, 10 figures. Accepted as a Full Research Paper to the 13th IEEE International Conference on eScience

arXiv.org e-Print Archive

Crossref

Caltech Authors

An Assessment of Data Transfer Performance for Large-Scale Climate Data Analysis and Recommendations for the Data Infrastructure for CMIP6

Author: Dart Eli
Prabhat
Wehner Michael F.
Publication date: 25/08/2017

We document the data transfer workflow, data transfer performance, and other aspects of staging approximately 56 terabytes of climate model output data from the distributed Coupled Model Intercomparison Project (CMIP5) archive to the National Energy Research Supercomputing Center (NERSC) at the Lawrence Berkeley National Laboratory required for tracking and characterizing extratropical storms, a phenomenon of importance in the mid-latitudes. We present this analysis to illustrate the current challenges in assembling multi-model data sets at major computing facilities for large-scale studies of CMIP5 data. Because of the larger archive size of the upcoming CMIP6 phase of model intercomparison, we expect such data transfers to become of increasing importance, and perhaps of routine necessity. We find that data transfer rates using the ESGF are often slower than what is typically available to US residences and that there is significant room for improvement in the data transfer capabilities of the ESGF portal and data centers both in terms of workflow mechanics and in data transfer performance. We believe performance improvements of at least an order of magnitude are within technical reach using current best practices, as illustrated by the performance we achieved in transferring the complete raw data set between two high performance computing facilities. To achieve these performance improvements, we recommend: that current best practices (such as the Science DMZ model) be applied to the data servers and networks at ESGF data centers; that sufficient financial and human resources be devoted at the ESGF data centers for systems and network engineering tasks to support high performance data movement; and that performance metrics for data transfer between ESGF data centers and major computing facilities used for climate data analysis be established, regularly tested, and published.

arXiv.org e-Print Archive

eScholarship - University of California

Neuroimaging study designs, computational analyses and data provenance using the LONI pipeline.

Author: Chakrapani Shruthi
Dinov Ivo
Eggert Paul
Gutman Boris
Leung Kelvin
Liu Zhizhong
Lozev Kamen
Magsipoc Rico
Parker D Stott
Petrosyan Petros
Pierce Jonathan
Toga Arthur
Van Horn John
Woods Roger
Zamanyan Alen
Publication venue: eScholarship, University of California
Publication date: 01/01/2010

Modern computational neuroscience employs diverse software tools and multidisciplinary expertise to analyze heterogeneous brain data. The classical problems of gathering meaningful data, fitting specific models, and discovering appropriate analysis and visualization tools give way to a new class of computational challenges--management of large and incongruous data, integration and interoperability of computational resources, and data provenance. We designed, implemented and validated a new paradigm for addressing these challenges in the neuroimaging field. Our solution is based on the LONI Pipeline environment [3], [4], a graphical workflow environment for constructing and executing complex data processing protocols. We developed study-design, database and visual language programming functionalities within the LONI Pipeline that enable the construction of complete, elaborate and robust graphical workflows for analyzing neuroimaging and other data. These workflows facilitate open sharing and communication of data and metadata, concrete processing protocols, result validation, and study replication among different investigators and research groups. The LONI Pipeline features include distributed grid-enabled infrastructure, virtualized execution environment, efficient integration, data provenance, validation and distribution of new computational tools, automated data format conversion, and an intuitive graphical user interface. We demonstrate the new LONI Pipeline features using large scale neuroimaging studies based on data from the International Consortium for Brain Mapping [5] and the Alzheimer's Disease Neuroimaging Initiative [6]. User guides, forums, instructions and downloads of the LONI Pipeline environment are available at http://pipeline.loni.ucla.edu

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

2100 AI: Reflections on the mechanisation of scientific discovery

Author: Mannocci Andrea
Motta Enrico
Osborne Francesco
Salatino Angelo A.
Publication date: 01/01/2017

The pace of research is nowadays extremely intensive, with datasets and publications being published at an unprecedented rate. In this context data science, artificial intelligence, machine learning and big data analytics are providing researchers with new automatic techniques which not only help them to manage this flow of information but are also able to automatically identify interesting patterns and insights in this vast sea of information. However, the emergence of mechanised scientific discovery is likely to dramatically change the way we do science, thus introducing and amplifying serious societal implications on the role of researchers themselves, which need to be analysed thoroughly.

Open Research Online