Search CORE

56,672 research outputs found

Recommended from our members

Skills and Knowledge for Data-Intensive Environmental Research.

Author: Aukema Juliann
Boettiger Carl
Brun Julien
Budden Amber
Collins Scott
Fernández Denny
Gross Louis
Hampton Stephanie
Hernandez Rebecca
Jones Matthew
Labou Stephanie
Schildhauer Mark
Supp Sarah
Teal Tracy
Wasser Leah
White Ethan
Publication venue: eScholarship, University of California
Publication date: 01/06/2017
Field of study

The scale and magnitude of complex and pressing environmental issues lend urgency to the need for integrative and reproducible analysis and synthesis, facilitated by data-intensive research approaches. However, the recent pace of technological change has been such that appropriate skills to accomplish data-intensive research are lacking among environmental scientists, who more than ever need greater access to training and mentorship in computational skills. Here, we provide a roadmap for raising data competencies of current and next-generation environmental researchers by describing the concepts and skills needed for effectively engaging with the heterogeneous, distributed, and rapidly growing volumes of available data. We articulate five key skills: (1) data management and processing, (2) analysis, (3) software skills for science, (4) visualization, and (5) communication methods for collaboration and dissemination. We provide an overview of the current suite of training initiatives available to environmental scientists and models for closing the skill-transfer gap

eScholarship - University of California

CERN openlab Whitepaper on Future IT Challenges in Scientific Research

Author: Di Meglio Alberto
Gaillard Melissa
Purcell Andrew
Publication venue
Publication date: 01/01/2014
Field of study

This whitepaper describes the major IT challenges in scientific research at CERN and several other European and international research laboratories and projects. Each challenge is exemplified through a set of concrete use cases drawn from the requirements of large-scale scientific programs. The paper is based on contributions from many researchers and IT experts of the participating laboratories and also input from the existing CERN openlab industrial sponsors. The views expressed in this document are those of the individual contributors and do not necessarily reflect the view of their organisations and/or affiliates

ZENODO

CERN Document Server

A Tale of Two Data-Intensive Paradigms: Applications, Abstractions, and Architectures

Author: Fox Geoffrey C.
Jha Shantenu
Luckow Andre
Mantha Pradeep
Qiu Judy
Publication venue
Publication date: 01/01/2014
Field of study

Scientific problems that depend on processing large amounts of data require overcoming challenges in multiple areas: managing large-scale data distribution, co-placement and scheduling of data with compute resources, and storing and transferring large volumes of data. We analyze the ecosystems of the two prominent paradigms for data-intensive applications, hereafter referred to as the high-performance computing and the Apache-Hadoop paradigm. We propose a basis, common terminology and functional factors upon which to analyze the two approaches of both paradigms. We discuss the concept of "Big Data Ogres" and their facets as means of understanding and characterizing the most common application workloads found across the two paradigms. We then discuss the salient features of the two paradigms, and compare and contrast the two approaches. Specifically, we examine common implementation/approaches of these paradigms, shed light upon the reasons for their current "architecture" and discuss some typical workloads that utilize them. In spite of the significant software distinctions, we believe there is architectural similarity. We discuss the potential integration of different implementations, across the different levels and components. Our comparison progresses from a fully qualitative examination of the two paradigms, to a semi-quantitative methodology. We use a simple and broadly used Ogre (K-means clustering), characterize its performance on a range of representative platforms, covering several implementations from both paradigms. Our experiments provide an insight into the relative strengths of the two paradigms. We propose that the set of Ogres will serve as a benchmark to evaluate the two paradigms along different dimensions.Comment: 8 pages, 2 figure

arXiv.org e-Print Archive

CiteSeerX

Crossref