Search CORE

13,159 research outputs found

Big Data in HEP: A comprehensive use case study

Author: Cremonesi Matteo
Elmer Peter
Gutsche Oliver
Jayatilaka Bo
Kowalkowski Jim
Pivarski Jim
Sehrish Saba
Surez Cristina Mantilla
Svyatkovskiy Alexey
Tran Nhan
Publication venue: 'IOP Publishing'
Publication date: 12/03/2017
Field of study

Experimental Particle Physics has been at the forefront of analyzing the worlds largest datasets for decades. The HEP community was the first to develop suitable software and computing tools for this task. In recent times, new toolkits and systems collectively called Big Data technologies have emerged to support the analysis of Petabyte and Exabyte datasets in industry. While the principles of data analysis in HEP have not changed (filtering and transforming experiment-specific data formats), these new technologies use different approaches and promise a fresh look at analysis of very large datasets and could potentially reduce the time-to-physics with increased interactivity. In this talk, we present an active LHC Run 2 analysis, searching for dark matter with the CMS detector, as a testbed for Big Data technologies. We directly compare the traditional NTuple-based analysis with an equivalent analysis using Apache Spark on the Hadoop ecosystem and beyond. In both cases, we start the analysis with the official experiment data formats and produce publication physics plots. We will discuss advantages and disadvantages of each approach and give an outlook on further studies needed.Comment: Proceedings for 22nd International Conference on Computing in High Energy and Nuclear Physics (CHEP 2016

arXiv.org e-Print Archive

CERN Document Server

The space physics environment data analysis system (SPEDAS)

Author: Angelopoulos V.
Cruce P.
Drozdov A.
Grimes E. W.
Hatzigeorgiu N.
King D. A.
Larson D.
Lewis J. W.
McTiernan J. M.
Roberts D. A.
Walsh Brian M.
Publication venue: Springer Netherlands
Publication date: 01/01/2019
Field of study

With the advent of the Heliophysics/Geospace System Observatory (H/GSO), a complement of multi-spacecraft missions and ground-based observatories to study the space environment, data retrieval, analysis, and visualization of space physics data can be daunting. The Space Physics Environment Data Analysis System (SPEDAS), a grass-roots software development platform (www.spedas.org), is now officially supported by NASA Heliophysics as part of its data environment infrastructure. It serves more than a dozen space missions and ground observatories and can integrate the full complement of past and upcoming space physics missions with minimal resources, following clear, simple, and well-proven guidelines. Free, modular and configurable to the needs of individual missions, it works in both command-line (ideal for experienced users) and Graphical User Interface (GUI) mode (reducing the learning curve for first-time users). Both options have “crib-sheets,” user-command sequences in ASCII format that can facilitate record-and-repeat actions, especially for complex operations and plotting. Crib-sheets enhance scientific interactions, as users can move rapidly and accurately from exchanges of technical information on data processing to efficient discussions regarding data interpretation and science. SPEDAS can readily query and ingest all International Solar Terrestrial Physics (ISTP)-compatible products from the Space Physics Data Facility (SPDF), enabling access to a vast collection of historic and current mission data. The planned incorporation of Heliophysics Application Programmer’s Interface (HAPI) standards will facilitate data ingestion from distributed datasets that adhere to these standards. Although SPEDAS is currently Interactive Data Language (IDL)-based (and interfaces to Java-based tools such as Autoplot), efforts are under-way to expand it further to work with python (first as an interface tool and potentially even receiving an under-the-hood replacement). We review the SPEDAS development history, goals, and current implementation. We explain its “modes of use” with examples geared for users and outline its technical implementation and requirements with software developers in mind. We also describe SPEDAS personnel and software management, interfaces with other organizations, resources and support structure available to the community, and future development plans.Published versio

Boston University Institutional Repository (OpenBU)

Using Data in Undergraduate Science Classrooms

Author: Lamont-Doherty Earth Observatory Columbia University
Publication venue: Columbia University
Publication date: 10/09/2007
Field of study

Provides pedagogical insight concerning the skill of using data The resource being annotated is: http://www.dlese.org/dds/catalog_DATA-CLASS-000-000-000-007.htm

Digital Library for Earth System Education

From Social Simulation to Integrative System Design

Author: Balietti Stefano
Helbing Dirk
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011
Field of study

As the recent financial crisis showed, today there is a strong need to gain "ecological perspective" of all relevant interactions in socio-economic-techno-environmental systems. For this, we suggested to set-up a network of Centers for integrative systems design, which shall be able to run all potentially relevant scenarios, identify causality chains, explore feedback and cascading effects for a number of model variants, and determine the reliability of their implications (given the validity of the underlying models). They will be able to detect possible negative side effect of policy decisions, before they occur. The Centers belonging to this network of Integrative Systems Design Centers would be focused on a particular field, but they would be part of an attempt to eventually cover all relevant areas of society and economy and integrate them within a "Living Earth Simulator". The results of all research activities of such Centers would be turned into informative input for political Decision Arenas. For example, Crisis Observatories (for financial instabilities, shortages of resources, environmental change, conflict, spreading of diseases, etc.) would be connected with such Decision Arenas for the purpose of visualization, in order to make complex interdependencies understandable to scientists, decision-makers, and the general public.Comment: 34 pages, Visioneer White Paper, see http://www.visioneer.ethz.c

arXiv.org e-Print Archive

Repository for Publications and Research Data

CiteSeerX

EDP Sciences OAI-PMH repository (1.2.0)

Towards Exascale Scientific Metadata Management

Author: Blanas Spyros
Byna Surendra
Publication venue
Publication date: 29/03/2015
Field of study

Advances in technology and computing hardware are enabling scientists from all areas of science to produce massive amounts of data using large-scale simulations or observational facilities. In this era of data deluge, effective coordination between the data production and the analysis phases hinges on the availability of metadata that describe the scientific datasets. Existing workflow engines have been capturing a limited form of metadata to provide provenance information about the identity and lineage of the data. However, much of the data produced by simulations, experiments, and analyses still need to be annotated manually in an ad hoc manner by domain scientists. Systematic and transparent acquisition of rich metadata becomes a crucial prerequisite to sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and domain-agnostic metadata management infrastructure that can meet the demands of extreme-scale science is notable by its absence. To address this gap in scientific data management research and practice, we present our vision for an integrated approach that (1) automatically captures and manipulates information-rich metadata while the data is being produced or analyzed and (2) stores metadata within each dataset to permeate metadata-oblivious processes and to query metadata through established and standardized data access interfaces. We motivate the need for the proposed integrated approach using applications from plasma physics, climate modeling and neuroscience, and then discuss research challenges and possible solutions

arXiv.org e-Print Archive

eScholarship - University of California

Visualization of large molecular trajectories

Author: Duran Rosich David
Hermosilla Casajús Pedro
Kozliková Barbora
Ropinski Timo
Vinacua Pla Álvaro
Vázquez Alcocer Pere Pau
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018
Field of study

The analysis of protein-ligand interactions is a time-intensive task. Researchers have to analyze multiple physico-chemical properties of the protein at once and combine them to derive conclusions about the protein-ligand interplay. Typically, several charts are inspected, and 3D animations can be played side-by-side to obtain a deeper understanding of the data. With the advances in simulation techniques, larger and larger datasets are available, with up to hundreds of thousands of steps. Unfortunately, such large trajectories are very difficult to investigate with traditional approaches. Therefore, the need for special tools that facilitate inspection of these large trajectories becomes substantial. In this paper, we present a novel system for visual exploration of very large trajectories in an interactive and user-friendly way. Several visualization motifs are automatically derived from the data to give the user the information about interactions between protein and ligand. Our system offers specialized widgets to ease and accelerate data inspection and navigation to interesting parts of the simulation. The system is suitable also for simulations where multiple ligands are involved. We have tested the usefulness of our tool on a set of datasets obtained from protein engineers, and we describe the expert feedback.Peer ReviewedPostprint (author's final draft

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Beyond XSPEC: Towards Highly Configurable Analysis

Author: Collins H. M.
Gilat A.
Gropp W.
Houck J. C.
Kiczales G.
M. A. Nowak
M. S. Noble
Publication venue: 'University of Chicago Press'
Publication date: 03/06/2008
Field of study

We present a quantitative comparison between software features of the defacto standard X-ray spectral analysis tool, XSPEC, and ISIS, the Interactive Spectral Interpretation System. Our emphasis is on customized analysis, with ISIS offered as a strong example of configurable software. While noting that XSPEC has been of immense value to astronomers, and that its scientific core is moderately extensible--most commonly via the inclusion of user contributed "local models"--we identify a series of limitations with its use beyond conventional spectral modeling. We argue that from the viewpoint of the astronomical user, the XSPEC internal structure presents a Black Box Problem, with many of its important features hidden from the top-level interface, thus discouraging user customization. Drawing from examples in custom modeling, numerical analysis, parallel computation, visualization, data management, and automated code generation, we show how a numerically scriptable, modular, and extensible analysis platform such as ISIS facilitates many forms of advanced astrophysical inquiry.Comment: Accepted by PASP, for July 2008 (15 pages

arXiv.org e-Print Archive

Crossref

Using Modern Machine Learning Methods on KASCADE Data for Outreach and Education

Author: Kostunin Dmitriy
Plokhikh Ivan
Sotnikov Vladimir
Tokareva Victoria
Publication venue: Scuola Internazionale Superiore di Studi Avanzati
Publication date: 16/12/2021
Field of study

KITopen