11 research outputs found

    Hadoop for High-Performance Climate Analytics: Use Cases and Lessons Learned

    Get PDF
    Scientific data services are a critical aspect of the NASA Center for Climate Simulations mission (NCCS). Hadoop, via MapReduce, provides an approach to high-performance analytics that is proving to be useful to data intensive problems in climate research. It offers an analysis paradigm that uses clusters of computers and combines distributed storage of large data sets with parallel computation. The NCCS is particularly interested in the potential of Hadoop to speed up basic operations common to a wide range of analyses. In order to evaluate this potential, we prototyped a series of canonical MapReduce operations over a test suite of observational and climate simulation datasets. The initial focus was on averaging operations over arbitrary spatial and temporal extents within Modern Era Retrospective- Analysis for Research and Applications (MERRA) data. After preliminary results suggested that this approach improves efficiencies within data intensive analytic workflows, we invested in building a cyber infrastructure resource for developing a new generation of climate data analysis capabilities using Hadoop. This resource is focused on reducing the time spent in the preparation of reanalysis data used in data-model inter-comparison, a long sought goal of the climate community. This paper summarizes the related use cases and lessons learned

    MERRA/AS: The MERRA Analytic Services Project Interim Report

    Get PDF
    MERRA AS is a cyberinfrastructure resource that will combine iRODS-based Climate Data Server (CDS) capabilities with Coudera MapReduce to serve MERRA analytic products, store the MERRA reanalysis data collection in an HDFS to enable parallel, high-performance, storage-side data reductions, manage storage-side driver, mapper, reducer code sets and realized objects for users, and provide a library of commonly used spatiotemporal operations that can be composed to enable higher-order analyses

    The Virtual Climate Data Server (vCDS): An iRODS-Based Data Management Software Appliance Supporting Climate Data Services and Virtualization-as-a-Service in the NASA Center for Climate Simulation

    Get PDF
    Scientific data services are becoming an important part of the NASA Center for Climate Simulation's mission. Our technological response to this expanding role is built around the concept of a Virtual Climate Data Server (vCDS), repetitive provisioning, image-based deployment and distribution, and virtualization-as-a-service. The vCDS is an iRODS-based data server specialized to the needs of a particular data-centric application. We use RPM scripts to build vCDS images in our local computing environment, our local Virtual Machine Environment, NASA s Nebula Cloud Services, and Amazon's Elastic Compute Cloud. Once provisioned into one or more of these virtualized resource classes, vCDSs can use iRODS s federation capabilities to create an integrated ecosystem of managed collections that is scalable and adaptable to changing resource requirements. This approach enables platform- or software-asa- service deployment of vCDS and allows the NCCS to offer virtualization-as-a-service: a capacity to respond in an agile way to new customer requests for data services

    MERRA Analytic Services: Meeting the Big Data Challenges of Climate Science Through Cloud-enabled Climate Analytics-as-a-service

    Get PDF
    Climate science is a Big Data domain that is experiencing unprecedented growth. In our efforts to address the Big Data challenges of climate science, we are moving toward a notion of Climate Analytics-as-a-Service (CAaaS). We focus on analytics, because it is the knowledge gained from our interactions with Big Data that ultimately produce societal benefits. We focus on CAaaS because we believe it provides a useful way of thinking about the problem: a specialization of the concept of business process-as-a-service, which is an evolving extension of IaaS, PaaS, and SaaS enabled by Cloud Computing. Within this framework, Cloud Computing plays an important role; however, we it see it as only one element in a constellation of capabilities that are essential to delivering climate analytics as a service. These elements are essential because in the aggregate they lead to generativity, a capacity for self-assembly that we feel is the key to solving many of the Big Data challenges in this domain. MERRA Analytic Services (MERRAAS) is an example of cloud-enabled CAaaS built on this principle. MERRAAS enables MapReduce analytics over NASAs Modern-Era Retrospective Analysis for Research and Applications (MERRA) data collection. The MERRA reanalysis integrates observational data with numerical models to produce a global temporally and spatially consistent synthesis of 26 key climate variables. It represents a type of data product that is of growing importance to scientists doing climate change research and a wide range of decision support applications. MERRAAS brings together the following generative elements in a full, end-to-end demonstration of CAaaS capabilities: (1) high-performance, data proximal analytics, (2) scalable data management, (3) software appliance virtualization, (4) adaptive analytics, and (5) a domain-harmonized API. The effectiveness of MERRAAS has been demonstrated in several applications. In our experience, Cloud Computing lowers the barriers and risk to organizational change, fosters innovation and experimentation, facilitates technology transfer, and provides the agility required to meet our customers' increasing and changing needs. Cloud Computing is providing a new tier in the data services stack that helps connect earthbound, enterprise-level data and computational resources to new customers and new mobility-driven applications and modes of work. For climate science, Cloud Computing's capacity to engage communities in the construction of new capabilies is perhaps the most important link between Cloud Computing and Big Data

    System and Method for Providing a Climate Data Persistence Service

    Get PDF
    A system, method and computer-readable storage devices for providing a climate data persistence service. A system configured to provide the service can include a climate data server that performs data and metadata storage and management functions for climate data objects, a compute-storage platform that provides the resources needed to support a climate data server, provisioning software that allows climate data server instances to be deployed as virtual climate data servers in a cloud computing environment, and a service interface, wherein persistence service capabilities are invoked by software applications running on a client device. The climate data objects can be in various formats, such as International Organization for Standards (ISO) Open Archival Information System (OAIS) Reference Model Submission Information Packages, Archive Information Packages, and Dissemination Information Packages. The climate data server can enable scalable, federated storage, management, discovery, and access, and can be tailored for particular use cases

    Big Data Challenges in Climate Science: Improving the Next-Generation Cyberinfrastructure

    Get PDF
    The knowledge we gain from research in climate science depends on the generation, dissemination, and analysis of high-quality data. This work comprises technical practice as well as social practice, both of which are distinguished by their massive scale and global reach. As a result, the amount of data involved in climate research is growing at an unprecedented rate. Climate model intercomparison (CMIP) experiments, the integration of observational data and climate reanalysis data with climate model outputs, as seen in the Obs4MIPs, Ana4MIPs, and CREATE-IP activities, and the collaborative work of the Intergovernmental Panel on Climate Change (IPCC) provide examples of the types of activities that increasingly require an improved cyberinfrastructure for dealing with large amounts of critical scientific data. This paper provides an overview of some of climate science's big data problems and the technical solutions being developed to advance data publication, climate analytics as a service, and interoperability within the Earth System Grid Federation (ESGF), the primary cyberinfrastructure currently supporting global climate research activities

    Toward a Monte Carlo approach to selecting climate variables in MaxEnt.

    No full text
    MaxEnt is an important aid in understanding the influence of climate change on species distributions. There is growing interest in using IPCC-class global climate model outputs as environmental predictors in this work. These models provide realistic, global representations of the climate system, projections for hundreds of variables (including Essential Climate Variables), and combine observations from an array of satellite, airborne, and in-situ sensors. Unfortunately, direct use of this important class of data in MaxEnt modeling has been limited by the large size of climate model output collections and the fact that MaxEnt can only operate on a relatively small set of predictors stored in a computer's main memory. In this study, we demonstrate the feasibility of a Monte Carlo method that overcomes this limitation by finding a useful subset of predictors in a larger, externally-stored collection of environmental variables in a reasonable amount of time. Our proposed solution takes an ensemble approach wherein many MaxEnt runs, each drawing on a small random subset of variables, converges on a global estimate of the top contributing subset of variables in the larger collection. In preliminary tests, the Monte Carlo approach selected a consistent set of top six variables within 540 runs, with the four most contributory variables of the top six accounting for approximately 93% of overall permutation importance in a final model. These results suggest that a Monte Carlo approach could offer a viable means of screening environmental predictors prior to final model construction that is amenable to parallelization and scalable to very large data sets. This points to the possibility of near-real-time multiprocessor implementations that could enable broader and more exploratory use of global climate model outputs in environmental niche modeling and aid in the discovery of viable predictors

    White matter deficits in psychopathic offenders and correlation with factor structure

    Get PDF
    Contains fulltext : 119541.pdf (publisher's version ) (Open Access)Psychopathic offenders show a persistent pattern of emotional unresponsivity to the often horrendous crimes they perpetrate. Recent studies have related psychopathy to alterations in white matter. Therefore, diffusion tensor imaging followed by tract-based spatial statistics (TBSS) analysis in 11 psychopathic offenders matched to 11 healthy controls was completed. Fractional anisotropy was calculated within each voxel and comparisons were made between groups using a permutation test. Any clusters of white matter voxels different between groups were submitted to probabilistic tractography. Significant differences in fractional anisotropy were found between psychopathic offenders and healthy controls in three main white matter clusters. These three clusters represented two major networks: an amygdalo-prefrontal network, and a striato-thalamo-frontal network. The interpersonal/affective component of the PCL-R correlated with white matter deficits in the orbitofrontal cortex and frontal pole whereas the antisocial component correlated with deficits in the striato-thalamo-frontal network. In addition to replicating earlier work concerning disruption of an amygdala-prefrontal network, we show for the first time that white matter integrity in a striato-thalamo-frontal network is disrupted in psychopathic offenders. The novelty of our findings lies in the two dissociable white matter networks that map directly onto the two major factors of psychopathy.8 p
    corecore