If these data could talk
In the last few decades, data-driven methods have come to dominate many fields of scientific inquiry. Open data and open-source software have enabled the rapid implementation of novel methods to manage and analyze the growing flood of data. However, it has become apparent that many scientific fields exhibit distressingly low rates of repeatability and reproducibility. There are many dimensions to this issue. We believe one of them is a lack of formalism in describing end-to-end published results, from the data source through the analysis to the final published results. Even when authors do their best to make their research and data accessible, this lack of formalism reduces the clarity and efficiency of reporting, which contributes to problems of reproducibility. Data provenance aids both repeatability and reproducibility through systematic and formal records of the relationships among data sources, processes, datasets, publications and researchers.
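The abstract describes provenance as formal records of relationships among data sources, processes, datasets, publications and researchers. The sketch below stores such records as (subject, relation, object) triples. The relation names (`used`, `wasGeneratedBy`, `wasDerivedFrom`, `wasAttributedTo`) are borrowed from the W3C PROV data model. The `ProvenanceStore` class and all file and researcher names are hypothetical illustrations, not the authors' system.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceStore:
    """Hypothetical minimal store of (subject, relation, object) provenance triples."""
    triples: list[tuple[str, str, str]] = field(default_factory=list)

    def add(self, subject: str, relation: str, obj: str) -> None:
        self.triples.append((subject, relation, obj))

    def objects(self, subject: str, relation: str) -> list[str]:
        """Everything that `subject` points to through `relation`."""
        return [o for s, r, o in self.triples if s == subject and r == relation]

    def subjects(self, relation: str, obj: str) -> list[str]:
        """Everything that points to `obj` through `relation`."""
        return [s for s, r, o in self.triples if r == relation and o == obj]


store = ProvenanceStore()
store.add("clean_data.R", "used", "weather_station.csv")          # process used a data source
store.add("daily_means.csv", "wasGeneratedBy", "clean_data.R")    # dataset produced by a process
store.add("paper_fig2.pdf", "wasDerivedFrom", "daily_means.csv")  # publication artifact from a dataset
store.add("daily_means.csv", "wasAttributedTo", "researcher:A")   # dataset credited to a researcher
```

With the records written down explicitly, a question such as "which process produced `daily_means.csv`?" becomes a query (`store.subjects("wasGeneratedBy", ...)` run in reverse, or `store.objects("daily_means.csv", "wasGeneratedBy")`) rather than a matter of recollection.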
Sedimentary evidence of hurricane strikes in western Long Island, New York
Author Posting. © American Geophysical Union, 2007. This article is posted here by permission of American Geophysical Union for personal use, not for redistribution. The definitive version was published in Geochemistry Geophysics Geosystems 8 (2007): Q06011, doi:10.1029/2006GC001463.
Evidence of historical landfalling hurricanes and prehistoric storms has been recovered from backbarrier environments in the New York City area. Overwash deposits correlate with landfalls of the most intense documented hurricanes in the area, including the hurricanes of 1893, 1821, 1788, and 1693 A.D. There is little evidence of intense hurricane landfalls in the region for several hundred years prior to the late 17th century A.D. The apparent increase in intense hurricane landfalls around 300 years ago occurs during the latter half of the Little Ice Age, a time of lower tropical sea surface temperatures. Multiple washovers laid down between ~2200 and 900 cal yr B.P. suggest an interval of frequent intense hurricane landfalls in the region. Our results provide preliminary evidence that fluctuations in intense hurricane landfall in the northeastern United States were roughly synchronous with hurricane landfall fluctuations observed for the Caribbean and Gulf Coast, suggesting North Atlantic-wide changes in hurricane activity.
Grants from the National Science Foundation (EAR 0519118), the Risk Prediction Initiative at the Bermuda Biological Station for Research, and the Coastal Ocean Institute of Woods Hole Oceanographic Institution supported this research.
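Ages above are given in "cal yr B.P.", meaning calibrated years before present, where "present" is fixed at 1950 A.D. The helper below converts such a value to a calendar-year label. It skips year 0, because the calendar goes directly from 1 B.C. to 1 A.D. This is a sketch for reading the numbers only. The function name is hypothetical, and the conversion ignores the calibration uncertainty attached to any radiocarbon date.

```python
def cal_bp_to_calendar(bp: int) -> str:
    """Convert calibrated years before present (present = 1950 A.D.) to a calendar label."""
    astronomical = 1950 - bp  # astronomical year numbering includes a year 0
    if astronomical > 0:
        return f"{astronomical} A.D."
    # Astronomical year 0 is 1 B.C., -1 is 2 B.C., and so on.
    return f"{1 - astronomical} B.C."


# The washover interval reported in the abstract, ~2200 to 900 cal yr B.P.:
start_label = cal_bp_to_calendar(2200)  # "251 B.C."
end_label = cal_bp_to_calendar(900)     # "1050 A.D."
```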
Carbon budget of the Harvard Forest Long-Term Ecological Research site: pattern, process, and response to global change
How, where, and why carbon (C) moves into and out of an ecosystem through time are long-standing questions in biogeochemistry. Here, we bring together hundreds of thousands of C-cycle observations at the Harvard Forest in central Massachusetts, USA, a mid-latitude landscape dominated by 80- to 120-yr-old closed-canopy forests. These data answered four questions: (1) where and how much C is presently stored in dominant forest types; (2) what are current rates of C accrual and loss; (3) what biotic and abiotic factors contribute to variability in these rates; and (4) how has climate change affected the forest's C cycle? Harvard Forest is an active C sink resulting from forest regrowth following land abandonment. Soil and tree biomass comprise nearly equal portions of existing C stocks. Net primary production (NPP) averaged 680–750 g C·m⁻²·yr⁻¹; belowground NPP contributed 38–47% of the total, but with large uncertainty. Mineral soil C measured in the same inventory plots in 1992 and 2013 was too heterogeneous to detect change in soil-C pools; however, radiocarbon data suggest a small but persistent sink of 10–30 g C·m⁻²·yr⁻¹. Net ecosystem production (NEP) in hardwood stands averaged ~300 g C·m⁻²·yr⁻¹. NEP in hemlock-dominated forests averaged ~450 g C·m⁻²·yr⁻¹ until infestation by the hemlock woolly adelgid turned these stands into a net C source. Since 2000, NPP has increased by 26%. For the period 1992–2015, NEP increased 93%. The increase in mean annual temperature and growing-season length alone accounted for ~30% of the increase in productivity. Interannual variations in gross primary production (GPP) and NEP were also correlated with increases in red oak biomass, forest leaf area, and canopy-scale light-use efficiency. Compared to long-term global change experiments at the Harvard Forest, the C sink in regrowing biomass equaled or exceeded the C-cycle modifications imposed by soil warming, N saturation, and hemlock removal.
Results of this synthesis and comparison to simulation models suggest that forests across the region are likely to accrue C for decades to come but may be disrupted if the frequency or severity of biotic and abiotic disturbances increases.
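The abstract reports GPP, NPP and NEP without relating them. The standard ecosystem-ecology definitions are given below. They are general textbook relations, not equations taken from the paper. They ignore lateral fluxes and disturbance losses. $R_a$ is autotrophic (plant) respiration and $R_h$ is heterotrophic (microbial and animal) respiration. All terms are in g C·m⁻²·yr⁻¹.

```latex
\begin{aligned}
\mathrm{NPP} &= \mathrm{GPP} - R_a \\
\mathrm{NEP} &= \mathrm{GPP} - R_a - R_h = \mathrm{NPP} - R_h
\end{aligned}
```

Under this sign convention, $\mathrm{NEP} > 0$ marks a net C sink, as in the hardwood stands. $\mathrm{NEP} < 0$ marks a net C source, as in the adelgid-infested hemlock stands.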
POSTER: RDataTracker and DDG Explorer Capture, Visualization and Querying of Provenance from R Scripts
Scientific data provenance is gaining interest among both scientists and computer scientists. Although adopting provenance-collection systems brings additional benefits, these systems present a hurdle to scientists who are more interested in focusing on science than in learning new technologies. The work described in this poster explores the extent to which we can support scientists while expecting only a minimal investment of additional effort on their part. This work has been developed in collaboration with ecologists at Harvard Forest, a 3,500-acre facility operated by Harvard University that serves as a Long-Term Ecological Research (LTER) site funded by the National Science Foundation. Many of these ecologists perform data analysis using R, a widely used scripting language with extensive statistical analysis and plotting functionality. These scientists are committed to understanding their data, making sure that their data analyses are done appropriately, and sharing their data and results with others. For these reasons, they appreciate the value that collecting data provenance may have, but they are not enthusiastic about learning new tools. In this poster, we present two tools aimed at this audience: RDataTracker and DDG Explorer. RDataTracker [LB14] collects data provenance during the execution of an R script. DDG Explorer is used to examine and query the resulting data provenance.
Capturing Data Provenance with RDataTracker
RDataTracker is an R library that contains functions to build a provenance graph based on the execution of an R script and/or user activity in the R console. At a minimum, the scientist needs to load the library, initialize the provenance graph at the start of execution, and save the provenance graph at the end. As a script executes or the user enters commands at the console, a provenance graph is constructed. The graph records the operations that are executed, the data that are used, and where variables are assigned.
The user can increase the amount of information collected during execution by adding more instrumentation. In particular, this lets the user:
- Save copies of input and output files, as well as copies of plots created
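The minimal workflow described above has three steps: initialize a graph, record operations as they run, and save the graph at the end. The sketch below shows those steps in Python. It is a hypothetical analogue written for illustration. It does not use RDataTracker's API, which is an R library. The class, its methods, the node-id scheme and the JSON output layout are all assumptions. Each recorded call adds one operation node, one data node per input, and one data node for the result. "Used" edges run from data to operation, and "generated" edges run from operation to data.

```python
import json
import os
import tempfile
from typing import Any, Callable


class ProvenanceGraph:
    """Hypothetical minimal provenance recorder (not RDataTracker's API)."""

    def __init__(self) -> None:
        self.nodes: list[dict[str, Any]] = []
        self.edges: list[tuple[str, str]] = []

    def _add_node(self, kind: str, label: str, value: Any = None) -> str:
        # Ids are the kind's first letter plus a running count, e.g. "o1", "d2".
        node_id = f"{kind[0]}{len(self.nodes) + 1}"
        self.nodes.append({"id": node_id, "kind": kind, "label": label, "value": repr(value)})
        return node_id

    def run(self, label: str, fn: Callable[..., Any], **inputs: Any) -> Any:
        """Execute fn(**inputs) and record inputs -> operation -> output."""
        op_id = self._add_node("operation", label)
        for name, value in inputs.items():
            data_id = self._add_node("data", name, value)
            self.edges.append((data_id, op_id))  # the operation used this data
        result = fn(**inputs)
        out_id = self._add_node("data", f"{label}.result", result)
        self.edges.append((op_id, out_id))  # the operation generated this data
        return result

    def save(self, path: str) -> None:
        with open(path, "w") as f:
            json.dump({"nodes": self.nodes, "edges": self.edges}, f, indent=2)


# Usage: initialize, run the analysis through the recorder, save at the end.
path = os.path.join(tempfile.gettempdir(), "prov_example.json")
graph = ProvenanceGraph()
mean = graph.run("mean_temp", lambda xs: sum(xs) / len(xs), xs=[12.0, 14.0, 16.0])
graph.save(path)
```

Routing calls through `run` keeps the extra effort to one wrapper per step. The poster describes an R tool that aims for a similarly small burden on the scientist.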
Using Introspection to Collect Provenance in R
Data provenance is the history of an item of data from the point of its creation to its present state. It can support science by improving understanding of and confidence in data. RDataTracker is an R package that collects data provenance from R scripts (https://github.com/End-to-end-provenance/RDataTracker). Most provenance tools collect details on inputs, outputs, and the computing environment. RDataTracker collects these as well, and it also records a detailed execution trace and intermediate data values. It does this using R's powerful introspection functions and by parsing R statements before sending them to the interpreter, so that it knows what provenance to collect. The provenance is stored in a specialized graph structure called a Data Derivation Graph (DDG). The DDG makes it possible to determine exactly how an output value is computed or how an input value is used. In this paper, we provide details about the provenance RDataTracker collects and the mechanisms used to collect it. We also speculate about how other tools could use this rich source of information to help an R programmer understand the software more deeply and to support reproducibility.
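The claim that a DDG shows "exactly how an output value is computed or how an input value is used" corresponds to a backward and a forward traversal of the graph. The sketch below makes an assumption: edges point from an input or producer toward its consumer or output. The graph, its node names and both functions are hypothetical. They do not reflect RDataTracker's internal representation.

```python
from collections import deque


def lineage(edges: list[tuple[str, str]], target: str) -> set[str]:
    """Backward query: every node upstream of `target` (how it was computed)."""
    parents: dict[str, list[str]] = {}
    for src, dst in edges:
        parents.setdefault(dst, []).append(src)
    seen: set[str] = set()
    queue = deque([target])
    while queue:  # breadth-first walk against the edge direction
        node = queue.popleft()
        for p in parents.get(node, []):
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return seen


def impact(edges: list[tuple[str, str]], source: str) -> set[str]:
    """Forward query: every node downstream of `source` (how it is used)."""
    return lineage([(dst, src) for src, dst in edges], source)


# Hypothetical DDG: data and procedure nodes alternate along each path.
ddg = [
    ("raw.csv", "read"), ("read", "df"),
    ("df", "filter"), ("threshold", "filter"), ("filter", "df_clean"),
    ("df_clean", "plot"), ("plot", "fig.pdf"),
    ("units.csv", "read_units"), ("read_units", "units"),  # unrelated branch
]
```

Here `lineage(ddg, "fig.pdf")` returns the seven nodes that led to the figure and excludes the unrelated `units` branch. `impact(ddg, "threshold")` returns the four nodes affected if the threshold changes.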
Hurricane Impacts to Tropical and Temperate Forest Landscapes
Hurricanes represent an important natural disturbance process to tropical and temperate forests in many coastal areas of the world. The complex patterns of damage created in forests by hurricane winds result from the interaction of meteorological, physiographic, and biotic factors on a range of spatial scales. To improve our understanding of these factors and of the role of catastrophic hurricane wind as a disturbance process, we take an integrative approach. A simple meteorological model (HURRECON) uses meteorological data to reconstruct wind conditions at specific sites and regional gradients in wind speed and direction during a hurricane. A simple topographic exposure model (EXPOS) uses the wind direction predicted by HURRECON and a digital elevation map to estimate landscape-level exposure to the strongest winds. Actual damage to forest stands is assessed through analysis of remotely sensed, historical, and field data. These techniques were used to evaluate the characteristics and impacts of two important hurricanes: Hurricane Hugo (1989) in Puerto Rico and the 1938 New England Hurricane. The two were storms of comparable magnitude in regions that differ greatly in climate, vegetation, physiography, and disturbance regimes. In both cases, patterns of damage on a regional scale agreed with the predicted distribution of peak wind gust velocities. On a landscape scale, there was also good agreement between patterns of forest damage and predicted exposure in the Luquillo Experimental Forest in Puerto Rico and in the town of Petersham, Massachusetts. At the Harvard and Pisgah Forests in central New England, the average orientation of wind-thrown trees was very close to the predicted peak wind direction. At Luquillo there was also good agreement, with some apparent modification of wind direction by the mountainous terrain. At Harvard Forest there was evidence that trees more susceptible to windthrow were felled earlier in the storm.
This approach may be used to study the effects of topography on wind direction and the relation of forest damage to wind speed and duration. It may also be used to establish broad-scale gradients of hurricane frequency, intensity, and wind direction for particular regions, and to determine landscape-level exposure to long-term hurricane disturbance at particular sites.
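EXPOS combines a predicted wind direction with a digital elevation map to mark cells as exposed or sheltered. The sketch below is a simplified line-of-sight test in that spirit. It is not the published EXPOS code. Several details are assumptions:

- A cell counts as sheltered if any upwind terrain rises above a line climbing from the cell at an "inflection angle" (default 6°).
- The walk upwind samples the nearest grid cell at each step.
- Row 0 is the northern edge of the grid.

```python
import math


def exposure(dem: list[list[float]], cell_size: float, wind_from_deg: float,
             inflection_deg: float = 6.0) -> list[list[bool]]:
    """True = exposed, False = sheltered by upwind terrain.

    dem[row][col] holds elevations; row 0 is the northern edge.
    wind_from_deg is the compass direction the wind blows FROM (0 = N, 90 = E).
    """
    rows, cols = len(dem), len(dem[0])
    theta = math.radians(wind_from_deg)
    dx, dy = math.sin(theta), -math.cos(theta)  # unit step toward the upwind side
    rise_per_m = math.tan(math.radians(inflection_deg))
    exposed = [[True] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            step = 1
            while True:
                rr, cc = round(r + dy * step), round(c + dx * step)
                if not (0 <= rr < rows and 0 <= cc < cols):
                    break  # walked off the grid without meeting an obstruction
                if dem[rr][cc] > dem[r][c] + step * cell_size * rise_per_m:
                    exposed[r][c] = False  # upwind terrain blocks the wind
                    break
                step += 1
    return exposed


# Hypothetical north-south transect with a 100 m ridge in row 1, 100 m cells.
transect = [[0.0], [100.0], [0.0], [0.0], [0.0]]
from_north = exposure(transect, 100.0, wind_from_deg=0)
from_south = exposure(transect, 100.0, wind_from_deg=180)
```

With wind from the north, the three cells south of the ridge are sheltered. With wind from the south, only the cell north of the ridge is sheltered. This mirrors the abstract's point that exposure depends on both topography and the predicted wind direction.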
Landscape and Regional Impacts of Hurricanes in New England
Hurricanes are a major factor controlling ecosystem structure, function, and dynamics in many coastal forests, but their ecological role can be understood only by assessing impacts in space and time over a period of centuries. We present a new method for reconstructing hurricane disturbance regimes using a combination of historical research and computer modeling. Historical evidence of wind damage for each hurricane in the selected region is quantified using the Fujita scale to produce regional maps of actual damage. A simple meteorological model (HURRECON), parameterized and tested for selected recent hurricanes, provides regional estimates of wind speed, direction, and damage for each storm. Individual reconstructions are compiled to analyze spatial and temporal patterns of hurricane impacts. Long-term effects of topography on a landscape scale are then simulated with a simple topographic exposure model (EXPOS). We applied this method to the region of New England, USA, examining hurricanes since European settlement in 1620. Results showed strong regional gradients in hurricane frequency and intensity from southeast to northwest. Mean return intervals on the Fujita scale ranged as follows:
- F0 damage (loss of leaves and branches): 5 to 85 yr
- F1 damage (scattered blowdowns, small gaps): 10 to >200 yr
- F2 damage (extensive blowdowns, large gaps): 85 to >380 yr

On a landscape scale, mean return intervals for F2 damage in the town of Petersham, Massachusetts, ranged from 125 yr across most sites to >380 yr on scattered lee slopes. Actual forest damage was strongly dependent on land use and natural disturbance history. Annual and decadal timing of hurricanes varied widely. There was no clear century-scale trend in the number of major hurricanes.
The historical-modeling approach is applicable to any region with good historical records and will enable ecologists and land managers to incorporate insights on hurricane disturbance regimes into the interpretation and conservation of forests at landscape to regional scales.
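Return intervals of the form ">200 yr" and ">380 yr" make sense when a site recorded no event of a given damage class during the study period. In that case the interval can only be bounded from below by the record length. The sketch below assumes the simplest estimator: record length divided by the number of events. It is a hypothetical illustration, not necessarily the authors' exact procedure. The event years used in the usage lines are invented.

```python
def mean_return_interval(event_years: list[int], start: int, end: int) -> str:
    """Mean return interval for one site and one damage class over [start, end].

    Assumed estimator: record length / number of events in the window.
    With no events, only a lower bound (> record length) can be stated.
    """
    length = end - start
    n = sum(1 for y in event_years if start <= y <= end)
    if n == 0:
        return f">{length} yr"
    return f"{length / n:.0f} yr"


# Hypothetical event years for one site and damage class.
frequent = mean_return_interval([1700, 1800, 1900, 1950], 1600, 2000)
never = mean_return_interval([], 1600, 2000)
```

Here `frequent` is "100 yr" and `never` is ">400 yr". Events outside the window are ignored, so a record's start date directly limits how large a return interval can be demonstrated.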