ecocomDP: A flexible data design pattern for ecological community survey data
The idea of harmonizing data is not new. Decades of amassing data in databases according to community standards - both locally and globally - have been more successful for some research domains than others. It is particularly difficult to harmonize data across studies where sampling protocols vary greatly and complex environmental conditions need to be understood to apply analytical methods correctly. However, a body of long-term ecological community observations is increasingly becoming publicly available and has been used in important studies. Here, we discuss an approach to preparing harmonized community survey data by an environmental data repository, in collaboration with a national observatory. The workflow framework and repository infrastructure are used to create a decentralized, asynchronous model to reformat data without altering original data through cleaning or aggregation, while retaining metadata about sampling methods and provenance, and enabling programmatic data access. This approach does not create another data ‘silo’ but will allow the repository to contribute subsets of available data to a variety of different analysis-ready data preparation efforts. With certain limitations (e.g., changes to the sampling protocol over time), data updates and downstream processing may be completely automated. In addition to supporting reuse of community observation data by synthesis science, a goal for this harmonization and workflow effort is to contribute these datasets to the Global Biodiversity Information Facility (GBIF) to increase the data’s discovery and use.
Long-term ecological research in a human-dominated world
Author Posting. © American Institute of Biological Sciences, 2012. This article is posted here by permission of American Institute of Biological Sciences for personal use, not for redistribution. The definitive version was published in BioScience 62 (2012): 342-353, doi:10.1525/bio.2012.62.4.6. The US Long Term Ecological Research (LTER) Network enters its fourth decade with a distinguished record of achievement in ecological science. The value of long-term observations and experiments has never been more important for testing ecological theory and for addressing today's most difficult environmental challenges. The network's potential for tackling emergent continent-scale questions such as cryosphere loss and landscape change is becoming increasingly apparent on the basis of a capacity to combine long-term observations and experimental results with new observatory-based measurements, to study socioecological systems, to advance the use of environmental cyberinfrastructure, to promote environmental science literacy, and to engage with decisionmakers in framing major directions for research. The long-term context of network science, from understanding the past to forecasting the future, provides a valuable perspective for helping to solve many of the crucial environmental problems facing society today.
Generating community-built tools for data sharing and analysis in environmental networks
Rapid data growth in many environmental sectors has necessitated tools to manage and analyze these data. The development of tools often lags behind the proliferation of data, however, which may slow exploratory opportunities and scientific progress. The Global Lake Ecological Observatory Network (GLEON) collaborative model supports an efficient and comprehensive data–analysis–insight life cycle, including implementations of data quality control checks, statistical calculations/derivations, models, and data visualizations. These tools are community-built and openly shared. We discuss the network structure that enables tool development and a culture of sharing, leading to optimized output from limited resources. Specifically, data sharing and a flat collaborative structure encourage the development of tools that enable scientific insights from these data. Here we provide a cross-section of scientific advances derived from global-scale analyses in GLEON. We document enhancements to science capabilities made possible by the development of analytical tools and highlight opportunities to expand this framework to benefit other environmental networks.
The Tao of open science for ecology
The field of ecology is poised to take advantage of emerging technologies that facilitate the gathering, analyzing, and sharing of data, methods, and results. The concept of transparency at all stages of the research process, coupled with free and open access to data, code, and papers, constitutes “open science.” Despite the many benefits of an open approach to science, a number of barriers to entry exist that may prevent researchers from embracing openness in their own work. Here we describe several key shifts in mindset that underpin the transition to more open science. These shifts in mindset include thinking about data stewardship rather than data ownership, embracing transparency throughout the data life-cycle and project duration, and accepting critique in public. Though foreign and perhaps frightening at first, these changes in thinking stand to benefit the field of ecology by fostering collegiality and broadening access to data and findings. We present an overview of tools and best practices that can enable these shifts in mindset at each stage of the research process, including tools to support data management planning and reproducible analyses, strategies for soliciting constructive feedback throughout the research process, and methods of broadening access to final research products.
BioTIME: A database of biodiversity time series for the Anthropocene.
Motivation: The BioTIME database contains raw data on species identities and abundances in ecological assemblages through time. These data enable users to calculate temporal trends in biodiversity within and amongst assemblages using a broad range of metrics. BioTIME is being developed as a community-led open-source database of biodiversity time series. Our goal is to accelerate and facilitate quantitative analysis of temporal patterns of biodiversity in the Anthropocene. Main types of variables included: The database contains 8,777,413 species abundance records, from assemblages consistently sampled for a minimum of 2 years, which need not necessarily be consecutive. In addition, the database contains metadata relating to sampling methodology and contextual information about each record. Spatial location and grain: BioTIME is a global database of 547,161 unique sampling locations spanning the marine, freshwater and terrestrial realms. Grain size varies across datasets from 0.0000000158 km² (158 cm²) to 100 km² (1,000,000,000,000 cm²). Time period and grain: BioTIME records span from 1874 to 2016. The minimal temporal grain across all datasets in BioTIME is a year. Major taxa and level of measurement: BioTIME includes data from 44,440 species across the plant and animal kingdoms, ranging from plants, plankton and terrestrial invertebrates to small and large vertebrates. Software format: .csv and .SQL.
Building a multi-scaled geospatial temporal ecology database from disparate data sources: fostering open science and data reuse
Although there are considerable site-based data for individual or groups of ecosystems, these datasets are widely scattered, have different data formats and conventions, and often have limited accessibility. At the broader scale, national datasets exist for a large number of geospatial features of land, water, and air that are needed to fully understand variation among these ecosystems. However, such datasets originate from different sources and have different spatial and temporal resolutions. By taking an open-science perspective and by combining site-based ecosystem datasets and national geospatial datasets, science gains the ability to ask important research questions related to grand environmental challenges that operate at broad scales. Documentation of such complicated database integration efforts, through peer-reviewed papers, is recommended to foster reproducibility and future use of the integrated database. Here, we describe the major steps, challenges, and considerations in building an integrated database of lake ecosystems, called LAGOS (LAke multi-scaled GeOSpatial and temporal database), that was developed at the sub-continental study extent of 17 US states (1,800,000 km²). LAGOS includes two modules: LAGOS_GEO, with geospatial data on every lake with surface area larger than 4 ha in the study extent (~50,000 lakes), including climate, atmospheric deposition, land use/cover, hydrology, geology, and topography measured across a range of spatial and temporal extents; and LAGOS_LIMNO, with lake water quality data compiled from ~100 individual datasets for a subset of lakes in the study extent (~10,000 lakes). Procedures for the integration of datasets included: creating a flexible database design; authoring and integrating metadata; documenting data provenance; quantifying spatial measures of geographic data; quality-controlling integrated and derived data; and extensively documenting the database.
Our procedures make a large, complex, and integrated database reproducible and extensible, allowing users to ask new research questions with the existing database or through the addition of new data. The largest challenge of this task was the heterogeneity of the data, formats, and metadata. Many steps of data integration need manual input from experts in diverse fields, requiring close collaboration.
Keywords: LAGOS, Integrated database, Ecoinformatics, Data harmonization, Water quality, Data sharing, Landscape limnology, Macrosystems ecology, Database documentation, Data reuse
Completing the data life cycle: using information management in macrosystems ecology research
An important goal of macrosystems ecology (MSE) research is to advance understanding of ecological systems at both fine and broad temporal and spatial scales. Our premise in this paper is that MSE projects require integrated information management at their inception. Such efforts will lead to improved communication and sharing of knowledge among diverse project participants, better science outcomes, and more transparent and accessible (i.e., “open”) science. We encourage researchers to “complete the data life cycle” by publishing well-documented datasets, thereby facilitating re-use of the data to answer new and different questions from the ones conceived by those involved in the original projects. The practice of documenting and submitting datasets to data repositories that are publicly accessible ensures that research results and data are available to and usable by other researchers, thus fostering open science. However, ecologists are often unfamiliar with the requirements and information management tools for effectively preserving data and receive little institutional or professional incentive to do so. Here, we provide recommendations for achieving these ends and give examples from current MSE projects to demonstrate why information management is critical for ensuring that scientific results can be reproduced and that data can be shared for future use.
Ecology under lake ice
Winter conditions are rapidly changing in temperate ecosystems, particularly for those that experience periods of snow and ice cover. Relatively little is known of winter ecology in these systems, due to a historical research focus on summer ‘growing seasons’. We executed the first global quantitative synthesis on under-ice lake ecology, including 36 abiotic and biotic variables from 42 research groups and 101 lakes, examining seasonal differences and connections as well as how seasonal differences vary with geophysical factors. Plankton were more abundant under ice than expected; mean winter values were 43.2% of summer values for chlorophyll a, 15.8% of summer phytoplankton biovolume and 25.3% of summer zooplankton density. Dissolved nitrogen concentrations were typically higher during winter, and these differences were exaggerated in smaller lakes. Lake size also influenced winter-summer patterns for dissolved organic carbon (DOC), with higher winter DOC in smaller lakes. At coarse levels of taxonomic aggregation, phytoplankton and zooplankton community composition showed few systematic differences between seasons, although literature suggests that seasonal differences are frequently lake-specific, species-specific, or occur at the level of functional group. Within the subset of lakes that had longer time series, winter influenced the subsequent summer for some nutrient variables and zooplankton biomass.