36 research outputs found
ecocomDP: A flexible data design pattern for ecological community survey data
The idea of harmonizing data is not new. Decades of amassing data in databases according to community standards - both locally and globally - have been more successful for some research domains than others. It is particularly difficult to harmonize data across studies where sampling protocols vary greatly and complex environmental conditions need to be understood to apply analytical methods correctly. However, a body of long-term ecological community observations is increasingly becoming publicly available and has been used in important studies. Here, we discuss an approach to preparing harmonized community survey data by an environmental data repository, in collaboration with a national observatory. The workflow framework and repository infrastructure are used to create a decentralized, asynchronous model to reformat data without altering original data through cleaning or aggregation, while retaining metadata about sampling methods and provenance, and enabling programmatic data access. This approach does not create another data ‘silo’ but will allow the repository to contribute subsets of available data to a variety of different analysis-ready data preparation efforts. With certain limitations (e.g., changes to the sampling protocol over time), data updates and downstream processing may be completely automated. In addition to supporting reuse of community observation data by synthesis science, a goal for this harmonization and workflow effort is to contribute these datasets to the Global Biodiversity Information Facility (GBIF) to increase the data’s discovery and use.
A global database of lake surface temperatures collected by in situ and satellite methods from 1985–2009
Global environmental change has influenced lake surface temperatures, a key driver of ecosystem structure and function. Recent studies have suggested significant warming of water temperatures in individual lakes across many different regions around the world. However, the spatial and temporal coherence associated with the magnitude of these trends remains unclear. Thus, a global data set of water temperature is required to understand and synthesize global, long-term trends in surface water temperatures of inland bodies of water. We assembled a database of summer lake surface temperatures for 291 lakes collected in situ and/or by satellites for the period 1985–2009. In addition, corresponding climatic drivers (air temperatures, solar radiation, and cloud cover) and geomorphometric characteristics (latitude, longitude, elevation, lake surface area, maximum depth, mean depth, and volume) that influence lake surface temperatures were compiled for each lake. This unique dataset offers an invaluable baseline perspective on global-scale lake thermal conditions as environmental change continues.
Facilitating and Improving Environmental Research Data Repository Interoperability
Environmental research data repositories provide much-needed services for data preservation and data dissemination to diverse communities with domain-specific or programmatic data needs and standards. Due to independent development, these repositories serve their communities well, but were built with different technologies, data models, and ontologies. Hence, the effectiveness and efficiency of these services can be vastly improved if repositories work together adhering to a shared community platform that focuses on the implementation of agreed-upon standards and best practices for curation and dissemination of data. Such a community platform drives forward the convergence of technologies and practices that will advance cross-domain interoperability. It will also facilitate contributions from investigators through standardized and streamlined workflows and provide increased visibility for the role of data managers and the curation services provided by data repositories, beyond preservation infrastructure. Ten specific suggestions for such standardizations are outlined without any suggestion of priority or technical implementation. Although the recommendations are for repositories to implement, they have been chosen specifically with the data provider/data curator and synthesis scientist in mind.
Long-term ecological research in a human-dominated world
Author Posting. © American Institute of Biological Sciences, 2012. This article is posted here by permission of American Institute of Biological Sciences for personal use, not for redistribution. The definitive version was published in BioScience 62 (2012): 342-353, doi:10.1525/bio.2012.62.4.6. The US Long Term Ecological Research (LTER) Network enters its fourth decade with a distinguished record of achievement in ecological science. The value of long-term observations and experiments has never been more important for testing ecological theory and for addressing today's most difficult environmental challenges. The network's potential for tackling emergent continent-scale questions such as cryosphere loss and landscape change is becoming increasingly apparent on the basis of a capacity to combine long-term observations and experimental results with new observatory-based measurements, to study socioecological systems, to advance the use of environmental cyberinfrastructure, to promote environmental science literacy, and to engage with decision-makers in framing major directions for research. The long-term context of network science, from understanding the past to forecasting the future, provides a valuable perspective for helping to solve many of the crucial environmental problems facing society today.
Generating community-built tools for data sharing and analysis in environmental networks
Rapid data growth in many environmental sectors has necessitated tools to manage and analyze these data. The development of tools often lags behind the proliferation of data, however, which may slow exploratory opportunities and scientific progress. The Global Lake Ecological Observatory Network (GLEON) collaborative model supports an efficient and comprehensive data–analysis–insight life cycle, including implementations of data quality control checks, statistical calculations/derivations, models, and data visualizations. These tools are community-built and openly shared. We discuss the network structure that enables tool development and a culture of sharing, leading to optimized output from limited resources. Specifically, data sharing and a flat collaborative structure encourage the development of tools that enable scientific insights from these data. Here we provide a cross-section of scientific advances derived from global-scale analyses in GLEON. We document enhancements to science capabilities made possible by the development of analytical tools and highlight opportunities to expand this framework to benefit other environmental networks.
BioTIME: A database of biodiversity time series for the Anthropocene.
Motivation: The BioTIME database contains raw data on species identities and abundances in ecological assemblages through time. These data enable users to calculate temporal trends in biodiversity within and amongst assemblages using a broad range of metrics. BioTIME is being developed as a community-led open-source database of biodiversity time series. Our goal is to accelerate and facilitate quantitative analysis of temporal patterns of biodiversity in the Anthropocene.
Main types of variables included: The database contains 8,777,413 species abundance records, from assemblages consistently sampled for a minimum of 2 years, which need not necessarily be consecutive. In addition, the database contains metadata relating to sampling methodology and contextual information about each record.
Spatial location and grain: BioTIME is a global database of 547,161 unique sampling locations spanning the marine, freshwater and terrestrial realms. Grain size varies across datasets from 0.0000000158 km² (158 cm²) to 100 km² (1,000,000,000,000 cm²).
Time period and grain: BioTIME records span from 1874 to 2016. The minimal temporal grain across all datasets in BioTIME is a year.
Major taxa and level of measurement: BioTIME includes data from 44,440 species across the plant and animal kingdoms, ranging from plants, plankton and terrestrial invertebrates to small and large vertebrates.
Software format: .csv and .SQL.
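The temporal biodiversity metrics described above can be sketched from BioTIME-style records. In the example below the column names (study_id, year, species, abundance) are illustrative assumptions, not the database's actual schema:

```python
import pandas as pd

# Toy records in a BioTIME-like shape: one row per species
# observation within an assemblage (study) in a given year.
records = pd.DataFrame({
    "study_id":  [1, 1, 1, 1, 1, 1],
    "year":      [2000, 2000, 2001, 2001, 2001, 2002],
    "species":   ["a", "b", "a", "b", "c", "a"],
    "abundance": [5, 2, 4, 1, 3, 6],
})

# Annual species richness per assemblage, one of the
# within-assemblage trend metrics such data support.
richness = (records.groupby(["study_id", "year"])["species"]
                   .nunique()
                   .rename("richness")
                   .reset_index())
print(richness)
```

Any other per-year metric (total abundance, a diversity index) follows the same groupby pattern over the assemblage and year columns.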
The Tao of open science for ecology
The field of ecology is poised to take advantage of emerging technologies that facilitate the gathering, analyzing, and sharing of data, methods, and results. The concept of transparency at all stages of the research process, coupled with free and open access to data, code, and papers, constitutes “open science.” Despite the many benefits of an open approach to science, a number of barriers to entry exist that may prevent researchers from embracing openness in their own work. Here we describe several key shifts in mindset that underpin the transition to more open science. These shifts in mindset include thinking about data stewardship rather than data ownership, embracing transparency throughout the data life‐cycle and project duration, and accepting critique in public. Though foreign and perhaps frightening at first, these changes in thinking stand to benefit the field of ecology by fostering collegiality and broadening access to data and findings. We present an overview of tools and best practices that can enable these shifts in mindset at each stage of the research process, including tools to support data management planning and reproducible analyses, strategies for soliciting constructive feedback throughout the research process, and methods of broadening access to final research products.
Building a multi-scaled geospatial temporal ecology database from disparate data sources: fostering open science and data reuse
Although there are considerable site-based data for individual or groups of ecosystems, these datasets are widely scattered, have different data formats and conventions, and often have limited accessibility. At the broader scale, national datasets exist for a large number of geospatial features of land, water, and air that are needed to fully understand variation among these ecosystems. However, such datasets originate from different sources and have different spatial and temporal resolutions. By taking an open-science perspective and by combining site-based ecosystem datasets and national geospatial datasets, science gains the ability to ask important research questions related to grand environmental challenges that operate at broad scales. Documentation of such complicated database integration efforts, through peer-reviewed papers, is recommended to foster reproducibility and future use of the integrated database. Here, we describe the major steps, challenges, and considerations in building an integrated database of lake ecosystems, called LAGOS (LAke multi-scaled GeOSpatial and temporal database), that was developed at the sub-continental study extent of 17 US states (1,800,000 km²). LAGOS includes two modules: LAGOS-GEO, with geospatial data on every lake with surface area larger than 4 ha in the study extent (~50,000 lakes), including climate, atmospheric deposition, land use/cover, hydrology, geology, and topography measured across a range of spatial and temporal extents; and LAGOS-LIMNO, with lake water quality data compiled from ~100 individual datasets for a subset of lakes in the study extent (~10,000 lakes). Procedures for the integration of datasets included: creating a flexible database design; authoring and integrating metadata; documenting data provenance; quantifying spatial measures of geographic data; quality-controlling integrated and derived data; and extensively documenting the database.
Our procedures make a large, complex, and integrated database reproducible and extensible, allowing users to ask new research questions with the existing database or through the addition of new data. The largest challenge of this task was the heterogeneity of the data, formats, and metadata. Many steps of data integration need manual input from experts in diverse fields, requiring close collaboration.
Keywords: LAGOS, Integrated database, Data harmonization, Data sharing, Ecoinformatics, Landscape limnology, Macrosystems ecology, Water quality, Database documentation, Data reuse
Bivariate Zero-Inflated Regression for Count Data: A Bayesian Approach with Application to Plant Counts
Lately, bivariate zero-inflated (BZI) regression models have been used in many instances in the medical sciences to model excess zeros. Examples include the BZI Poisson (BZIP) and BZI negative binomial (BZINB) models. Such formulations vary in the basic modeling aspect and use the EM algorithm (Dempster, Laird and Rubin, 1977) for parameter estimation. A different modeling formulation in the Bayesian context is given by Dagne (2004). We extend the modeling to a more general setting for multivariate ZIP models for count data with excess zeros as proposed by Li, Lu, Park, Kim, Brinkley and Peterson (1999), focusing on a particular bivariate regression formulation. For the basic formulation in the bivariate case, we assume that the Xi are (latent) independent Poisson random variables with parameters λi, i = 0, 1, 2. A bivariate count response (Y1, Y2) follows a mixture of four distributions: p0 is the mixing probability of a point mass at (0, 0); p1, the mixing probability that Y2 = 0 while Y1 = X0 + X1; p2, the mixing probability that Y1 = 0 while Y2 = X0 + X2; and finally (1 - p0 - p1 - p2), the mixing probability that Yi = Xi + X0, i = 1, 2. The choice of the parameters {pi, λi, i = 0, 1, 2} ensures that the marginal distributions of the Yi are zero-inflated Poisson with rate λ0 + λi. All the parameters thus introduced are allowed to depend on covariates through canonical-link generalized linear models (McCullagh and Nelder, 1989). This flexibility allows for a range of real-life applications, especially in the medical and biological fields, where the counts are bivariate in nature (with strong association between the processes) and where there are excess zeros in one or both processes. Our contribution in this paper is to employ a fully Bayesian approach, consolidating the work of Dagne (2004) and Li et al. (1999) and generalizing the modeling and sampling-based methods described by Ghosh, Mukhopadhyay and Lu (2006), to estimate the parameters and obtain posterior credible intervals both in the case where covariates are not available and in the case where they are. In this context, we provide explicit data augmentation techniques that lend themselves to easier implementation of the Gibbs sampler by giving rise to well-known, closed-form posterior distributions in the bivariate ZIP case. We then use simulations to explore the effectiveness of this estimation using the Bayesian BZIP procedure, comparing the performance to the Bayesian and classical ZIP approaches. Finally, we demonstrate the methodology on bivariate plant count data with excess zeros that was collected on plots in the Phoenix metropolitan area and compare the results with independent ZIP regression models fitted to both processes.
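The four-component mixture described in the abstract can be simulated directly, which makes the marginal ZIP structure easy to check empirically. The sketch below follows the parameterization given there (mixing probabilities p0, p1, p2 and Poisson rates λ0, λ1, λ2); it is an illustration of the model, not the authors' estimation procedure:

```python
import numpy as np

rng = np.random.default_rng(42)

def rbzip(n, p, lam):
    """Draw n samples (Y1, Y2) from the bivariate zero-inflated Poisson mixture.

    Components, with X_i ~ Poisson(lam[i]) independent:
      w.p. p0:              (Y1, Y2) = (0, 0)
      w.p. p1:              Y1 = X0 + X1, Y2 = 0
      w.p. p2:              Y1 = 0,       Y2 = X0 + X2
      w.p. 1-p0-p1-p2:      Y1 = X0 + X1, Y2 = X0 + X2
    The shared X0 induces positive association between Y1 and Y2.
    """
    p0, p1, p2 = p
    comp = rng.choice(4, size=n, p=[p0, p1, p2, 1.0 - p0 - p1 - p2])
    x0, x1, x2 = (rng.poisson(l, size=n) for l in lam)
    y1 = np.where(np.isin(comp, [1, 3]), x0 + x1, 0)
    y2 = np.where(np.isin(comp, [2, 3]), x0 + x2, 0)
    return np.column_stack([y1, y2])

y = rbzip(100_000, p=(0.3, 0.1, 0.1), lam=(1.0, 2.0, 3.0))

# Marginally, Y1 is Poisson(lam0 + lam1) with probability 1 - p0 - p2
# and a structural zero otherwise, i.e. ZIP with rate lam0 + lam1.
print(y[:, 0].mean())          # close to (1 - 0.3 - 0.1) * (1.0 + 2.0) = 1.8
print((y[:, 0] == 0).mean())   # close to 0.4 + 0.6 * exp(-3)
```

In a regression setting each pi and λi would additionally depend on covariates through canonical-link GLMs, as the abstract notes.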