197 research outputs found
Types, Granularities and Combinations of Geographic Objects in the Haiti Crisis Map
Visual Analytical Approaches to Evaluate Uncertainty and Bias in Crowdsourced Crisis Information
Increasing numbers of people are using social media to exchange information during crisis and conflict events. On the one hand, the humanitarian community is reluctant to use this information in the response effort as it fears the cost of untrustworthy and inaccurate information. On the other, the volunteer and technical communities have attempted to resolve this impasse by crowdsourcing crisis information; for example, by asking volunteers to ascertain whether a crisis report is trustworthy and accurate. Trust and accuracy are two characteristics of uncertainty: the fact that each is likely to have spatial, temporal and thematic aspects is supported by research, which suggests that geography characterises crisis information. Consequently, a research programme grounded in geographic information science, (geo)visualization and (geo)visual analytics is presented that seeks to evaluate the degree to which uncertainty and bias (systematic variation) are found in crowdsourced crisis information, and to provide heuristics to help manage these factors. This programme consists of a methodology for undertaking interactive, analysis-guided software development that is informed by action research, scenario-based design and Munzner's model of visualization validation; and a prototype software application that combines interactive visual representations with spatial statistical functions to explore two datasets of crowdsourced crisis information. Following a review of the literature and a description of the data, the methodology and its implementation are placed within an appropriate work plan. Three supporting publications are included, as well as supporting statements regarding the author's skills and engagement with the academic community.
Location, location, location: utilizing pipelines and services to more effectively georeference the world's biodiversity data
Background: Increasing the quantity and quality of data is a key goal of biodiversity informatics, leading to increased fitness for use in scientific research and beyond. This goal is impeded by a legacy of geographic locality descriptions associated with biodiversity records that are often heterogeneous and not in a map-ready format. The biodiversity informatics community has developed best practices and tools that provide the means to do retrospective georeferencing (e.g., the BioGeomancer toolkit), a process that converts heterogeneous descriptions into geographic coordinates and a measurement of spatial uncertainty. Even with these methods and tools, data publishers are faced with the immensely time-consuming task of vetting georeferenced localities. Furthermore, it is likely that overlap in georeferencing effort is occurring across data publishers. Solutions are needed that help publishers more effectively georeference their records, verify their quality, and eliminate the duplication of effort across publishers.
Results: We have developed a tool called BioGeoBIF, which incorporates the high-throughput and standardized georeferencing methods of BioGeomancer into a beginning-to-end workflow. Custodians who publish their data to the Global Biodiversity Information Facility (GBIF) can use this system to improve the quantity and quality of their georeferences. BioGeoBIF harvests records directly from the publishers' access points, georeferences the records using the BioGeomancer web service, and makes results available to data managers for inclusion at the source. Using a web-based, password-protected, group management system for each data publisher, we leave data ownership, management, and vetting responsibilities with the managers and collaborators of each data set. We also minimize the georeferencing task by combining and storing unique textual localities from all registered data access points, and dynamically linking that information to the password-protected record information for each publisher.
Conclusion: We have developed one of the first examples of services that can help create higher quality data for publishers mediated through the Global Biodiversity Information Facility and its data portal. This service is one step towards solving many problems of data quality in the growing field of biodiversity informatics. We envision future improvements to our service that include faster return of results and inclusion of more georeferencing engines.
LOCALITY UNCERTAINTY AND THE DIFFERENTIAL PERFORMANCE OF FOUR COMMON NICHE-BASED MODELING TECHNIQUES
We address a poorly understood aspect of ecological niche modeling: its sensitivity to different levels of geographic uncertainty in organism occurrence data. Our primary interest was to assess how accuracy degrades under increasing uncertainty, with performance measured indirectly through model consistency. We used Monte Carlo simulations and a similarity measure to assess model sensitivity across three variables: locality accuracy, niche modeling method, and species. Randomly generated data sets with known levels of locality uncertainty were compared to an original prediction using Fuzzy Kappa. Data sets where locality uncertainty is low were expected to produce similar distribution maps to the original. In contrast, data sets where locality uncertainty is high were expected to produce less similar maps. BIOCLIM, DOMAIN, Maxent and GARP were used to predict the distributions for 1200 simulated datasets (3 species x 4 buffer sizes x 100 randomized data sets). Thus, our experimental design produced a total of 4800 similarity measures, with each of the simulated distributions compared to the prediction of the original data set and corresponding modeling method. A general linear model (GLM) analysis was performed which enables us to simultaneously measure the effect of buffer size, modeling method, and species, as well as interactions among all variables. Our results show that modeling method has the largest effect on similarity scores and uniquely accounts for 40% of the total variance in the model. The second most important factor was buffer size, but it uniquely accounts for only 3% of the variation in the model. The newer and currently more popular methods, GARP and Maxent, were shown to produce more inconsistent predictions than the earlier and simpler methods, BIOCLIM and DOMAIN. 
Understanding the performance of different niche modeling methods under varying levels of geographic uncertainty is an important step toward more productive applications of historical biodiversity collections.
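The experimental design in this abstract can be checked with a short Python sketch. It enumerates the 3 species x 4 buffer sizes x 100 replicates and shows one plausible way to simulate locality uncertainty: moving a point to a uniformly random position within a buffer radius. The species labels, the buffer radii, the planar kilometre coordinates and the `perturb` helper are all hypothetical; the abstract does not state how the randomized data sets were actually generated.

```python
import itertools
import math
import random

SPECIES = ["species_a", "species_b", "species_c"]  # hypothetical labels
BUFFERS_KM = [1, 5, 10, 50]                        # hypothetical buffer radii
METHODS = ["BIOCLIM", "DOMAIN", "Maxent", "GARP"]  # methods named in the abstract
REPLICATES = 100


def perturb(x_km, y_km, buffer_km, rng):
    """Return a point drawn uniformly from the disc of radius buffer_km
    centred on (x_km, y_km). Assumes planar coordinates in kilometres."""
    # The square root makes the density uniform over the disc's area.
    r = buffer_km * math.sqrt(rng.random())
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return x_km + r * math.cos(theta), y_km + r * math.sin(theta)


# One simulated data set per (species, buffer size, replicate) combination.
datasets = list(itertools.product(SPECIES, BUFFERS_KM, range(REPLICATES)))
n_datasets = len(datasets)

# Each data set is modeled by every method and compared to the original map.
n_similarity = n_datasets * len(METHODS)

rng = random.Random(0)
moved = perturb(0.0, 0.0, 10, rng)
print(n_datasets, n_similarity, moved)
```

The counts reproduce the 1200 simulated data sets and 4800 similarity measures reported in the abstract; the perturbation step is only an illustration of what "known levels of locality uncertainty" could mean in practice.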
Building data into knowledge: Identifying challenges and their solutions in biodiversity informatics
Biologists are in a race to document biodiversity in the face of ailing ecosystems and species decline. The drive to create knowledge to support effective documentation, measurement, and conservation of biodiversity has led the community to quickly research and develop methods to organize and connect biodiversity data across providers and throughout the world. Biodiversity data came online through distributed and disconnected databases but through time has been shaped into a biodiversity network that now represents nearly 500 million biodiversity records. The ability to access these data has brought exciting new research and new challenges. In this thesis I discuss my work to solve some of those challenges and build innovative approaches and tools for biodiversity informatics. I start by documenting tools that help improve the quality and fitness for use of data. Then I present two tools for visualizing and analyzing data in a phylogenetic and conservation context. More importantly, I discuss how designing these tools to operate within a greater knowledge creation framework can make the work of documenting patterns and processes in biodiversity faster and more resilient to future changes and improved information. At the heart of that discussion is the idea that the outputs of the tools themselves should be published and directly linked back to the original data and forward to any future analyses. The outputs should also document all models, parameters, and heuristics used to arrive at the reported outcome. In this way, both the data and our research of that data can be woven into a connected fabric of knowledge and information that links biodiversity and the digital data stored in our databases. Finally, I discuss the possibility we have for expanding our biodiversity data and improving the research we can do with it through the use of citizen science. The data available today is still deficient. 
Natural history collections hold a wealth of data that has not yet been digitized, but as a community we lack the resources to unlock that data quickly without a novel solution. Citizen science offers us the ability to quickly generate historical biodiversity data from natural history collections. We present a novel platform for engaging citizen scientists and developing a shared, community-driven platform to harness the potential of citizen science.
Developing Global Maps of the Dominant Anopheles Vectors of Human Malaria
Simon Hay and colleagues describe how the Malaria Atlas Project has collated anopheline occurrence data to map the geographic distributions of the dominant mosquito vectors of human malaria.
Automated Georeferencing of Antarctic Species
Many text documents in the biological domain contain references to the toponym of specific phenomena (e.g. species sightings) in natural language form, for example: "In Garwood Valley summer activity was 0.2% for Umbilicaria aprina and 1.7% for Caloplaca sp. ..."
While methods have been developed to extract place names from documents, and attention has been given to the interpretation of spatial prepositions, the ability to connect toponym mentions in text with the phenomena to which they refer (in this case species) has been given limited attention, but would be of considerable benefit for the task of mapping specific phenomena mentioned in text documents.
As part of work to create a pipeline to automate georeferencing of species within legacy documents, this paper proposes a method to: (1) recognise species and toponyms within text and (2) match each species mention to the relevant toponym mention. Our results show significant promise for a bespoke rules- and dictionary-based approach to recognising species within text (F1 scores up to 0.87 including partial matches) but less success, as yet, in recognising toponyms using multiple gazetteers combined with an off-the-shelf natural language processing tool (F1 up to 0.62).
Most importantly, we offer a contribution to the relatively nascent area of matching toponym references to the object they locate (in our case species), including cases in which the toponym and species are in different sentences. We use tree-based models to achieve precision as high as 0.88 or an F1 score up to 0.68 depending on the downsampling rate. Initial results outperform previous research on detecting entity relationships that may cross sentence boundaries within biomedical text, and differ from previous work in specifically addressing species mapping.
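For readers unfamiliar with the precision and F1 figures quoted in this abstract, the following is a minimal Python sketch of how these metrics are conventionally computed from match counts. The function name `precision_recall_f1` and the example counts are hypothetical and are not taken from the paper, whose evaluation details (such as how partial matches are scored) are not given here.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Compute precision, recall and F1 from raw match counts.

    F1 is the harmonic mean of precision and recall. Undefined ratios
    (zero denominators) are returned as 0.0 by convention.
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 8 correct species-toponym links, 2 spurious, 2 missed.
p, r, f1 = precision_recall_f1(8, 2, 2)
print(p, r, f1)
```

Because F1 is a harmonic mean, it is pulled toward the lower of precision and recall, which is why a high precision can coexist with a noticeably lower F1 score.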
Uncertainty matters: ascertaining where specimens in natural history collections come from and its implications for predicting species distributions
Natural history collections (NHCs) represent an enormous and largely untapped wealth of information on the Earth's biota, made available through GBIF as digital preserved specimen records. Precise knowledge of where the specimens were collected is paramount to rigorous ecological studies, especially in the field of species distribution modelling. Here, we present a first comprehensive analysis of georeferencing quality for all preserved specimen records served by GBIF, and illustrate the impact that coordinate uncertainty may have on predicted potential distributions. We used all GBIF preserved specimen records to analyse the availability of coordinates and associated spatial uncertainty across geography, spatial resolution, taxonomy, publishing institutions and collection time. We used three plant species across their native ranges in different parts of the world to show the impact of uncertainty on predicted potential distributions. We found that 38% of the 180+ million records provide coordinates only and 18% coordinates and uncertainty. Georeferencing quality is determined more by country of collection and publishing than by taxonomic group. Distinct georeferencing practices are more determinant than implicit characteristics and georeferencing difficulty of specimens. Availability and quality of records contrast across world regions. Uncertainty values are not normally distributed but peak at very distinct values, which can be traced back to specific regions of the world. Uncertainty leads to a wide spectrum of range sizes when modelling species distributions, potentially affecting conclusions in biogeographical and climate change studies. In summary, the digitised fraction of the world's NHCs is far from optimal in terms of georeferencing, and quality mainly depends on where the collections are hosted. 
A collective effort between communities around NHC institutions, ecological research and data infrastructure is needed to bring the data on a par with its importance and relevance for ecological research.