Accurator: Nichesourcing for Cultural Heritage
With more and more cultural heritage data being published online, their
usefulness in this open context depends on the quality and diversity of
descriptive metadata for collection objects. In many cases, existing metadata
is not adequate for a variety of retrieval and research tasks and more specific
annotations are necessary. However, eliciting such annotations is a challenge
since it often requires domain-specific knowledge. Where crowdsourcing can be
successfully used for eliciting simple annotations, identifying people with the
required expertise might prove troublesome for tasks requiring more complex or
domain-specific knowledge. Nichesourcing addresses this problem by tapping
into the expert knowledge available in niche communities. This paper presents
Accurator, a methodology for conducting nichesourcing campaigns for cultural
heritage institutions, by addressing communities, organizing events and
tailoring a web-based annotation tool to a domain of choice. The contribution
of this paper is threefold: 1) a nichesourcing methodology, 2) an annotation
tool for experts and 3) validation of the methodology and tool in three case
studies. The three domains of the case studies are birds on art, bible prints
and fashion images. We compare the quality and quantity of obtained annotations
in the three case studies, showing that the nichesourcing methodology in
combination with the image annotation tool can be used to collect high quality
annotations in a variety of domains and annotation tasks. A user evaluation
indicates the tool is suited and usable for domain-specific annotation tasks
Extending the 5S Framework of Digital Libraries to support Complex Objects, Superimposed Information, and Content-Based Image Retrieval Services
Advanced services in digital libraries (DLs) have been developed and widely used to address the required capabilities of an assortment of systems as DLs expand into diverse application domains. These systems may require support for images (e.g., Content-Based Image Retrieval), Complex (information) Objects, and use of content at fine grain (e.g., Superimposed Information). Due to the lack of consensus on precise theoretical definitions for those services, implementation efforts often involve ad hoc development, leading to duplication and interoperability problems. This article presents a methodology to address those problems by extending a precisely specified minimal digital library (in the 5S framework) with formal definitions of the aforementioned services. The theoretical extensions of digital library functionality presented here are reinforced with practical case studies as well as scenarios for the individual and integrative use of services to balance theory and practice. This methodology implies that other advanced services can be continuously integrated into our current extended framework whenever they are identified. The theoretical definitions and case study we present may impact future development efforts and a wide range of digital library researchers, designers, and developers
Machine learning using digitized herbarium specimens to advance phenological research
Machine learning (ML) has great potential to drive scientific discovery by harvesting data from images of herbarium specimens—preserved plant material curated in natural history collections—but ML techniques have only recently been applied to this rich resource. ML has particularly strong prospects for the study of plant phenological events such as growth and reproduction. As a major indicator of climate change, driver of ecological processes, and critical determinant of plant fitness, plant phenology is an important frontier for the application of ML techniques for science and society. In the present article, we describe a generalized, modular ML workflow for extracting phenological data from images of herbarium specimens, and we discuss the advantages, limitations, and potential future improvements of this workflow. Strategic research and investment in specimen-based ML methods, along with the aggregation of herbarium specimen data, may give rise to a better understanding of life on Earth
Knowledge-rich Image Gist Understanding Beyond Literal Meaning
We investigate the problem of understanding the message (gist) conveyed by
images and their captions as found, for instance, on websites or news articles.
To this end, we propose a methodology to capture the meaning of image-caption
pairs on the basis of large amounts of machine-readable knowledge that has
previously been shown to be highly effective for text understanding. Our method
identifies the connotation of objects beyond their denotation: where most
approaches to image understanding focus on the denotation of objects, i.e.,
their literal meaning, our work addresses the identification of connotations,
i.e., iconic meanings of objects, to understand the message of images. We view
image understanding as the task of representing an image-caption pair on the
basis of a wide-coverage vocabulary of concepts such as the one provided by
Wikipedia, and cast gist detection as a concept-ranking problem with
image-caption pairs as queries. To enable a thorough investigation of the
problem of gist understanding, we produce a gold standard of over 300
image-caption pairs and over 8,000 gist annotations covering a wide variety of
topics at different levels of abstraction. We use this dataset to
experimentally benchmark the contribution of signals from heterogeneous
sources, namely image and text. The best result, a Mean Average Precision
(MAP) of 0.69, indicates that by combining both dimensions we are able to better
understand the meaning of our image-caption pairs than when using language or
vision information alone. We test the robustness of our gist detection approach
when receiving automatically generated input, i.e., using automatically
generated image tags or generated captions, and prove the feasibility of an
end-to-end automated process
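The abstract above reports its results as Mean Average Precision over concept rankings, with each image-caption pair acting as a query. A minimal sketch of that metric follows, assuming binary relevance judgments; the function names and the toy concept lists are hypothetical, and the abstract does not specify the exact evaluation protocol the authors used.

```python
from typing import Sequence, Set


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked concept list against a relevant set."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant item occurs.
            precision_sum += hits / rank
    # Normalizing by the full relevant set penalizes relevant items never retrieved.
    return precision_sum / len(relevant)


def mean_average_precision(
    rankings: Sequence[Sequence[str]], relevants: Sequence[Set[str]]
) -> float:
    """Mean of per-query average precision; each query is one image-caption pair."""
    scores = [average_precision(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: two queries with ranked concepts and gold gist concepts.
rankings = [["a", "b", "c"], ["x", "y"]]
relevants = [{"a", "c"}, {"y"}]
print(mean_average_precision(rankings, relevants))
```

Under this definition, a score of 0.69 would mean that, averaged over queries, relevant gist concepts tend to appear near the top of the ranking.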
The UNITE database for molecular identification of fungi : handling dark taxa and parallel taxonomic classifications
Alfred P. Sloan Foundation [G-2015-14062]; Swedish Research Council of Environment, Agricultural Sciences, and Spatial Planning [FORMAS, 215-2011-498]; European Regional Development Fund (Centre of Excellence EcolChange) [TK131]; Estonian Research Council [IUT20-30]. Funding for open access charge: Swedish Research Council of Environment, Agricultural Sciences and Spatial Planning. Peer reviewed
A Trillion Coral Reef Colors: Deeply Annotated Underwater Hyperspectral Images for Automated Classification and Habitat Mapping
This paper describes a large dataset of underwater hyperspectral imagery that can be used by researchers in the domains of computer vision, machine learning, remote sensing, and coral reef ecology. We present the details of underwater data acquisition, processing and curation to create this large dataset of coral reef imagery annotated for habitat mapping. A diver-operated hyperspectral imaging system (HyperDiver) was used to survey 147 transects at 8 coral reef sites around the Caribbean island of Curacao. The underwater proximal sensing approach produced fine-scale images of the seafloor, with more than 2.2 billion points of detailed optical spectra. Of these, more than 10 million data points have been annotated for habitat descriptors or taxonomic identity with a total of 47 class labels up to genus- and species-levels. In addition to HyperDiver survey data, we also include images and annotations from traditional (color photo) quadrat surveys conducted along 23 of the 147 transects, which enables comparative reef description between two types of reef survey methods. This dataset promises benefits for efforts in classification algorithms, hyperspectral image segmentation and automated habitat mapping. Dataset: https://doi.org/10.1594/PANGAEA.911300 Dataset License: CC-BY-N
A framework to support the annotation, discovery and evaluation of data in ecology, for a better visibility and reuse of data and an increased societal value gained from environmental projects
At its core, this dissertation deals with the use of metadata in the everyday,
data-related workflows of ecologists. The work presents the creation of a
framework to support the annotation of ecological data, the efficient search
for ecological data in databases, and the integration of metadata during data
analysis. It further covers the documentation of analyses and the evaluation of
metadata to develop tools for preparing information about ecological projects.
This information can be used to evaluate and maximize the societal value gained
from those projects. The work is written as a cumulative dissertation in
English. It is based on two first-author publications and one manuscript
prepared for submission