3,890 research outputs found

    Accurator: Nichesourcing for Cultural Heritage

    Full text link
    With more and more cultural heritage data being published online, their usefulness in this open context depends on the quality and diversity of descriptive metadata for collection objects. In many cases, existing metadata is not adequate for a variety of retrieval and research tasks and more specific annotations are necessary. However, eliciting such annotations is a challenge since it often requires domain-specific knowledge. Where crowdsourcing can be successfully used for eliciting simple annotations, identifying people with the required expertise might prove troublesome for tasks requiring more complex or domain-specific knowledge. Nichesourcing addresses this problem, by tapping into the expert knowledge available in niche communities. This paper presents Accurator, a methodology for conducting nichesourcing campaigns for cultural heritage institutions, by addressing communities, organizing events and tailoring a web-based annotation tool to a domain of choice. The contribution of this paper is threefold: 1) a nichesourcing methodology, 2) an annotation tool for experts and 3) validation of the methodology and tool in three case studies. The three domains of the case studies are birds on art, bible prints and fashion images. We compare the quality and quantity of obtained annotations in the three case studies, showing that the nichesourcing methodology in combination with the image annotation tool can be used to collect high quality annotations in a variety of domains and annotation tasks. A user evaluation indicates the tool is suited and usable for domain specific annotation tasks

    Extending the 5S Framework of Digital Libraries to support Complex Objects, Superimposed Information, and Content-Based Image Retrieval Services

    Get PDF
    Advanced services in digital libraries (DLs) have been developed and widely used to address the required capabilities of an assortment of systems as DLs expand into diverse application domains. These systems may require support for images (e.g., Content-Based Image Retrieval), Complex (information) Objects, and use of content at fine grain (e.g., Superimposed Information). Due to the lack of consensus on precise theoretical definitions for those services, implementation efforts often involve ad hoc development, leading to duplication and interoperability problems. This article presents a methodology to address those problems by extending a precisely specified minimal digital library (in the 5S framework) with formal definitions of aforementioned services. The theoretical extensions of digital library functionality presented here are reinforced with practical case studies as well as scenarios for the individual and integrative use of services to balance theory and practice. This methodology has implications that other advanced services can be continuously integrated into our current extended framework whenever they are identified. The theoretical definitions and case study we present may impact future development efforts and a wide range of digital library researchers, designers, and developers

    Machine learning using digitized herbarium specimens to advance phenological research

    Get PDF
    Machine learning (ML) has great potential to drive scientific discovery by harvesting data from images of herbarium specimens—preserved plant material curated in natural history collections—but ML techniques have only recently been applied to this rich resource. ML has particularly strong prospects for the study of plant phenological events such as growth and reproduction. As a major indicator of climate change, driver of ecological processes, and critical determinant of plant fitness, plant phenology is an important frontier for the application of ML techniques for science and society. In the present article, we describe a generalized, modular ML workflow for extracting phenological data from images of herbarium specimens, and we discuss the advantages, limitations, and potential future improvements of this workflow. Strategic research and investment in specimen-based ML methods, along with the aggregation of herbarium specimen data, may give rise to a better understanding of life on Earth

    Knowledge-rich Image Gist Understanding Beyond Literal Meaning

    Full text link
    We investigate the problem of understanding the message (gist) conveyed by images and their captions as found, for instance, on websites or news articles. To this end, we propose a methodology to capture the meaning of image-caption pairs on the basis of large amounts of machine-readable knowledge that has previously been shown to be highly effective for text understanding. Our method identifies the connotation of objects beyond their denotation: where most approaches to image understanding focus on the denotation of objects, i.e., their literal meaning, our work addresses the identification of connotations, i.e., iconic meanings of objects, to understand the message of images. We view image understanding as the task of representing an image-caption pair on the basis of a wide-coverage vocabulary of concepts such as the one provided by Wikipedia, and cast gist detection as a concept-ranking problem with image-caption pairs as queries. To enable a thorough investigation of the problem of gist understanding, we produce a gold standard of over 300 image-caption pairs and over 8,000 gist annotations covering a wide variety of topics at different levels of abstraction. We use this dataset to experimentally benchmark the contribution of signals from heterogeneous sources, namely image and text. The best result with a Mean Average Precision (MAP) of 0.69 indicate that by combining both dimensions we are able to better understand the meaning of our image-caption pairs than when using language or vision information alone. We test the robustness of our gist detection approach when receiving automatically generated input, i.e., using automatically generated image tags or generated captions, and prove the feasibility of an end-to-end automated process

    The UNITE database for molecular identification of fungi : handling dark taxa and parallel taxonomic classifications

    Get PDF
    Alfred P. Sloan Foundation [G-2015-14062]; Swedish Research Council of Environment, Agricultural Sciences, and Spatial Planning [FORMAS, 215-2011-498]; European Regional Development Fund (Centre of Excellence EcolChange) [TK131]; Estonian Research Council [IUT20-30]. Funding for open access charge: Swedish Research Council of Environment, Agricultural Sciences and Spatial Planning.Peer reviewedPublisher PD

    A Trillion Coral Reef Colors: Deeply Annotated Underwater Hyperspectral Images for Automated Classification and Habitat Mapping

    No full text
    This paper describes a large dataset of underwater hyperspectral imagery that can be used by researchers in the domains of computer vision, machine learning, remote sensing, and coral reef ecology. We present the details of underwater data acquisition, processing and curation to create this large dataset of coral reef imagery annotated for habitat mapping. A diver-operated hyperspectral imaging system (HyperDiver) was used to survey 147 transects at 8 coral reef sites around the Caribbean island of Curacao. The underwater proximal sensing approach produced fine-scale images of the seafloor, with more than 2.2 billion points of detailed optical spectra. Of these, more than 10 million data points have been annotated for habitat descriptors or taxonomic identity with a total of 47 class labels up to genus- and species-levels. In addition to HyperDiver survey data, we also include images and annotations from traditional (color photo) quadrat surveys conducted along 23 of the 147 transects, which enables comparative reef description between two types of reef survey methods. This dataset promises benefits for efforts in classification algorithms, hyperspectral image segmentation and automated habitat mapping. Dataset: https://doi.org/10.1594/PANGAEA.911300 Dataset License: CC-BY-N

    A framework to support the annotation, discovery and evaluation of data in ecology, for a better visibility and reuse of data and an increased societal value gained from environmental projects

    Get PDF
    Die vorliegende Dissertationsschrift beschäftigt sich im Kern mit der Verwendung von Metadaten in alltäglichen, datenbezogenen Arbeitsabläufen von Ökologen. Die vorgelegte Arbeit befasst sich dabei mit der Erstellung eines Rahmenwerkes zur Unterstützung der Annotation ökologischer Daten, der effizienten Suche nach ökologischen Daten in Datenbanken und der Einbindung von Metadaten während der Datenanalyse. Weiterhin behandelt die Arbeit die Dokumentation von Analysen sowie die Auswertung von Metadaten zur Entwicklung von Werkzeugen für eine Aufbereitung von Informationen über ökologische Projekte. Diese Informationen können zur Evaluation und Maximierung des aus den Projekten gezogenen gesellschaftlichen Mehrwerts eingesetzt werden. Die vorliegende Arbeit ist als kumulative Dissertation in englischer Sprache abgefasst. Sie basiert auf zwei Veröffentlichungen als Erstautor und einem zur Einreichung vorbereiteten Manuskript
    • …
    corecore