3,369 research outputs found

    Community next steps for making globally unique identifiers work for biocollections data

    Get PDF
    Biodiversity data is being digitized and made available online at a rapidly increasing rate but current practices typically do not preserve linkages between these data, which impedes interoperation, provenance tracking, and assembly of larger datasets. For data associated with biocollections, the biodiversity community has long recognized that an essential part of establishing and preserving linkages is to apply globally unique identifiers at the point when data are generated in the field and to persist these identifiers downstream, but this is seldom implemented in practice. There has neither been coalescence towards one single identifier solution (as in some other domains), nor even a set of recommended best practices and standards to support multiple identifier schemes sharing consistent responses. In order to further progress towards a broader community consensus, a group of biocollections and informatics experts assembled in Stockholm in October 2014 to discuss community next steps to overcome current roadblocks. The workshop participants divided into four groups focusing on: identifier practice in current field biocollections; identifier application for legacy biocollections; identifiers as applied to biodiversity data records as they are published and made available in semantically marked-up publications; and cross-cutting identifier solutions that bridge across these domains. The main outcome was consensus on key issues, including recognition of differences between legacy and new biocollections processes, the need for identifier metadata profiles that can report information on identifier persistence missions, and the unambiguous indication of the type of object associated with the identifier. Current identifier characteristics are also summarized, and an overview of available schemes and practices is provided

    Simple identification tools in FishBase

    Get PDF
    Simple identification tools for fish species were included in the FishBase information system from its inception. Early tools made use of the relational model and characters like fin ray meristics. Soon pictures and drawings were added as a further help, similar to a field guide. Later came the computerization of existing dichotomous keys, again in combination with pictures and other information, and the ability to restrict possible species by country, area, or taxonomic group. Today, www.FishBase.org offers four different ways to identify species. This paper describes these tools with their advantages and disadvantages, and suggests various options for further development. It explores the possibility of a holistic and integrated computeraided strategy

    Toward a new data standard for combined marine biological and environmental datasets - expanding OBIS beyond species occurrences

    Get PDF
    The Ocean Biogeographic Information System (OBIS) is the world's most comprehensive online, open-access database of marine species distributions. OBIS grows with millions of new species observations every year. Contributions come from a network of hundreds of institutions, projects and individuals with common goals: to build a scientific knowledge base that is open to the public for scientific discovery and exploration and to detect trends and changes that inform society as essential elements in conservation management and sustainable development. Until now, OBIS has focused solely on the collection of biogeographic data (the presence of marine species in space and time) and operated with optimized data flows, quality control procedures and data standards specifically targeted to these data. Based on requirements from the growing OBIS community to manage datasets that combine biological, physical and chemical measurements, the OBIS-ENV-DATA pilot project was launched to develop a proposed standard and guidelines to make sure these combined datasets can stay together and are not, as is often the case, split and sent to different repositories. The proposal in this paper allows for the management of sampling methodology, animal tracking and telemetry data, biological measurements (e.g., body length, percent live cover, ...) as well as environmental measurements such as nutrient concentrations, sediment characteristics or other abiotic parameters measured during sampling to characterize the environment from which biogeographic data was collected. The recommended practice builds on the Darwin Core Archive (DwC-A) standard and on practices adopted by the Global Biodiversity Information Facility (GBIF). It consists of a DwC Event Core in combination with a DwC Occurrence Extension and a proposed enhancement to the DwC MeasurementOrFact Extension. This new structure enables the linkage of measurements or facts - quantitative and qualitative properties - to both sampling events and species occurrences, and includes additional fields for property standardization. We also embrace the use of the new parentEventID DwC term, which enables the creation of a sampling event hierarchy. We believe that the adoption of this recommended practice as a new data standard for managing and sharing biological and associated environmental datasets by IODE and the wider international scientific community would be key to improving the effectiveness of the knowledge base, and will enhance integration and management of critical data needed to understand ecological and biological processes in the ocean, and on land.Fil: De Pooter, Daphnis. Flanders Marine Institute; BélgicaFil: Appeltans, Ward. UNESCO-IOC; BélgicaFil: Bailly, Nicolas. Hellenic Centre for Marine Research, MedOBIS; GreciaFil: Bristol, Sky. United States Geological Survey; Estados UnidosFil: Deneudt, Klaas. Flanders Marine Institute; BélgicaFil: Eliezer, Menashè. Istituto Nazionale di Oceanografia e di Geofisica Sperimentale; ItaliaFil: Fujioka, Ei. University Of Duke. Nicholas School Of Environment. Duke Marine Lab; Estados UnidosFil: Giorgetti, Alessandra. Istituto Nazionale di Oceanografia e di Geofisica Sperimentale; ItaliaFil: Goldstein, Philip. University of Colorado Museum of Natural History, OBIS; Estados UnidosFil: Lewis, Mirtha Noemi. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Centro Nacional Patagónico. Centro para el Estudio de Sistemas Marinos; ArgentinaFil: Lipizer, Marina. Istituto Nazionale di Oceanografia e di Geofisica Sperimentale; ItaliaFil: Mackay, Kevin. National Institute of Water and Atmospheric Research; Nueva ZelandaFil: Marin, Maria Rosa. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Centro Nacional Patagónico; ArgentinaFil: Moncoiffé, Gwenaëlle. British Oceanographic Data Center; Reino UnidoFil: Nikolopoulou, Stamatina. Hellenic Centre for Marine Research, MedOBIS; GreciaFil: Provoost, Pieter. UNESCO-IOC; BélgicaFil: Rauch, Shannon. Woods Hole Oceanographic Institution; Estados UnidosFil: Roubicek, Andres. CSIRO Oceans and Atmosphere; AustraliaFil: Torres, Carlos. Universidad Autonoma de Baja California Sur; MéxicoFil: van de Putte, Anton. Royal Belgian Institute for Natural Sciences; BélgicaFil: Vandepitte, Leen. Flanders Marine Institute; BélgicaFil: Vanhoorne, Bart. Flanders Marine Institute; BélgicaFil: Vinci, Mateo. Istituto Nazionale di Oceanografia e di Geofisica Sperimentale; ItaliaFil: Wambiji, Nina. Kenya Marine and Fisheries Research Institute; KeniaFil: Watts, David. CSIRO Oceans and Atmosphere; AustraliaFil: Klein Salas, Eduardo. Universidad Simon Bolivar; VenezuelaFil: Hernandez, Francisco. Flanders Marine Institute; Bélgic

    ADVANCES IN KNOWLEDGE DISCOVERY IN DATABASES

    Get PDF
    The Knowledge Discovery in Databases and Data Mining field proposes the development of methods and techniques for assigning useful meanings for data stored in databases. It gathers researches from many study fields like machine learning, pattern recognition, databases, statistics, artificial intelligence, knowledge acquisition for expert systems, data visualization and grids. While Data Mining represents a set of specific algorithms of finding useful meanings in stored data, Knowledge Discovery in Databases represents the overall process of finding knowledge and includes the Data Mining as one step among others such as selection, pre�processing, transformation and interpretation of mined data. This paper aims to point the most important steps that were made in the Knowledge Discovery in Databases field of study and to show how the overall process of discovering can be improved in the future.

    The lifewatch approach to the exploration of distributed species information

    Get PDF
    © 2014 Daniel Fuentes, Nicola Fiore. This paper introduces a new method of automatically extracting, integrating and presenting information regarding species from the most relevant online taxonomic resources. First, the information is extracted and joined using data wrappers and integration solutions. Ten, an analytical tool is used to provide a visual representation of the data. Te information is then integrated into a user friendly content management system. Te proposal has been implemented using data from the Global Biodiversity Information Facility (GBIF), the Catalogue of Life (CoL), the World Register of Marine Species (WoRMS), the Integrated Taxonomic Information System (ITIS) and the Global Names Index (GNI). Te approach improves data quality, avoiding taxonomic and nomenclature errors whilst increasing the availability and accessibility of the information.Peer Reviewe

    A Data-driven, High-performance and Intelligent CyberInfrastructure to Advance Spatial Sciences

    Get PDF
    abstract: In the field of Geographic Information Science (GIScience), we have witnessed the unprecedented data deluge brought about by the rapid advancement of high-resolution data observing technologies. For example, with the advancement of Earth Observation (EO) technologies, a massive amount of EO data including remote sensing data and other sensor observation data about earthquake, climate, ocean, hydrology, volcano, glacier, etc., are being collected on a daily basis by a wide range of organizations. In addition to the observation data, human-generated data including microblogs, photos, consumption records, evaluations, unstructured webpages and other Volunteered Geographical Information (VGI) are incessantly generated and shared on the Internet. Meanwhile, the emerging cyberinfrastructure rapidly increases our capacity for handling such massive data with regard to data collection and management, data integration and interoperability, data transmission and visualization, high-performance computing, etc. Cyberinfrastructure (CI) consists of computing systems, data storage systems, advanced instruments and data repositories, visualization environments, and people, all linked together by software and high-performance networks to improve research productivity and enable breakthroughs that are not otherwise possible. The Geospatial CI (GCI, or CyberGIS), as the synthesis of CI and GIScience has inherent advantages in enabling computationally intensive spatial analysis and modeling (SAM) and collaborative geospatial problem solving and decision making. This dissertation is dedicated to addressing several critical issues and improving the performance of existing methodologies and systems in the field of CyberGIS. My dissertation will include three parts: The first part is focused on developing methodologies to help public researchers find appropriate open geo-spatial datasets from millions of records provided by thousands of organizations scattered around the world efficiently and effectively. Machine learning and semantic search methods will be utilized in this research. The second part develops an interoperable and replicable geoprocessing service by synthesizing the high-performance computing (HPC) environment, the core spatial statistic/analysis algorithms from the widely adopted open source python package – Python Spatial Analysis Library (PySAL), and rich datasets acquired from the first research. The third part is dedicated to studying optimization strategies for feature data transmission and visualization. This study is intended for solving the performance issue in large feature data transmission through the Internet and visualization on the client (browser) side. Taken together, the three parts constitute an endeavor towards the methodological improvement and implementation practice of the data-driven, high-performance and intelligent CI to advance spatial sciences.Dissertation/ThesisDoctoral Dissertation Geography 201

    Ubiquitous Semantic Applications

    Get PDF
    As Semantic Web technology evolves many open areas emerge, which attract more research focus. In addition to quickly expanding Linked Open Data (LOD) cloud, various embeddable metadata formats (e.g. RDFa, microdata) are becoming more common. Corporations are already using existing Web of Data to create new technologies that were not possible before. Watson by IBM an artificial intelligence computer system capable of answering questions posed in natural language can be a great example. On the other hand, ubiquitous devices that have a large number of sensors and integrated devices are becoming increasingly powerful and fully featured computing platforms in our pockets and homes. For many people smartphones and tablet computers have already replaced traditional computers as their window to the Internet and to the Web. Hence, the management and presentation of information that is useful to a user is a main requirement for today’s smartphones. And it is becoming extremely important to provide access to the emerging Web of Data from the ubiquitous devices. In this thesis we investigate how ubiquitous devices can interact with the Semantic Web. We discovered that there are five different approaches for bringing the Semantic Web to ubiquitous devices. We have outlined and discussed in detail existing challenges in implementing this approaches in section 1.2. We have described a conceptual framework for ubiquitous semantic applications in chapter 4. We distinguish three client approaches for accessing semantic data using ubiquitous devices depending on how much of the semantic data processing is performed on the device itself (thin, hybrid and fat clients). These are discussed in chapter 5 along with the solution to every related challenge. Two provider approaches (fat and hybrid) can be distinguished for exposing data from ubiquitous devices on the Semantic Web. These are discussed in chapter 6 along with the solution to every related challenge. We conclude our work with a discussion on each of the contributions of the thesis and propose future work for each of the discussed approach in chapter 7
    • …
    corecore