4 research outputs found

    Data Quality Concepts and Techniques Applied to Taxonomic Databases

    The thesis investigates the application of concepts and techniques of data quality in taxonomic databases to enhance the quality of information services and systems in taxonomy. Taxonomic data are arranged and introduced in Taxonomic Data Domains in order to establish a standard and a working framework to support the proposed Taxonomic Data Quality Dimensions, as a specialised application of conventional Data Quality Dimensions in the Taxonomic Data Quality Domains.
    The thesis presents a discussion about improving data quality in taxonomic databases, considering conventional Data Cleansing techniques and applying generic data content error patterns to taxonomic data. Techniques of taxonomic error detection are explored, with special attention to scientific name spelling errors.
    The spelling error problem is scrutinized through spelling error detecting techniques and algorithms. Spelling error detection algorithms are described and analysed. In order to evaluate the applicability and efficiency of different spelling error detection algorithms, a suite of experimental spelling error detection tools was developed and a set of experiments was performed, using a sample of five different taxonomic databases. The results of the experiments are analysed from the algorithm and from the database point of view.
    Database quality assessment procedures and metrics are discussed in the context of taxonomic databases and the previously introduced concepts of Taxonomic Data Domains and Taxonomic Data Quality Dimensions.
    Four questions related to Taxonomic Database Quality are discussed, followed by conclusions and recommendations involving information system design and implementation and the processes involved in taxonomic data management and information flow.
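    The abstract does not say which spelling error detection algorithms the thesis evaluates. As an illustration only, the sketch below uses Levenshtein edit distance, a common choice for this kind of task, to flag pairs of scientific names that differ by a few characters and may be misspellings of one another. The function names, the distance threshold, and the sample names are all hypothetical.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def suspect_pairs(names, max_distance=2):
    """Return (name, name, distance) for distinct names within max_distance edits."""
    unique = sorted(set(names))
    pairs = []
    for x, y in combinations(unique, 2):
        distance = levenshtein(x, y)
        if distance <= max_distance:
            pairs.append((x, y, distance))
    return pairs


# Hypothetical sample; "Quercus robar" is a deliberately misspelled entry.
sample = ["Quercus robur", "Quercus robar", "Pinus sylvestris", "Quercus rubra"]
print(suspect_pairs(sample))
```

    A flagged pair is only a candidate for review: two valid names can also lie within a small edit distance of each other, so a taxonomist or a reference checklist would still need to confirm which entry, if any, is wrong.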

    Abordagem Colaborativa para a Melhoria da Qualidade de Dados em Bases de Dados Botânicas [A Collaborative Approach to Improving Data Quality in Botanical Databases]

    The purpose of this study is to consider how research in collaborative technology can assist in the development of management systems for botanical collections, in sharing the knowledge of researchers, and in promoting the spread of identifications and taxonomic revisions. To this end, a study was conducted to assess the quality of the data and the main types of errors found in this type of biodiversity database application. A model for the development of an architecture for the system is proposed.

    Gestão do Conhecimento Taxonômico Aplicado na Conservação da Flora Brasileira [Taxonomic Knowledge Management Applied to the Conservation of Brazilian Flora]

    Information systems for biodiversity management and monitoring are effective tools for nature conservation. However, taxonomic data are difficult to handle, given the rate of new discoveries and the constant updating of names by specialists. A methodology to manage and share the knowledge of these researchers is therefore needed. In this study, we present an approach based on collaboration technologies, ontologies and data mining that enables the management of scientific knowledge in plant conservation.

    Spatial data quality of herbarium datasets and implications for decision-making on biodiversity conservation in Brazil

    The present level of biodiversity depletion and loss, together with the diffusion of new geotechnologies, sets the outlines of a new paradigm in which spatial data are extremely valuable. Quality datasets may be used to support decision-making processes in public policies related to biodiversity conservation. Specimen datasets used by the scientific community for its analyses are starting to become available to the public and private sectors. During the last two decades several important datasets, such as Species 2000, The Plant List and JABOT, among others, and access platforms such as GBIF and SpeciesLink have become available. At the same time, widespread access to technology has made geoprocessing and spatial analyses easier. However, poor data quality is still a critical problem and limits the usefulness of these datasets. Data quality assessments are therefore important to ensure responsible use of those datasets. The Brazilian National Centre for Flora Conservation was created in 2008, at the Rio de Janeiro Botanical Garden, with the main objective of assessing the extinction risk of plant species and planning conservation actions. In this context, a dataset was created by compiling occurrence records (248,837) of 4,711 threatened species, obtained from 70 herbaria. The present study aims to assess the quality of the dataset and its records, and to test the quality improvement after data cleaning efforts. We used the five-component scheme for assessing dataset quality. The significance of the differences between expected and observed proportions was tested in the software R, using the degree of confidence between proportions. The Mann-Whitney test was used to compare errors between the original dataset and the cleaned one. Results indicate poor quality, not only for the dataset (p<0.10) but also for the records (p<0.10). Only 54,306 records (22.30%) were considered of good quality. Logical inconsistencies in the dataset were present in 8,237 records (3.37%).
    Historical collections of Brazilian herbaria are composed of different datasets, which were incorporated gradually over time without proper metadata. To use herbarium datasets in support of decision-making processes on biodiversity, it is important to keep all metadata and the appropriate documents that prove the veracity of the data.
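    The abstract states that a Mann-Whitney test, run in R, compared errors between the original and the cleaned dataset, but it gives no details of the input. The sketch below is not the authors' analysis: it is a minimal Python illustration of how the Mann-Whitney U statistic is computed from two independent samples, with tied values sharing an average rank. The function names and the error counts are hypothetical, and no p-value is derived.

```python
def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b), the U statistics of two independent samples."""
    ranks = average_ranks(list(sample_a) + list(sample_b))
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical error counts per herbarium, before and after cleaning.
errors_original = [12, 15, 9, 20, 17]
errors_cleaned = [3, 5, 9, 4, 6]
print(mann_whitney_u(errors_original, errors_cleaned))
```

    The smaller of the two U values is the one usually compared against a critical value or converted to a p-value; in practice a statistics package such as R would handle that step, including the correction for ties.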