
    Linked Data Quality Assessment and its Application to Societal Progress Measurement

    In recent years, the Linked Data (LD) paradigm has emerged as a simple mechanism for employing the Web as a medium for data and knowledge integration, where both documents and data are linked. Moreover, the semantics and structure of the underlying data are kept intact, making this the Semantic Web. LD essentially entails a set of best practices for publishing and connecting structured data on the Web, which allows information to be published and exchanged in an interoperable and reusable fashion. Many different communities on the Internet, such as geographic, media, life sciences and government, have already adopted these LD principles. This is confirmed by the dramatically growing Linked Data Web, where currently more than 50 billion facts are represented. With the emergence of the Web of Linked Data, several use cases become possible thanks to the rich and disparate data integrated into one global information space. Linked Data, in these cases, not only assists in building mashups by interlinking heterogeneous and dispersed data from multiple sources but also empowers the uncovering of meaningful and impactful relationships. These discoveries have paved the way for scientists to explore the existing data and uncover meaningful outcomes that they might not have been aware of previously. In all these use cases utilizing LD, one crippling problem is the underlying data quality. Incomplete, inconsistent or inaccurate data affects the end results gravely, thus making them unreliable. Data quality is commonly conceived as fitness for use, be it for a certain application or use case. There are cases in which datasets that contain quality problems are still useful for certain applications, so quality depends on the use case at hand. Thus, LD consumption has to deal with the problem of getting the data into a state in which it can be exploited for real use cases. Insufficient data quality can be caused either by the LD publication process or can be intrinsic to the data source itself. A key challenge is to assess the quality of datasets published on the Web and make this quality information explicit. Assessing data quality is a particular challenge in LD, as the underlying data stems from a set of multiple, autonomous and evolving data sources. Moreover, the dynamic nature of LD makes assessing the quality crucial for measuring how accurately real-world data is represented. On the document Web, data quality can only be indirectly or vaguely defined, but there is a need for more concrete and measurable data quality metrics for LD. Such data quality metrics include correctness of facts with respect to the real world, adequacy of semantic representation, quality of interlinks, interoperability, timeliness or consistency with regard to implicit information. Even though data quality is an important concept in LD, few methodologies have been proposed to assess the quality of these datasets. Thus, in this thesis, we first unify 18 data quality dimensions and provide a total of 69 metrics for the assessment of LD. The first methodology includes the employment of LD experts for the assessment. This assessment is performed with the help of the TripleCheckMate tool, which was developed specifically to assist LD experts in assessing the quality of a dataset, in this case DBpedia. The second methodology is a semi-automatic process, in which the first phase involves the detection of common quality problems by the automatic creation of an extended schema for DBpedia.
The second phase involves the manual verification of the generated schema axioms. Thereafter, we employ the wisdom of the crowd, i.e. workers of online crowdsourcing platforms such as Amazon Mechanical Turk (MTurk), to assess the quality of DBpedia. We then compare the two approaches (previous assessment by LD experts and assessment by MTurk workers in this study) in order to measure the feasibility of each type of user-driven data quality assessment methodology. Additionally, we evaluate another semi-automated methodology for LD quality assessment, which also involves human judgement. In this semi-automated methodology, selected metrics are formally defined and implemented as part of a tool, namely R2RLint. The user is provided not only with the results of the assessment but also with the specific entities that cause the errors, which helps users understand and fix the quality issues. Finally, we take into account a domain-specific use case that consumes LD and depends on its quality. In particular, we identify four LD sources, assess their quality using the R2RLint tool and then utilize them in building the Health Economic Research (HER) Observatory. We show the advantages of this semi-automated assessment over the other types of quality assessment methodologies discussed earlier. The Observatory aims at evaluating the impact of research development on the economic and healthcare performance of each country per year. We illustrate the usefulness of LD in this use case and the importance of quality assessment for any data analysis
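
As a rough illustration of the metric-based assessment described above (not one of the thesis's actual 69 metrics), the sketch below computes a simple interlinking indicator, the share of subjects carrying at least one owl:sameAs link, over a local RDF dump using rdflib; the file name and the metric itself are assumptions made for the example.

```python
# Minimal sketch of a metric-style LD quality check (illustrative only).
from rdflib import Graph
from rdflib.namespace import OWL

def interlinking_ratio(graph: Graph) -> float:
    """Fraction of distinct subjects that carry at least one owl:sameAs link."""
    subjects = set(graph.subjects())
    if not subjects:
        return 0.0
    linked = {s for s, _, _ in graph.triples((None, OWL.sameAs, None))}
    return len(linked & subjects) / len(subjects)

g = Graph()
g.parse("dataset.ttl", format="turtle")  # assumed local dump of the dataset under assessment
print(f"Interlinking ratio: {interlinking_ratio(g):.2%}")
```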

    LinkWiper – A System For Data Quality in Linked Open Data

    Linked Open Data (LOD) provides access to large amounts of data on the Web. These data sets range from high-quality curated data sets to low-quality ones. LOD sources therefore need strategies to clean up data and methodologies for quality assessment in linked data. They allow interlinking and integrating any kind of data on the Web. Links between various data sources enable software applications to operate over the aggregated data space as if it were a single local database. However, such links may be broken, leading to data quality problems. In this thesis we present LinkWiper, an automated system for cleaning data in LOD. While this thesis focuses on problems related to dereferenced links, LinkWiper can be used to tackle any other data quality problem, such as duplication and consistency. The proposed system includes two major phases. The first phase uses information retrieval-like search techniques to recommend sets of alternative links. The second phase adopts crowdsourcing mechanisms to involve workers (or users) in improving the quality of the LOD sources. We provide an implementation of LinkWiper over DBpedia, a community effort to extract structured information from Wikipedia and make this information available using LOD principles. We also conduct extensive experiments to illustrate the efficiency and high precision of the proposed approach. Master of Science thesis, Computer and Information Science, College of Engineering and Computer Science, University of Michigan-Dearborn. http://deepblue.lib.umich.edu/bitstream/2027.42/136065/1/LinkWiper – A System For Data Quality in Linked Open Data.pdf
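
The abstract above centres on detecting and repairing broken links. As a hedged sketch of the detection step only (not LinkWiper's actual implementation, whose recommendation and crowdsourcing phases are out of scope here), the following snippet HEAD-checks the external object URIs of an RDF sample and reports those that fail to dereference; the sample file name is an assumption.

```python
# Illustrative broken-link check over an RDF sample (not LinkWiper itself).
import requests
from rdflib import Graph, URIRef

def broken_links(graph: Graph, timeout: float = 5.0) -> list[str]:
    """Return object URIs that fail to dereference with a < 400 status."""
    broken = []
    for obj in set(graph.objects()):
        if isinstance(obj, URIRef) and str(obj).startswith("http"):
            try:
                resp = requests.head(str(obj), allow_redirects=True, timeout=timeout)
                if resp.status_code >= 400:
                    broken.append(str(obj))
            except requests.RequestException:
                broken.append(str(obj))
    return broken

g = Graph()
g.parse("lod_sample.ttl", format="turtle")  # assumed sample of the LOD source
for uri in broken_links(g):
    print("Broken link:", uri)  # candidates for replacement / crowd verification
```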

    A Risk Assessment Based Model for Assessing the Environmental Sustainability of Tourism and Recreation Areas

    Assessing the environmental quality of tourism and recreation areas is considered fundamental to the sustainable management of these resources. However, existing methodologies for such assessments rely on sets of environmental data that are often poorly linked and difficult to interpret and integrate in a holistic manner. Risk assessment is a concept that has developed to the point where it has the potential to address current limitations in environmental assessment methodologies. This thesis presents a new model for the application of risk assessment to the management and assessment of environmental sustainability in the tourism and recreation sector. This model was applied and tested at two contrasting tourism and recreation areas in Ireland and a detailed methodology was developed. The results of this research identify key problem areas with respect to environmental sustainability at the two study areas. These results also demonstrate the strengths of the risk assessment approach and indicate that this methodology represents a valuable alternative to existing methodologies
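
To make the risk assessment concept concrete in this context, here is a minimal likelihood-by-consequence scoring sketch; the hazard names, scales and thresholds are invented for illustration and are not taken from the thesis.

```python
# Generic likelihood x consequence risk rating (illustrative values only).
def risk_rating(likelihood: int, consequence: int) -> str:
    """Both inputs on a 1 (low) to 5 (high) scale."""
    score = likelihood * consequence
    if score >= 15:
        return "high"
    if score >= 6:
        return "moderate"
    return "low"

hazards = {                      # hypothetical pressures at a recreation site
    "trail erosion": (4, 3),
    "litter accumulation": (5, 2),
    "water contamination": (2, 5),
}
for hazard, (likelihood, consequence) in hazards.items():
    print(f"{hazard}: {risk_rating(likelihood, consequence)} risk")
```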

    Quality assessment of linked Canadian clinical administrative hospital and vital statistics death data

    Introduction: Three Canadian clinical-administrative hospital databases were linked to the Canadian Vital Statistics Death Database (CVSD) to provide information about patients who died following discharge from hospital, as well as supplementary information about patients who died in hospital. Quality was assessed using a guided approach and through feedback from initial users.
    Objectives and Approach: The linked datasets were created to develop and validate health care indicators and performance measures and to perform outcome analyses. It is therefore imperative to evaluate the data’s fitness for use. Quality was assessed by calculating coverage of deaths for all linked contributors, creating a profile of the linked dataset and analyzing issues that were identified by users. These analyses were guided by an existing Data Source Assessment Tool, which provides a set of criteria that allow for assessment across five dimensions of quality, thus allowing for appropriate determination of a given set of data’s fitness for use.
    Results: Deterministic linkage of the datasets resulted in linkage rates that ranged from 66.9% to 90.9%, depending on the dataset or data year. Linkage rates also varied by Canadian jurisdiction and patient cohort. Variables had good data availability, with rates of 95% or higher. Initial users identified a significant number of duplicate records that were flagged to and corrected by the data supplier. 1.4% of acute hospital deaths had discrepancies in the death date captured in the two linked sources; the vast majority had a difference of only one day. A user group and an issue-tracking process were created to share information about the linked data, ensure that issues are triaged to the appropriate party and allow for timely follow-up with the data supplier.
    Conclusion/Implications: Documentation provided by the data supplier was vital to understanding the linkage methodology and its impact on linkage rates. A guided data assessment ensured that strengths and limitations were identified and shared to support appropriate use. Feedback to the data supplier is supporting ongoing improvements to the linkage methodology
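
As a hedged sketch of two of the checks reported above (linkage coverage and death-date discrepancies), the following pandas snippet shows how such figures could be computed; the file and column names are assumptions, not the actual CVSD or hospital schema.

```python
# Illustrative linkage-rate and date-discrepancy checks (assumed schema).
import pandas as pd

hospital = pd.read_csv("hospital_deaths.csv", parse_dates=["death_date"])
linked = pd.read_csv("linked_records.csv", parse_dates=["cvsd_death_date"])

# Linkage rate: share of hospital death records that found a CVSD match.
linkage_rate = linked["record_id"].nunique() / hospital["record_id"].nunique()
print(f"Linkage rate: {linkage_rate:.1%}")

# Death-date discrepancies between the two linked sources, in days.
merged = hospital.merge(linked, on="record_id")
diff_days = (merged["death_date"] - merged["cvsd_death_date"]).dt.days.abs()
print(f"Records with discrepant death dates: {(diff_days > 0).mean():.1%}")
print(f"Of these, off by exactly one day: {(diff_days == 1).sum()} records")
```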

    Key issues in rigorous accuracy assessment of land cover products

    Accuracy assessment and land cover mapping have been inexorably linked throughout the first 50 years of publication of Remote Sensing of Environment. The earliest developers of land-cover maps recognized the importance of evaluating the quality of their maps, and the methods and reporting format of these early accuracy assessments included features that would be familiar to practitioners today. Specifically, practitioners have consistently recognized the importance of obtaining high-quality reference data to which the map is compared, the need for sampling to collect these reference data, and the role of an error matrix and the accuracy measures derived from it in summarizing the accuracy information. Over the past half century these techniques have undergone refinements to place accuracy assessment on a more scientifically credible footing. We describe the current status of accuracy assessment that has emerged from nearly 50 years of practice and identify opportunities for future advances. The article is organized by the three major components of accuracy assessment: the sampling design, the response design, and the analysis, focusing on good practice methodology that contributes to a rigorous, informative, and honest assessment. The long history of research and applications underlying the current practice of accuracy assessment has advanced the field to a mature state. However, documentation of accuracy assessment methods needs to be improved to enhance reproducibility and transparency, and improved methods are required to address new challenges created by advanced technology that has expanded the capacity to map land cover extensively in space and intensively in time
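
The error matrix and its derived accuracy measures mentioned above follow a standard computation; the sketch below shows overall, user's and producer's accuracy for an invented three-class matrix (the counts are illustrative, not from the article).

```python
# Accuracy measures from an error (confusion) matrix; counts are invented.
import numpy as np

# Rows = map classes, columns = reference classes (sample unit counts).
error_matrix = np.array([
    [120, 10,  5],   # forest
    [ 15, 90, 10],   # cropland
    [  5, 20, 75],   # urban
])

overall_accuracy = np.trace(error_matrix) / error_matrix.sum()
users_accuracy = np.diag(error_matrix) / error_matrix.sum(axis=1)      # per map class
producers_accuracy = np.diag(error_matrix) / error_matrix.sum(axis=0)  # per reference class

print(f"Overall accuracy: {overall_accuracy:.2%}")
print("User's accuracy:", np.round(users_accuracy, 3))
print("Producer's accuracy:", np.round(producers_accuracy, 3))
```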

    A practical application of statistical process control to evaluate the performance rate of academic programmes: implications and suggestions

    Purpose – This study aims to properly and objectively assess students’ study progress in bachelor programmes by applying statistical process control (SPC). Specifically, the authors focused their analysis on the variation in performance rates in business studies courses taught at a Spanish university. Design/methodology/approach – A qualitative methodology was adopted, based on an action-based case study developed at a public university. Previous research and theoretical issues related to quality indicators of the training programmes were discussed, followed by the application of SPC to assess these outputs. Findings – The evaluation of the performance rate of the courses that comprised the training programmes through SPC revealed significant differences with respect to the evaluations obtained through traditional evaluation procedures. Similarly, the results show differences in the control parameters (central line and control interval), depending on the adopted approach (by programme, by academic year and by department). Research limitations/implications – This study has inherent limitations linked to both the methodology and the selection of data sources. Practical implications – The SPC approach provides a framework to properly and objectively assess the quality indicators involved in quality assurance processes in higher education. Originality/value – This paper contributes to the discourse on the importance of a robust and effective assessment of quality indicators of the academic curriculum in the higher education context through the application of quality control tools such as SPC. Funding for open access charge: Universidad de Huelva / CBU
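
For readers unfamiliar with SPC in this setting, a p-chart is one common way to derive the central line and control limits for course performance rates; the sketch below uses made-up pass counts and enrolments and is not the study's actual data or chart choice.

```python
# Minimal p-chart for course performance rates (illustrative data only).
import math

passed   = [38, 41, 35, 44, 29, 40, 33]   # students passing each course
enrolled = [50, 52, 48, 55, 47, 51, 49]   # students enrolled in each course

p_bar = sum(passed) / sum(enrolled)        # central line
for course, (x, n) in enumerate(zip(passed, enrolled), start=1):
    sigma = math.sqrt(p_bar * (1 - p_bar) / n)
    ucl, lcl = p_bar + 3 * sigma, max(0.0, p_bar - 3 * sigma)
    rate = x / n
    status = "in control" if lcl <= rate <= ucl else "out of control"
    print(f"course {course}: rate={rate:.2f} limits=[{lcl:.2f}, {ucl:.2f}] -> {status}")
```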

    Assessing and refining mappings to RDF to improve dataset quality

    RDF dataset quality assessment is currently performed primarily after data is published. However, there is no systematic way to incorporate its results into the dataset, nor to incorporate the assessment into the publishing workflow. Adjustments are applied manually, but rarely. Moreover, the root cause of the violations, which often derives from the mappings that specify how the RDF dataset will be generated, is not identified. We suggest an incremental, iterative and uniform validation workflow for RDF datasets stemming originally from (semi-)structured data (e.g., CSV, XML, JSON). In this work, we focus on assessing and improving their mappings. We incorporate (i) a test-driven approach for assessing the mappings instead of the RDF dataset itself, as mappings reflect how the dataset will be formed when generated; and (ii) semi-automatic mapping refinements based on the results of the quality assessment. The proposed workflow is applied to diverse cases, e.g., large, crowdsourced datasets such as DBpedia, or newly generated ones, such as iLastic. Our evaluation indicates the efficiency of our workflow, as it significantly improves the overall quality of an RDF dataset in the observed cases
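
In the spirit of the test-driven mapping assessment described above, a single test case might query the mapping document itself rather than the generated RDF; the sketch below flags R2RML object maps that use a column but declare no datatype. The specific test and the mapping file name are assumptions, not the paper's actual test suite.

```python
# Illustrative mapping-level test case over an R2RML document.
from rdflib import Graph

QUERY = """
PREFIX rr: <http://www.w3.org/ns/r2rml#>
SELECT ?objectMap ?column WHERE {
    ?pom rr:objectMap ?objectMap .
    ?objectMap rr:column ?column .
    FILTER NOT EXISTS { ?objectMap rr:datatype ?dt }
}
"""

mapping = Graph()
mapping.parse("mapping.r2rml.ttl", format="turtle")  # assumed mapping file
for row in mapping.query(QUERY):
    print(f"Object map {row.objectMap} on column '{row.column}' declares no rr:datatype")
```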

    Decision support system for managing stormwater and greywater quality in informal settlements in South Africa

    Managing the quality of stormwater and greywater in informal settlements is essential to their growth. In this thesis, methodologies are developed for the assessment and management of stormwater and greywater quality based on the analysis of both non-structural and structural control interventions. The objectives of the research were as follows:
    · To review stormwater runoff quality and treatment practices and the extent of runoff and greywater management in rural and peri-urban areas of South Africa. The review was also to determine the extent of quality control awareness and experience among stormwater management professionals and to collate information upon which present and future needs can be assessed and addressed.
    · To develop a methodology to identify factors causing water quality management issues in low-cost, high-density settlements.
    · To develop a methodology to characterize stormwater and greywater quality, as well as to set ambient water quality and management objectives.
    · To develop a methodology to identify and select potential non-structural and structural control interventions to manage stormwater and greywater quality.
    · Based on the above, to develop a decision support system for the evaluation of potential interventions for stormwater and greywater management at planning level.
    The methodologies used to achieve the above objectives consisted of: literature review; consultations with stakeholders; data analysis and computations; model development; and model application. The current status of managing water quality pollution in urban areas is outlined and the related problems, specifically those applicable to developing areas, are discussed. Management interventions employed to date in the management of water quality effects are set out and the applicability of such interventions to developing areas is identified. The potential of expert systems is evaluated and the application of this system to stormwater quality management models is assessed. A decision support system (DSS) was developed for rapid assessment of various water quality management interventions. The model is primarily targeted at those who are involved or are likely to be involved in stormwater quality management, including catchment managers, local governments or municipalities, catchment management agencies, private consultants and researchers. The DSS and the related methodologies have been shown, through the Alexandra Township (north of Johannesburg) case study, to be useful and to satisfy all the objectives set out for the research. The results of the research are summarised and the merits and limitations of the decision support system are discussed. Recommendations for the direction of future research and the development of the existing model are detailed. Specifically, it is recommended that:
    · Extensive monitoring be undertaken in order to improve the defaults in the model.
    · Research be undertaken into the extent to which GIS can be integrated with the DSS to select appropriate management interventions and their sites.
    · Research be undertaken into privatization and partnership in the ownership and operation of stormwater management systems.
    · The selection of the least-cost strategy with the DSS, presently achieved by a trial-and-error process, be improved by linking the DSS to an optimizer
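
At planning level, a decision support system of the kind described above often reduces to weighted multi-criteria scoring of candidate interventions; the sketch below is a minimal, hypothetical example of that idea, with criteria, weights and scores invented rather than taken from the thesis DSS.

```python
# Hypothetical weighted scoring of stormwater/greywater interventions.
CRITERIA_WEIGHTS = {"cost": 0.4, "pollutant_removal": 0.4, "maintenance": 0.2}

interventions = {
    "vegetated swale":    {"cost": 4, "pollutant_removal": 3, "maintenance": 4},
    "detention pond":     {"cost": 2, "pollutant_removal": 4, "maintenance": 3},
    "education campaign": {"cost": 5, "pollutant_removal": 2, "maintenance": 5},
}

def weighted_score(scores: dict) -> float:
    """Scores on a 1 (poor) to 5 (good) scale, combined with the weights above."""
    return sum(CRITERIA_WEIGHTS[criterion] * s for criterion, s in scores.items())

ranked = sorted(interventions.items(), key=lambda kv: weighted_score(kv[1]), reverse=True)
for name, scores in ranked:
    print(f"{name}: {weighted_score(scores):.2f}")
```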