78 research outputs found

    Data Integration for Open Data on the Web

    In this lecture we will introduce and discuss the challenges of integrating openly available Web data and how to solve them. Firstly, while we will address this topic from the viewpoint of Semantic Web research, not all data is readily available as RDF or Linked Data, so we will give an introduction to different data formats prevalent on the Web, namely, standard formats for publishing and exchanging tabular, tree-shaped, and graph data. Secondly, not all Open Data is really completely open, so we will discuss and address issues around licences, terms of usage associated with Open Data, as well as documentation of data provenance. Thirdly, we will discuss (meta-)data quality issues associated with Open Data on the Web and how Semantic Web techniques and vocabularies can be used to describe and remedy them. Fourthly, we will address issues of searchability and integration of Open Data and discuss how far semantic search can help to overcome these. We close by briefly summarizing further issues not covered explicitly herein, such as multi-linguality, temporal aspects (archiving, evolution, temporal querying), as well as how/whether OWL and RDFS reasoning on top of integrated open data could be help

    Investigating metadata adoptions for open government data portals in US cities

    Open government data (OGD) is a valuable resource for both policy transparency and government accountability. All levels of the United States government are working hard to promote open data and its portals. However, there is still a lack of studies on local-level OGD portals in the United States, particularly on the quality of metadata adopted by these portals. By examining 200 US cities, we sample a list of 112 local-level portals and investigate the current usage of open data platforms for building local-level OGD portals. This study further investigates and discusses the adoption and potential issues of metadata on those OGD portals. Our findings describe the platform distribution among US local-level OGD portals and highlight several critical issues associated with metadata on the portals. We anticipate the results will inspire further studies on identifying solutions to improve the metadata and enhance the usability of open government data portal

    QODA – Methodology and Legislative Background for Assessment of Open Government Datasets Quality

    In the last few years, many open government data portals have emerged around the world. These portals publish open government datasets which can be accessed and used by everyone for their own needs. In this paper, we propose a methodology named QODA (Quality of Open government DAtasets) for assessing the quality of published datasets along two aspects. The first is an assessment of the quality of the open government datasets themselves, and the second is an assessment of the quality features of the platforms that contribute to the publication of quality datasets. It provides step-by-step guidance for dataset analysis and a summarization of results. The research presented in this paper shows that open government dataset quality depends on the data provider as well as on a proper definition of the metadata behind the datasets. Our findings result in recommendations to open government data (OGD) publishers to constantly supervise the use of published datasets, with the aim of having timely and punctual information on OGD portals, with special attention to quality features

    The Type 2 Diabetes Knowledge Portal: an Open access Genetic Resource Dedicated to Type 2 Diabetes and Related Traits

    Associations between human genetic variation and clinical phenotypes have become a foundation of biomedical research. Most repositories of these data seek to be disease-agnostic and therefore lack disease-focused views. The Type 2 Diabetes Knowledge Portal (T2DKP) is a public resource of genetic datasets and genomic annotations dedicated to type 2 diabetes (T2D) and related traits. Here, we seek to make the T2DKP more accessible to prospective users and more useful to existing users. First, we evaluate the T2DKP's comprehensiveness by comparing its datasets with those of other repositories. Second, we describe how researchers unfamiliar with human genetic data can begin using and correctly interpreting them via the T2DKP. Third, we describe how existing users can extend their current workflows to use the full suite of tools offered by the T2DKP. We finally discuss the lessons offered by the T2DKP toward the goal of democratizing access to complex disease genetic results

    Developing data catalogue extensions for metadata harvesting in GIS

    Researchers in geoscience often use several Geographic Information Systems (GIS) to find and access different types of data for research use. The researchers do not always know in which GIS the data they need reside, and therefore might spend a considerable amount of time searching for it. A better solution would be a GIS that combines the data in a single, searchable system. In this thesis we examine how a GIS that combines data from external data servers can aid researchers in doing research. A GIS prototype with harvesting capabilities for a few commonly used data repositories in the geoscientific field is presented. First, we interview researchers to learn about their GIS usage and problems, and assess relevant standards, protocols and technology to use in a GIS prototype. We present the prototype implementation, and demonstrate that it is quicker to use than searching several data repositories. The evaluation of the prototype shows that it has potential, but that improvements have to be considered, especially in regard to supporting harvesting from additional types of data repositories. Master's thesis in informatics, INF39

    Application of the benchmark dose-response modelling approach for risk characterization of chemicals

    Toxicology is the discipline that investigates the possible adverse effects of chemical exposure on human, animal and environmental health. Chemical risk assessment is the process that aims to identify potentially hazardous substances and describes the probability of adverse outcomes associated with their exposure. Biological changes and adverse effects do not occur after a threshold level is surpassed, but gradually and following a sequence of linked events. Traditionally, the no-observed-adverse-effect-level (NOAEL) approach has been used to detect the highest dose at which no adverse effect was observed. However, the NOAEL approach has methodological limitations and disadvantages that have resulted in it being increasingly replaced by the scientifically more advanced benchmark dose (BMD) approach. The BMD-modelling approach is a flexible method that takes all uncertainty and variability in the data into account, providing better estimates of doses leading to the potential adverse effects. Nonetheless, there are a number of knowledge gaps that need to be addressed and a lack of consensus persists regarding certain methodological aspects of this modelling strategy. The overall aim of this thesis was to contribute to the BMD field and expand the knowledge base by applying this approach to the areas of risk assessment and pharmaceutical development, addressing some identified challenges and discussing potential improvements. In particular, this thesis covers three topics that are interconnected, namely the choice of the Critical Effect Size (CES) (study I and III to VI), the analysis of multiple endpoints (study II to VI) and the assessment of chemical mixtures (study I, V and VI). 
These topics were applied to data from studies on chemicals, namely per- and polyfluoroalkyl substances (PFAS) (study I and VI), polychlorinated biphenyls (PCBs) (study IV and V), a candidate drug in pharmaceutical development (study II and III) and the pesticide norflurazon (study VI). Study I combined human and animal data in order to derive the probabilistic risk for a 10% decrease in total triiodothyronine (T3) hormone levels, depending on residency time. The human data consisted of perfluorooctanesulfonic acid (PFOS) and perfluorohexanesulfonic acid (PFHxS) serum levels from the resident population in Ronneby, a Swedish village that was highly exposed to PFAS through contaminated drinking water. The animal data originated from a 6-month subchronic study in monkeys, exposed to PFOS once a day. This integrated probabilistic risk assessment (IPRA) analysis demonstrated that longer exposure periods were associated with a larger proportion of the population at risk, ranging from 2.1% (90% C.I. 0.4% – 13.1%) to 3.5% (90% C.I. 0.7% – 21.8%) for residents exposed to PFOS and PFHxS for at least 1 or 29 years, respectively. This risk was mostly distributed among women, and exposure duration was the greatest source of uncertainty (60.8%). It was concluded that IPRA is an advantageous method to calculate the risk for adverse effects, in comparison to the deterministic Margin of Exposure (MoE) approach. Study II analyzed data from three sequential safety assessment studies performed in rats to investigate the potential toxicity of an anti-oncogenic candidate drug in pharmaceutical development. The partial least squares (PLS) modelling approach was used to detect associations between clinical signs observed during the study, a 5% body weight decrease and pathological findings noted after study termination. 
Piloerection, eyes half shut and slightly decreased motor activity were the signs that were most strongly associated with the pathological findings, and the models accurately predicted the injuries observed in the thymus, testes, epididymides and bone marrow. The findings indicate that an evaluation of clinical signs as an integrated toxicity evaluation has potential 3R (Replacement, Reduction and Refinement of animal use) gains, especially in terms of Refinement of animal studies. The study suggests that the PLS-modelling approach can be employed to predict pathological changes, monitor animal welfare and support the decision-making process during pre-clinical safety and toxicity assessment studies. Study III analyzed the same data as study II, but applied the BMD-modelling approach instead, with a different objective, namely to describe potential relationships between the dose and the findings made in the 63 examined endpoints. The endpoints modelled included biochemistry and hematology endpoints, body weight changes, organ pathology findings and clinical observations. The resulting BMDs and BMDLs were compared to the study NOAEL (or LOAEL) and were often lower than the estimates of the NOAEL approach. A 5% change was also compared to the findings based on an adversity threshold derived from the observed and endpoint-specific magnitude of change. Additionally, BMD modelling was considered to have a strong focus on the Refinement of animal studies. In summary, it was shown that modelling multiple endpoints is desirable, providing a more complete overview of the potential toxicity of a candidate drug and improving the pharmaceutical development process. Study IV assessed the potential toxicity of PCB-156 (2,3,3′,4,4′,5-hexachlorobiphenyl) following a 90-day study in rats exposed daily through their diet. Dose-dependent toxicological effects were described, including body and organ changes as well as changes in the assessed retinoid system endpoints. 
Retinoid disruption and effects in the organs of rats were demonstrated employing the BMD dose-response modelling approach, revealing that the apolar liver retinoid concentrations were the most sensitive endpoint. The retinoid system was shown to be sensitive to PCB-156 exposure, and it was suggested that its endpoints should be more often considered for chemical risk assessment purposes. Study V employed the BMD method to calculate relative potency factors (RPFs) for seven PCBs (PCB-28, 77, 105, 118, 128, 153 and 156) and one PCB-mixture. PCB-126 was used as an index chemical, and the eight 90-day regulatory toxicity studies for the individual congeners were performed under the same conditions (the PCB-mixture study was 28 days long). The liver apolar retinoid levels and concentrations, and the remaining endpoints examined, yielded greater RPFs than those calculated by the World Health Organization (WHO) in 2006 (Van den Berg et al., 2006), being suggestive of a hazard underestimation. In fact, the potency factors estimated in this study, based on the ethoxyresorufin-O-deethylase (EROD) enzymatic activity (a historically used endpoint to calculate RPFs), were the lowest in comparison to other endpoints for which RPFs were calculated. In summary, RPFs were useful to describe the potential toxicity of structurally similar compounds, expressed in units equivalent to the index chemical, and the retinoid system proved once again to be susceptible to changes following low-dose PCB exposure. Study VI focused on the choice of CES, a matter of debate when applying the BMD method to continuous data. Currently, there is no internationally harmonized approach to choosing the CES, and five strategies were examined: the EFSA default value of 5% or 10%, the US EPA 1 SD approach, an endpoint-specific CES based on historical data, the General Theory of Effect Size (GTES) and expert judgment. 
All examined strategies featured advantages and limitations, and the different choices of CES led to distinct reference values when applied to five case studies analyzing data on PFAS, PCB-156 and a pesticide (norflurazon). Although some of these strategies delivered similar CES values, this was not always the case, and reliance on a single method to choose the CES is not recommended. It was concluded that expert judgment is irreplaceable and that the decision-making process performed by risk assessors and managers regarding the likely threshold of adversity should be supported by BMD analysis of the data comparing different CES. This could lead to a better overview of the data package and understanding of the doses leading to different magnitudes of effects, which would lead to better motivation of the choices and decisions made. In conclusion, this thesis demonstrates that the BMD method is a flexible modelling approach to assess the potential effects of several classes of substances, such as PFAS, PCBs and candidate drugs. Possible applications in the chemical risk assessment and pharmaceutical development areas were demonstrated. Additionally, it was shown that the BMD approach has a strong 3R potential and extracts a considerable amount of information from the data. The BMD approach is here to stay in chemical risk assessment, and much like a Swiss army knife, it is a useful and multi-purpose tool that will support you in the derivation of reference values of superior quality