Towards Cleaning-up Open Data Portals: A Metadata Reconciliation Approach
This paper presents an approach for metadata reconciliation, curation and linking for Open Governmental Data Portals (ODPs). ODPs have lately become the standard solution for governments willing to make their public data available to society. Portal managers use several types of metadata to organize the datasets, one of the most important being tags. However, the tagging process is subject to many problems, such as synonyms, ambiguity and incoherence, among others. As our empirical analysis of ODPs shows, these issues are currently prevalent in most ODPs and effectively hinder the reuse of Open Data. In order to address these problems, we develop and implement an approach for tag reconciliation in Open Data Portals, encompassing local actions related to individual portals and global actions for adding a semantic metadata layer above individual portals. The local part aims to enhance the quality of tags in a single portal, and the global part is meant to interlink ODPs by establishing relations between tags.
Comment: 8 pages, 10 figures - Under Revision for ICSC201
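The abstract does not detail the local reconciliation step; a minimal sketch of one plausible tag-normalization strategy (all function names and the naive plural-stripping heuristic are assumptions, not the paper's actual method) could group spelling and punctuation variants of portal tags under a canonical key:

```python
import re
from collections import defaultdict

def normalize_tag(tag: str) -> str:
    """Canonicalize a tag: lowercase, strip punctuation, collapse whitespace,
    and reduce trivial English plural variants (naive trailing-'s' stripping)."""
    t = re.sub(r"[^\w\s]", " ", tag.lower())
    t = re.sub(r"\s+", " ", t).strip()
    words = [w[:-1] if w.endswith("s") and len(w) > 3 else w for w in t.split()]
    return " ".join(words)

def reconcile(tags):
    """Group raw tags by their normalized form, exposing variant clusters
    that a curator (or a synonym dictionary) can then merge."""
    clusters = defaultdict(set)
    for tag in tags:
        clusters[normalize_tag(tag)].add(tag)
    return dict(clusters)

clusters = reconcile(["Budgets", "budget", "Budget!", "health", "Health-care"])
# "Budgets", "budget" and "Budget!" all collapse to the key "budget"
```

A real system would add synonym and multilingual handling on top; this only illustrates the "local" clean-up idea.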
The value and challenges of providing and accessing Government open data in developing countries: Kenyan context from a citizen’s perspective
In recent years, governments in both developed and developing countries have embraced open data initiatives as fundamental in facilitating government transparency, accountability, and public participation by making data freely available to the public. In addition, open data serves as an essential cornerstone in supporting technological innovation and economic growth by enabling third parties to develop new kinds of digital applications and services (Gray 2014; Ding et al. 2011; Shadbolt et al. 2012).
Despite the rising uptake of such initiatives, little has been written on the experience as well as the skills and knowledge for citizens in open data and technology environments. This paper seeks to fill this gap by presenting unique lessons learnt from the implementation of Kenya’s globally acclaimed Open Data Portal which was launched in July 2011. Kenya forms an interesting study choice as the country was the first developing country in sub-Saharan Africa and the second on the continent after Morocco to develop the portal. The portal, powered by Socrata Inc, aims to make core government developmental, demographic, statistical and expenditure data available for researchers, policymakers, ICT developers and the general public.
The varying technological, economic, and cultural differences in Kenya significantly affect access to and usage of the portal, as seen in wide inequalities in technical expertise, internet access, and extent of use. In addition, various system and management challenges inhibit the utility and ease of interaction of the portal. These challenges include empty datasets, broken links, obsolete information, and the absence of numerous datasets requested by the public, with requests dating back more than two years.
The authors, who are Kenyan citizens, explore the challenges and best practices learnt from the implementation of the Kenya Open Data Portal and discuss, from a citizen's viewpoint, these unique and interesting findings and how they relate to and contrast with those of other countries.
GI Systems for public health with an ontology based approach
Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.
Health is an indispensable attribute of human life. In the modern age, utilizing technologies for health is one of the emergent concepts in several applied fields; computer science and (geographic) information systems are some of the interdisciplinary fields that motivate this thesis.
The inspiring idea of the study originates from a rhetorical disease, DbHd (Database Hugging Disorder), defined by Hans Rosling in his World Bank Open Data speech in May 2010. The cure for this disease can be offered as linked open data, which contains ontologies for health science, diseases, genes, drugs, GEO species, etc. Linked Open Data (LOD) provides the systematic application of information by publishing and connecting structured data on the Web.
In the context of this study we aimed to reduce the boundaries between the semantic web and the geo web. For this reason, use case data is studied from the Valencia CSISP (Research Center of Public Health), in which the mortality rates for particular diseases are represented spatio-temporally. The use case data is divided into three conceptual domains (health, spatial, statistical) and enhanced with semantic relations and descriptions by following Linked Data principles. Finally, in order to convey complex health-related information, we offer an infrastructure integrating the geo web and the semantic web. Based on the established outcome, user access methods are introduced and future research directions are outlined.
LODNav – An Interactive Visualization of the Linking Open Data Cloud
The emergence of the Linking Open Data Cloud (LODC) is an example of the adoption of Linked Data principles and the creation of a Web of Data. There is an increasing amount of information linked across member datasets of the LODC by means of RDF links, yet there is little support for a human to understand which datasets are connected to one another. This research presents a novel approach for understanding these interconnections with the publicly accessible tool LODNav - Linking Open Data Navigator. LODNav provides a visualization metaphor of the LODC by positioning member datasets of the LODC on a world map based on the geographical location of the dataset. This interactive tool aims to provide a dynamic, up-to-date visualization of the LODC and allows the extraction of information about the datasets as well as their interconnections as RDF data.
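The interconnections LODNav visualizes boil down to counting RDF links whose subject and object belong to different datasets. A minimal sketch (the host-based dataset mapping is an assumption for illustration; real LODC tooling uses curated dataset registries) could be:

```python
from collections import Counter
from urllib.parse import urlparse

def dataset_of(uri: str) -> str:
    """Map a resource URI to its dataset, approximated here by the URI host."""
    return urlparse(uri).netloc

def interlink_counts(triples):
    """Count RDF links whose subject and object live in different datasets,
    yielding the dataset-to-dataset edges a visualization would draw."""
    counts = Counter()
    for s, _p, o in triples:
        src, dst = dataset_of(s), dataset_of(o)
        if src != dst:
            counts[(src, dst)] += 1
    return counts

triples = [
    ("http://dbpedia.org/resource/Lyon", "owl:sameAs",
     "http://sws.geonames.org/2996944/"),
    ("http://dbpedia.org/resource/Lyon", "rdf:type",
     "http://dbpedia.org/ontology/City"),  # intra-dataset, not an interlink
]
links = interlink_counts(triples)
# one cross-dataset edge: dbpedia.org -> sws.geonames.org
```

Edge weights from such counts could then be rendered between dataset positions on the map.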
A systematic literature review of open data quality in practice
Context: The main objective of open data initiatives is to make information freely available through easily accessible mechanisms and to facilitate its exploitation. In practice, openness should be accompanied by a certain level of trustworthiness or guarantees about the quality of data. Traditional data quality is a thoroughly researched field with several benchmarks and frameworks to grasp its dimensions. However, quality assessment in open data is a complicated process, as it involves stakeholders and the evaluation of datasets as well as the publishing platform.
Objective: In this work, we aim to identify and synthesize various features of open data quality approaches in practice. We applied thematic synthesis to identify the most relevant research problems and quality assessment methodologies.
Method: We undertook a systematic literature review to summarize the state of the art on open data quality. The review process started by developing the review protocol, in which all steps, research questions, inclusion and exclusion criteria, and analysis procedures are included. The search strategy retrieved 9323 publications from four scientific digital libraries. The selected papers were published between 2005 and 2015. Finally, through a discussion between the authors, 63 papers were included in the final set of selected papers.
Results: Open data quality, in general, is a broad concept, and it could apply to multiple areas. There are many quality issues concerning open data that hinder their actual usage in real-world applications. The main ones are unstructured metadata, heterogeneity of data formats, lack of accuracy, incompleteness and lack of validation techniques. Furthermore, we collected the existing quality methodologies from the selected papers and synthesized them under a unifying classification schema. A list of quality dimensions and metrics from the selected papers is also reported.
Conclusion: In this research, we provided an overview of the methods related to open data quality, using the instrument of systematic literature reviews. Open data quality methodologies vary depending on the application domain. Moreover, the majority of studies focus on satisfying specific quality criteria. With metrics based on generalized data attributes, a platform could be created to evaluate all possible open datasets. The lack of methodology validation also remains a major problem; studies should focus on validation techniques.
QuerioCity: A Linked Data Platform for Urban Information Management
In this paper, we present QuerioCity, a platform to catalog, index and query highly heterogeneous information coming from complex systems, such as cities. A series of challenges are identified: namely, the heterogeneity of the domain and the lack of a common model, the volume of information and the number of data sets, the requirement for a low entry threshold to the system, the diversity of the input data in terms of format, syntax and update frequency (streams vs static data), and the sensitivity of the information. We propose an approach for incremental and continuous integration of static and streaming data, based on Semantic Web technologies. The proposed system is unique in the literature in terms of its handling of multiple integrations of available data sets in combination with flexible provenance tracking, privacy protection and continuous integration of streams. We report on lessons learnt from building the first prototype for Dublin.
Spatial Search Strategies for Open Government Data: A Systematic Comparison
The increasing availability of open government datasets on the Web calls for ways to enable their efficient access and searching. There is, however, an overall lack of understanding regarding which spatial search strategies would perform best in this context. To address this gap, this work has assessed the impact of different spatial search strategies on performance and user relevance judgment. We harvested machine-readable spatial datasets and their metadata from three English-based open government data portals, performed metadata enhancement, developed a prototype and performed both a theoretical and a user-based evaluation. The results highlight that (i) switching between area of overlap and Hausdorff distance for spatial similarity computation does not have any substantial impact on performance; and (ii) the use of Hausdorff distance induces slightly better user relevance ratings than the use of area of overlap. The data collected and the insights gleaned may serve as a baseline against which future work can compare.
Comment: Paper accepted to GIR'19: 13th Workshop on Geographic Information Retrieval (Lyon, France)
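The two spatial similarity measures compared above are standard; a minimal sketch of both (discrete Hausdorff distance over point sets, and area of overlap over axis-aligned bounding boxes, which is an assumed simplification of the paper's actual geometry handling) could be:

```python
from math import dist

def hausdorff(a, b):
    """Discrete symmetric Hausdorff distance between two point sets:
    the largest nearest-neighbor distance in either direction."""
    def directed(p, q):
        return max(min(dist(x, y) for y in q) for x in p)
    return max(directed(a, b), directed(b, a))

def overlap_area(r1, r2):
    """Area of intersection of two axis-aligned boxes (minx, miny, maxx, maxy);
    zero when the boxes are disjoint."""
    w = min(r1[2], r2[2]) - max(r1[0], r2[0])
    h = min(r1[3], r2[3]) - max(r1[1], r2[1])
    return max(w, 0) * max(h, 0)

d = hausdorff([(0, 0), (1, 0)], [(0, 1), (1, 1)])   # 1.0
a = overlap_area((0, 0, 2, 2), (1, 1, 3, 3))        # 1.0
```

Note the two measures rank differently: Hausdorff distance is small for similar extents (lower is better), while overlap area is large (higher is better), which is why a search system must normalize them before comparison.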
Distribution and Process of Environmental Inequity in the Brazos Valley, Texas
Lower income and minority communities have long borne an unequal burden of toxic pollution from environmental hazards. I examined environmental inequity, that is, the unequal distribution of environmental hazards in minority and economically disadvantaged communities and the exclusion of community members from environmental decision making, in the Brazos Valley, Texas. This project offers a broad review of unequal environmental burdens and the marginalization of minority communities as a background to better understand problems in Central Texas. Geographical Information System (GIS) analyses were used to examine the distribution of potential environmental exposures in the Brazos Valley, while qualitative methods assessed the role of a case study community (Bryan, Texas) in the environmental decision-making processes related to these risks.