325 research outputs found
SemLAV: Local-As-View Mediation for SPARQL Queries
The Local-As-View (LAV) integration approach aims at querying heterogeneous data in dynamic environments. In LAV, data sources are described as views over a global schema, which is used to pose queries. Query processing requires generating and executing query rewritings, but for SPARQL queries, the LAV query rewritings may not be generated or executed in a reasonable time. In this paper, we present SemLAV, an alternative technique to process SPARQL queries over a LAV integration system without generating rewritings. SemLAV executes the query against a partial instance of the global schema, which is built on the fly with data from the relevant views. The paper presents an experimental study of SemLAV and compares its performance with traditional LAV-based query processing techniques. The results suggest that SemLAV scales up to SPARQL queries even over a large number of views, while significantly outperforming traditional solutions.
Wikipedia as an encyclopaedia of life
In his 2003 essay E O Wilson outlined his vision for an “encyclopaedia of life” comprising “an electronic page for each species of organism on Earth”, each page containing “the scientific name of the species, a pictorial or genomic presentation of the primary type specimen on which its name is based, and a summary of its diagnostic traits.” Although the “quiet revolution” in biodiversity informatics has generated numerous online resources, including some directly inspired by Wilson's essay (e.g., http://ispecies.org, http://www.eol.org), we are still some way from the goal of having available online all relevant information about a species, such as its taxonomy, evolutionary history, genomics, morphology, ecology, and behaviour. While the biodiversity community has been developing a plethora of databases, some with overlapping goals and duplicated content, Wikipedia has been slowly growing to the point where it now has over 100,000 pages on biological taxa. My goal in this essay is to explore the idea that, largely independent of the efforts of biodiversity informatics and well-funded international efforts, Wikipedia (http://en.wikipedia.org/wiki/Main_Page) has emerged as potentially the best platform for fulfilling E O Wilson’s vision.
Where the streets have known names
Street names provide important insights into the local culture, history, and politics of places. Linked open data provide a wealth of knowledge that can be associated with street names, enabling novel ways to explore cultural geographies. This paper presents a three-fold contribution. We present (1) a technique to establish a correspondence between street names and the entities that they refer to. The method is based on Wikidata, a knowledge base derived from Wikipedia. The accuracy of this mapping is evaluated on a sample of streets in Rome. As this approach reaches limited coverage, we propose to tap local knowledge with (2) a simple web platform. Users can select the best correspondence from the calculated ones or add another entity not discovered by the automated process. As a result, we design (3) an enriched OpenStreetMap web map where each street name can be explored in terms of the properties of its associated entity. Through several filters, this tool is a first step towards the interactive exploration of toponymy, showing how open data can reveal facets of the cultural texture that pervades places.
A Survey of Volunteered Open Geo-Knowledge Bases in the Semantic Web
Over the past decade, rapid advances in web technologies, coupled with
innovative models of spatial data collection and consumption, have generated a
robust growth in geo-referenced information, resulting in spatial information
overload. Increasing 'geographic intelligence' in traditional text-based
information retrieval has become a prominent approach to respond to this issue
and to fulfill users' spatial information needs. Numerous efforts in the
Semantic Geospatial Web, Volunteered Geographic Information (VGI), and the
Linking Open Data initiative have converged in a constellation of open
knowledge bases, freely available online. In this article, we survey these open
knowledge bases, focusing on their geospatial dimension. Particular attention
is devoted to the crucial issue of the quality of geo-knowledge bases, as well
as of crowdsourced data. A new knowledge base, the OpenStreetMap Semantic
Network, is outlined as our contribution to this area. Research directions in
information integration and Geographic Information Retrieval (GIR) are then
reviewed, with a critical discussion of their current limitations and future
prospects.
Wikipedia Information Flow Analysis Reveals the Scale-Free Architecture of the Semantic Space
In this paper we extract the topology of the semantic space in its encyclopedic sense, measuring the semantic flow between the different entries of the largest modern encyclopedia, Wikipedia, and thus creating a directed complex network of semantic flows. Notably, at the percolation threshold, the semantic space is characterised by scale-free behaviour at different levels of complexity, and this relates the semantic space to a wide range of biological, social and linguistic phenomena. In particular, we find that the cluster size distribution, representing the size of different semantic areas, is scale-free. Moreover, the topology of the resulting semantic space is scale-free in the connectivity distribution and displays small-world properties. However, its statistical properties do not allow a classical interpretation via a generative model based on a simple multiplicative process. After giving a detailed description and interpretation of the topological properties of the semantic space, we introduce a stochastic model of a content-based network, based on a copy and mutation algorithm and on Heaps' law, that is able to capture the main statistical properties of the analysed semantic space, including Zipf's law for the word frequency distribution.
Ontology-based end-user visual query formulation: Why, what, who, how, and which?
Value creation in an organisation is a time-sensitive and data-intensive process, yet it is often delayed and bounded by the reliance on IT experts extracting data for domain experts. Hence, there is a need for providing people who are not professional developers with the flexibility to pose relatively complex and ad hoc queries in an easy and intuitive way. In this respect, visual methods for query formulation undertake the challenge of making querying independent of users’ technical skills and the knowledge of the underlying textual query language and the structure of data. An ontology is more promising than the logical schema of the underlying data for guiding users in formulating queries, since it provides a richer vocabulary closer to the users’ understanding. However, on the one hand, most of the world’s enterprise data today reside in relational databases rather than triple stores, and on the other, visual query formulation has become more compelling due to ever-increasing data size and complexity, known as Big Data. This article presents and argues for ontology-based visual query formulation for end-users; discusses its feasibility in terms of ontology-based data access, which virtualises legacy relational databases as RDF, and the dimensions of Big Data; presents key conceptual aspects and dimensions, challenges, and requirements; and reviews, categorises, and discusses notable approaches and systems.
Knowledge representation in the Internet of Things: semantic modelling and its applications
Semantic modelling provides a potential basis for interoperation among different systems and applications in the Internet of Things (IoT). However, current work has mostly focused on IoT resource management rather than on the access and utilisation of information generated by the “Things”. We present the design of a comprehensive and lightweight semantic description model for knowledge representation in the IoT domain. The design follows the widely recognised best practices in knowledge engineering and ontology modelling. Users are allowed to extend the model by linking to external ontologies, knowledge bases or existing linked data. Scalable access to IoT services and resources is achieved through a distributed, semantic storage design. The usefulness of the model is also illustrated through an IoT service discovery method.
Documentation FiFoSiM: Integrated Tax Benefit Microsimulation and CGE Model
This paper describes FiFoSiM, the integrated tax benefit microsimulation and computable general equilibrium (CGE) model of the Center of Public Economics at the University of Cologne. FiFoSiM consists of three main parts. The first part is a static tax benefit microsimulation module. The second part adds a behavioural component to the model: an econometrically estimated labour supply model. The third module is a CGE model which allows the user of FiFoSiM to assess the global economic effects of policy measures. Two specific features distinguish FiFoSiM from other tax benefit models: first, the simultaneous use of two databases for the tax benefit module, and second, the linkage of the tax benefit model to a CGE model.
Analysis of schema structures in the Linked Open Data graph based on unique subject URIs, pay-level domains, and vocabulary usage
The Linked Open Data (LOD) graph represents a web-scale distributed knowledge graph interlinking information about entities across various domains. A core concept is the lack of a pre-defined schema, which allows data from all kinds of domains to be modelled flexibly. However, Linked Data does exhibit schema information in a twofold way: explicitly, by attaching RDF types to the entities, and implicitly, by using domain-specific properties to describe the entities. In this paper, we present and apply different techniques for investigating the schematic information encoded in the LOD graph at different levels of granularity. We investigate different information-theoretic properties of so-called Unique Subject URIs (USUs) and measure the correlation between the properties and types that can be observed for USUs on a large-scale semantic graph data set. Our analysis provides insights into the information encoded in the different schema characteristics. Two major findings are that implicit schema information is far more discriminative and that applications involving schema information based on either types or properties alone will only capture between 63.5% and 88.1% of the schema information contained in the data. As the level of discrimination depends on how data providers model and publish their data, we have conducted, in a second step, an investigation based on pay-level domains (PLDs) as well as the semantic level of vocabularies. Overall, we observe that most data providers combine up to 10 vocabularies to model their data and that every fifth PLD uses a highly structured schema.
