Using Ontologies for the Design of Data Warehouses
Obtaining an implementation of a data warehouse is a complex task that forces
designers to acquire wide knowledge of the domain, thus requiring a high level
of expertise and making it a failure-prone task. Based on our experience, we
have identified a set of situations encountered in real-world projects in
which we believe the use of ontologies would improve several aspects of the
design of data warehouses. The aim of this article is to describe several
shortcomings of current data warehouse design approaches and to discuss the
benefits of using ontologies to overcome them. This work is a starting point
for discussing the convenience of using ontologies in data warehouse design.
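As a hedged illustration (not from the article) of how an ontology might inform warehouse design, the sketch below links dimension levels in a multidimensional schema to concepts in a domain ontology, so that a design tool could check or suggest hierarchies. The namespaces, class names, and the dw:levelOf property are hypothetical.

```python
# Illustrative sketch (hypothetical namespaces): annotating a data warehouse
# dimension with concepts from a domain ontology using rdflib.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, RDFS

DW = Namespace("http://example.org/dw#")        # warehouse schema vocabulary (assumed)
DOM = Namespace("http://example.org/retail#")   # domain ontology (assumed)

g = Graph()
g.bind("dw", DW)
g.bind("dom", DOM)

# Declare a dimension and one of its levels in the warehouse vocabulary.
g.add((DW.ProductDim, RDF.type, DW.Dimension))
g.add((DW.Category, RDF.type, DW.Level))
g.add((DW.Category, DW.levelOf, DW.ProductDim))

# Link the level to the domain concept it represents; a design tool could use
# such links to detect missing levels or inconsistent hierarchies.
g.add((DW.Category, RDFS.seeAlso, DOM.ProductCategory))
g.add((DW.ProductDim, RDFS.label, Literal("Product dimension")))

print(g.serialize(format="turtle"))
```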
A Survey of Volunteered Open Geo-Knowledge Bases in the Semantic Web
Over the past decade, rapid advances in web technologies, coupled with
innovative models of spatial data collection and consumption, have generated a
robust growth in geo-referenced information, resulting in spatial information
overload. Increasing 'geographic intelligence' in traditional text-based
information retrieval has become a prominent approach to respond to this issue
and to fulfill users' spatial information needs. Numerous efforts in the
Semantic Geospatial Web, Volunteered Geographic Information (VGI), and the
Linking Open Data initiative have converged in a constellation of open
knowledge bases, freely available online. In this article, we survey these open
knowledge bases, focusing on their geospatial dimension. Particular attention
is devoted to the crucial issue of the quality of geo-knowledge bases, as well
as of crowdsourced data. A new knowledge base, the OpenStreetMap Semantic
Network, is outlined as our contribution to this area. Research directions in
information integration and Geographic Information Retrieval (GIR) are then
reviewed, with a critical discussion of their current limitations and future
prospects.
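As a minimal, illustrative example of consuming the kind of open geo-knowledge base the survey covers, the snippet below queries Wikidata's public SPARQL endpoint for places within 10 km of a point. The endpoint, query, and coordinates are illustrative assumptions, not material from the article.

```python
# Illustrative only: querying an open geo-knowledge base (here, Wikidata's
# public SPARQL endpoint) for places near a point.
from SPARQLWrapper import SPARQLWrapper, JSON

QUERY = """
SELECT ?place ?placeLabel ?coord WHERE {
  SERVICE wikibase:around {                  # geospatial radius search
    ?place wdt:P625 ?coord .                 # P625 = coordinate location
    bd:serviceParam wikibase:center "Point(-0.1276 51.5072)"^^geo:wktLiteral .
    bd:serviceParam wikibase:radius "10" .   # kilometres
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 10
"""

sparql = SPARQLWrapper("https://query.wikidata.org/sparql",
                       agent="geo-kb-example/0.1 (illustrative)")
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["placeLabel"]["value"], row["coord"]["value"])
```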
A conceptual framework and a risk management approach for interoperability between geospatial datacubes
Today, we observe wide use of geospatial databases implemented in many forms (e.g., transactional centralized systems, distributed databases, multidimensional datacubes). Among these, the multidimensional datacube is the most appropriate for supporting interactive analysis and guiding an organization's strategic decisions, especially when different epochs and levels of information granularity are involved. However, users may need to work with several geospatial multidimensional datacubes, which may be semantically heterogeneous and may differ in their appropriateness to the context of use. Overcoming the semantic problems related to this heterogeneity and to the difference in appropriateness to the context of use, in a manner transparent to users, has been the principal aim of interoperability for the last fifteen years. Different solutions have been proposed, but in spite of successful initiatives, today's solutions have evolved in a non-systematic way. Moreover, no solution has been found that addresses the specific semantic problems related to interoperability between geospatial datacubes. In this thesis, we suppose that it is possible to define an approach that addresses these semantic problems to support interoperability between geospatial datacubes. To that end, we first define interoperability between geospatial datacubes. Then, we define and categorize the semantic heterogeneity problems that may occur during the interoperability process between different geospatial datacubes. To resolve this semantic heterogeneity, we propose a conceptual framework that is essentially based on human communication: software agents representing the geospatial datacubes involved in the interoperability process communicate with each other, aiming to exchange information about the content of the datacubes. Then, to help the agents make appropriate decisions during the interoperability process, we evaluate a set of indicators of the external quality (fitness-for-use) of geospatial datacube schemas and of the production context (e.g., metadata). Finally, we implement the proposed approach to show its feasibility.
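To make the agent-based framework concrete, here is a deliberately simplified sketch; the class names, message format, and scoring rule are invented for illustration and are not the thesis's actual framework. Two agents, each representing a datacube, exchange schema and metadata descriptions, and each computes a toy fitness-for-use indicator from dimension overlap and metadata completeness.

```python
# Illustrative sketch only: a toy version of agents exchanging datacube schema
# information and scoring fitness-for-use. Names and the scoring rule are
# assumptions, not the thesis's actual framework.
from dataclasses import dataclass, field

@dataclass
class DatacubeAgent:
    name: str
    dimensions: set[str]                 # dimension names exposed by the cube
    metadata: dict[str, str] = field(default_factory=dict)

    def describe(self) -> dict:
        """Message sent to a peer agent: schema plus production context."""
        return {"dimensions": self.dimensions, "metadata": self.metadata}

    def fitness_for_use(self, peer_msg: dict) -> float:
        """Toy external-quality indicator: dimension overlap, weighted by how
        complete the peer's metadata is for a few expected fields."""
        shared = self.dimensions & peer_msg["dimensions"]
        overlap = len(shared) / max(len(self.dimensions), 1)
        expected = ("source", "epoch", "spatial_resolution")
        completeness = sum(k in peer_msg["metadata"] for k in expected) / len(expected)
        return 0.7 * overlap + 0.3 * completeness

a = DatacubeAgent("cube_A", {"time", "region", "product"},
                  {"source": "census", "epoch": "2001"})
b = DatacubeAgent("cube_B", {"time", "region", "store"},
                  {"source": "survey", "epoch": "2006", "spatial_resolution": "1km"})
print(f"A's assessment of B: {a.fitness_for_use(b.describe()):.2f}")
```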
An automated ETL for online datasets
While using online datasets for machine learning is commonplace today, the quality of these datasets impacts the performance
of prediction algorithms. One method for improving the semantics of new data sources is to map these sources to a common
data model or ontology. While semantic and structural heterogeneities must still be resolved, this provides a well-established
approach to producing clean datasets suitable for machine learning and analysis. However, when there is a requirement for
close-to-real-time usage of online data, a method for the dynamic Extract-Transform-Load (ETL) of new sources must be developed.
In this work, we present a framework for integrating online and enterprise data sources, in close to real time, to provide
datasets for machine learning and predictive algorithms. An exhaustive evaluation compares a human-built data transformation
process with our system's machine-generated ETL process, with very favourable results, illustrating the value and impact of
an automated approach.
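The general pattern the abstract describes can be sketched as follows. This is a minimal illustration of a dynamic extract-transform-load pipeline against a common data model, with a placeholder URL and field names; it is not the paper's actual system.

```python
# Minimal sketch of a dynamic ETL pattern (not the paper's system): extract an
# online JSON source, transform it to a common schema via a declarative field
# mapping, and load it for analysis. URL and field names are placeholders.
import requests
import pandas as pd

COMMON_SCHEMA_MAPPING = {
    # source field -> common-model field (assumed names)
    "ts": "timestamp",
    "val": "value",
    "sensor_id": "source_id",
}

def extract(url: str) -> list[dict]:
    """Pull raw records from an online source."""
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return resp.json()

def transform(records: list[dict], mapping: dict[str, str]) -> pd.DataFrame:
    """Rename mapped fields; drop fields the common model doesn't know."""
    rows = [{dst: r[src] for src, dst in mapping.items() if src in r}
            for r in records]
    return pd.DataFrame(rows)

def load(df: pd.DataFrame, path: str) -> None:
    df.to_csv(path, index=False)

if __name__ == "__main__":
    raw = extract("https://example.org/api/readings")  # placeholder endpoint
    load(transform(raw, COMMON_SCHEMA_MAPPING), "clean_readings.csv")
```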
- âŠ