
78 research outputs found

Integrating data warehouses with web data : a survey

Author: Aramburu Cabo María José
Berlanga Llavori Rafael
Pedersen Torben Bach
Pérez Martínez Juan Manuel
Publication venue: IEEE Computer Society
Publication date: 01/01/2008

This paper surveys the most relevant research on combining Data Warehouse (DW) and Web data. It studies the XML technologies that are currently being used to integrate, store, query, and retrieve Web data and their application to DWs. The paper reviews different distributed DW architectures and the use of XML languages as an integration tool in these systems. It also introduces the problem of dealing with semistructured data in a DW. It studies Web data repositories, the design of multidimensional databases for XML data sources, and the XML extensions of OnLine Analytical Processing techniques. The paper addresses the application of information retrieval technology in a DW to exploit text-rich document collections. The authors hope that the paper will help to discover the main limitations and opportunities offered by the combination of the DW and Web fields, as well as to identify open research lines.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Repositori Institucional de la Universitat Jaume I

VBN

Applying UML and XML for designing and interchanging information for data warehouses and OLAP applications

Author: Luján-Mora Sergio
Song Il-Yeol
Trujillo Juan
Publication venue: IGI Global
Publication date: 01/01/2004

Multidimensional (MD) modeling is the basis for data warehouses (DW), multidimensional databases (MDB) and on-line analytical processing (OLAP) applications. In this paper, we present how the unified modeling language (UML) can be successfully used to represent both structural and dynamic properties of these systems at the conceptual level. The structure of the system is specified by means of a UML class diagram that considers the main properties of MD modeling with minimal use of constraints and extensions of the UML. If the system to be modeled is too complex, thereby leading us to a considerable number of classes and relationships, we describe how to use the package grouping mechanism provided by the UML to simplify the final model. Furthermore, we provide a UML-compliant class notation (called cube class) to represent OLAP users’ initial requirements. We also describe how we can use the UML state and interaction diagrams to model the behavior of a data warehouse system. To facilitate the interchange of conceptual MD models, we provide a Document Type Definition (DTD) which allows us to represent the same MD modeling properties that can be considered by using our approach. From this DTD, we can directly generate valid eXtensible Markup Language (XML) documents that represent MD models at the conceptual level. We believe that our innovative approach provides a theoretical foundation for simplifying the conceptual design of MD systems and the examples included in this paper clearly illustrate the use of our approach

Repositorio Institucional de la Universidad de Alicante

CiteSeerX

Crossref

Efficient cube construction for smart city data

Author: Roantree Mark
Scriney Michael
Publication venue: CEUR-WS.org
Publication date: 18/03/2016

To deliver powerful smart city environments, web-produced data streams must be analysed in close to real time so that city planners can employ up-to-date predictive models in both short- and long-term planning. Data cubes fused from multiple sources provide a popular input to predictive models. A key component in this infrastructure is an efficient mechanism for transforming web data (XML or JSON) into multi-dimensional cubes. In our research, we have developed a framework for efficient transformation of XML data from multiple smart city services into DWARF cubes using a NoSQL storage engine. Our evaluation shows a high level of performance when compared to other approaches and thus provides a platform for predictive models in a smart city environment.

CiteSeerX

Irish Universities

DCU Online Research Access Service

Implementation of the multidimensional schemas integration method ORE

Author: Mayorova Daria
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2013

The goal of the project is the implementation of the semi-automatic method, named ORE, for creating multidimensional schemas for data warehouses by iteratively integrating information requirements.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Applying UML and XML for designing and interchanging information for data warehouses and OLAP applications

Author: Luján-Mora Sergio
Song Il-Yeol
Trujillo Juan
Publication date: 29/07/2006

Journal of Database Management, Vol. 15, No. 1, 2004, pp. 41-72. Retrieved 6/26/2006 from http://www.ischool.drexel.edu/faculty/song/publications/p_JDBMS04_Final.pdf.

Multidimensional (MD) modeling is the basis for data warehouses (DW), multidimensional databases (MDB) and On-Line Analytical Processing (OLAP) applications. In this paper, we present how the Unified Modeling Language (UML) can be successfully used to represent both structural and dynamic properties of these systems at the conceptual level. The structure of the system is specified by means of a UML class diagram that considers the main properties of MD modeling with minimal use of constraints and extensions of the UML. If the system to be modeled is too complex, thereby leading us to a considerable number of classes and relationships, we describe how to use the package grouping mechanism provided by the UML to simplify the final model. Furthermore, we provide a UML-compliant class notation (called cube class) to represent OLAP users' initial requirements. We also describe how we can use the UML state and interaction diagrams to model the behavior of a data warehouse system. To facilitate the interchange of conceptual MD models, we provide a Document Type Definition (DTD) which allows us to represent the same MD modeling properties that can be considered by using our approach. From this DTD, we can directly generate valid eXtensible Markup Language (XML) documents that represent MD models at the conceptual level. We believe that our innovative approach provides a theoretical foundation for simplifying the conceptual design of MD systems and the examples included in this paper clearly illustrate the use of our approach.

Drexel Libraries E-Repository and Archives

A conceptual framework and a risk management approach for interoperability between geospatial datacubes

Author: Sboui Tarek
Publication venue: Bibliothèque de l'Université Laval
Publication date: 01/01/2010

Today, we observe wide use of geospatial databases that are implemented in many forms (e.g., transactional centralized systems, distributed databases, multidimensional datacubes). Among those possibilities, the multidimensional datacube is more appropriate for supporting interactive analysis and guiding an organization's strategic decisions, especially when different epochs and levels of information granularity are involved. However, one may need to use several geospatial multidimensional datacubes, which may be semantically heterogeneous and have different degrees of appropriateness to the context of use. Overcoming the semantic problems related to semantic heterogeneity and to differences in appropriateness to the context of use, in a manner that is transparent to users, has been the principal aim of interoperability for the last fifteen years. However, in spite of successful initiatives, today's solutions have evolved in a non-systematic way. Moreover, no solution has been found to address the specific semantic problems related to interoperability between geospatial datacubes. In this thesis, we suppose that it is possible to define an approach that addresses these semantic problems to support interoperability between geospatial datacubes. To that end, we first describe interoperability between geospatial datacubes. Then, we define and categorize the semantic heterogeneity problems that may occur during the interoperability process of different geospatial datacubes. In order to resolve semantic heterogeneity between geospatial datacubes, we propose a conceptual framework that is essentially based on human communication. In this framework, software agents representing the geospatial datacubes involved in the interoperability process communicate with each other. Such communication aims at exchanging information about the content of the geospatial datacubes. Then, in order to help agents make appropriate decisions during the interoperability process, we evaluate a set of indicators of the external quality (fitness-for-use) of geospatial datacube schemas and of the production context (e.g., metadata). Finally, we implement the proposed approach to show its feasibility.

CorpusUL

Constructing data marts from web sources using a graph common model

Author: Scriney Michael
Publication venue: Dublin City University. School of Computing
Publication date: 01/11/2018

At a time when humans and devices are generating more information than ever, activities such as data mining and machine learning become crucial. These activities enable us to understand and interpret the information we have and predict, or better prepare ourselves for, future events. However, activities such as data mining cannot be performed without a layer of data management to clean, integrate, process and make available the necessary datasets. To that end, large and costly data flow processes such as Extract-Transform-Load are necessary to extract data from disparate information sources and generate ready-for-analysis datasets. These datasets are generally in the form of multi-dimensional cubes from which different data views can be extracted for the purpose of different analyses. The process of creating a multi-dimensional cube from integrated data sources is significant. In this research, we present a methodology to generate these cubes automatically or, in some cases, close to automatically, requiring very little user interaction. A construct called a StarGraph acts as a canonical model for our system, to which imported data sources are transformed. An ontology-driven process controls the integration of StarGraph schemas, and simple OLAP-style functions generate the cubes or datasets. An extensive evaluation is carried out using a large number of agri data sources with user-defined case studies to identify sources for integration and the types of analyses required for the final data cubes.

Irish Universities

DCU Online Research Access Service

Method for Reusing and Re-engineering Non-ontological Resources for Building Ontologies

Author: Villazón-Terrazas B.
Publication venue: Facultad de Informática (UPM)
Publication date: 01/01/2011

This thesis is focused on the reuse and possible subsequent re-engineering of knowledge resources, as opposed to custom-building new ontologies from scratch. The deep analysis of the state of the art has revealed that there are some methods and tools in the literature for transforming non-ontological resources into ontologies, but with some limitations:
- Most of the methods presented are based on ad-hoc transformations for the resource type and the resource implementation.
- Only a few take advantage of the resource data model, an important artifact for the re-engineering process [GGPSFVT08].
- There is no integrated framework, method or corresponding tool that considers the resource types, data models and implementations identified in a unified way.
- With regard to the transformation approach, the majority of the methods perform a TBox transformation, many others perform an ABox transformation and some perform a population. However, no method includes the possibility of performing all three transformation approaches.
- Regarding the degree of automation, almost all the methods perform a semi-automatic transformation of the resource.
- Regarding the explicitation of the hidden semantics in the relations of the resource components, the methods that perform a TBox transformation make explicit the semantics in the relations of the resource components. Most of those methods identify subClassOf relations, others identify ad-hoc relations, and some identify partOf relations. However, only a few methods make explicit the three types of relations.
- With respect to how the methods make explicit the hidden semantics in the relations of the resource terms, three methods rely on the domain expert for making the semantics explicit, and two rely on an external resource, e.g., the DOLCE ontology. Moreover, there are two methods that rely on external resources, not for making the hidden semantics explicit but for finding a proper ontology to populate.
- With regard to the provision of methodological guidelines, almost all the methods provide methodological guidelines for the transformation. However, these guidelines are not finely detailed; for instance, they do not provide information about who is in charge of performing a particular activity/task, nor when that activity/task has to be carried out.
- With regard to the techniques employed, most of the methods do not mention them at all. Only a few methods specify techniques such as transformation rules, lexico-syntactic patterns, mapping rules and natural language techniques.
In this thesis we have provided a method and its technological support that rely on re-engineering patterns in order to speed up the ontology development process by reusing and re-engineering available non-ontological resources as much as possible. To achieve this overall goal, we have decomposed it into the following objectives: (1) the definition of methodological aspects related to the reuse of non-ontological resources for building ontologies; (2) the definition of methodological aspects related to the re-engineering of non-ontological resources for building ontologies; (3) the creation of a library of patterns for re-engineering non-ontological resources into ontologies; and (4) the development of a software library that implements the suggestions given by the re-engineering patterns. Having in mind these goals, in this chapter we present how the open research problems identified in Chapter 2 are solved by the main thesis contributions. Then, we discuss the verification of our hypotheses, and finally we provide an outlook for the future work in those topics.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

Analyse OLAP d'un entrepôt de documents XML

Author: Abdelhédi Fatma
Ntsama Landry
Zurfluh Gilles
Publication venue: INformatique des ORganisations et Systèmes d'Information et de Décision (INFORSID)
Publication date: 01/01/2014

OLAP systems based on data warehouses are now well integrated into organizations, where they facilitate the processing and analysis of information for decision making. The development of the Web has increased the volume of data processed and diversified the sources of information. This diversification problem has been partly solved by the XML language, which allows complex and heterogeneous data to be processed and exchanged. However, it is a format that adapts poorly to conventional OLAP and data warehouse systems. Moreover, to date there is no standard that addresses this issue. We have therefore developed a multidimensional model that uses the object-oriented UML formalism to describe a warehouse of document-oriented XML documents. The schema of this warehouse (called StarCD) represents the structure of the documents to be analyzed, as it is known to the decision maker. In this article, we present a new OLAP analysis language intended for decision makers, which allows complex queries to be expressed over an XML document warehouse described by a StarCD.

Toulouse Capitole Publications