Integrating data warehouses with web data : a survey
This paper surveys the most relevant research on combining Data Warehouse (DW) and Web data. It studies the XML
technologies that are currently being used to integrate, store, query, and retrieve Web data and their application to DWs. The paper
reviews different DW distributed architectures and the use of XML languages as an integration tool in these systems. It also introduces
the problem of dealing with semistructured data in a DW. It studies Web data repositories, the design of multidimensional databases for
XML data sources, and the XML extensions of OnLine Analytical Processing techniques. The paper addresses the application of
information retrieval technology in a DW to exploit text-rich document collections. The authors hope that the paper will help to discover
the main limitations and opportunities offered by the combination of the DW and the Web fields, as well as to identify open research
lines
Applying UML and XML for designing and interchanging information for data warehouses and OLAP applications
Multidimensional (MD) modeling is the basis for data warehouses (DW), multidimensional databases (MDB) and on-line analytical processing (OLAP) applications. In this paper, we present how the unified modeling language (UML) can be successfully used to represent both structural and dynamic properties of these systems at the conceptual level. The structure of the system is specified by means of a UML class diagram that considers the main properties of MD modeling with minimal use of constraints and extensions of the UML. If the system to be modeled is too complex, thereby leading us to a considerable number of classes and relationships, we describe how to use the package grouping mechanism provided by the UML to simplify the final model. Furthermore, we provide a UML-compliant class notation (called cube class) to represent OLAP users' initial requirements. We also describe how we can use the UML state and interaction diagrams to model the behavior of a data warehouse system. To facilitate the interchange of conceptual MD models, we provide a Document Type Definition (DTD) which allows us to represent the same MD modeling properties that can be considered by using our approach. From this DTD, we can directly generate valid eXtensible Markup Language (XML) documents that represent MD models at the conceptual level. We believe that our innovative approach provides a theoretical foundation for simplifying the conceptual design of MD systems and the examples included in this paper clearly illustrate the use of our approach
Efficient cube construction for smart city data
To deliver powerful smart city environments, web-produced data streams must be analysed in close to real time so that city planners can employ up-to-date predictive models in both short- and long-term planning. Data cubes fused from multiple sources provide a popular input to predictive models. A key component in this infrastructure is an efficient mechanism for transforming web data (XML or JSON) into multi-dimensional cubes. In our research, we have developed a framework for efficient transformation of XML data from multiple smart city services into DWARF cubes using a NoSQL storage engine. Our evaluation shows a high level of performance when compared to other approaches and thus provides a platform for predictive models in a smart city environment
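The idea of turning web data into a multi-dimensional cube can be illustrated with a minimal sketch. It is not the framework described in the abstract and does not build a DWARF structure: it is a naive dictionary-based cube that sums one measure over every combination of dimension values and roll-ups. The function name `build_cube`, the `ALL` marker, and the bike-sharing sample records are all assumptions made for illustration.

```python
import json
from collections import defaultdict
from itertools import combinations

ALL = "*"  # hypothetical marker for a rolled-up ("all values") dimension


def build_cube(records, dimensions, measure):
    """Sum `measure` for every combination of kept and rolled-up dimensions."""
    cube = defaultdict(float)
    n = len(dimensions)
    for rec in records:
        values = [rec[d] for d in dimensions]
        # Each subset of dimensions is kept; the remaining ones collapse to ALL.
        for k in range(n + 1):
            for kept in combinations(range(n), k):
                key = tuple(values[i] if i in kept else ALL for i in range(n))
                cube[key] += rec[measure]
    return dict(cube)


# Invented sample feed standing in for a smart city web service.
raw = """[
  {"station": "A", "hour": 8, "bikes": 5},
  {"station": "A", "hour": 9, "bikes": 3},
  {"station": "B", "hour": 8, "bikes": 7}
]"""
records = json.loads(raw)
cube = build_cube(records, ["station", "hour"], "bikes")

print(cube[("A", ALL)])   # total for station A across all hours
print(cube[(ALL, 8)])     # total for hour 8 across all stations
print(cube[(ALL, ALL)])   # grand total
```

This naive form stores every aggregate explicitly, so its size grows exponentially with the number of dimensions; compressed structures such as the DWARF cubes named in the abstract exist to avoid that redundancy.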
Implementation of the multidimensional schemas integration method ORE
The goal of the project is to implement ORE, a semi-automatic method for creating multidimensional schemas for data warehouses by iteratively integrating information requirements
Applying UML and XML for designing and interchanging information for data warehouses and OLAP applications
Journal of Database Management, Vol. 15, No. 1, 2004, pp. 41-72. Retrieved 6/26/2006 from http://www.ischool.drexel.edu/faculty/song/publications/p_JDBMS04_Final.pdf.
Multidimensional (MD) modeling is the basis for Data warehouses (DW),
multidimensional databases (MDB) and On-Line Analytical Processing (OLAP)
applications. In this paper, we present how the Unified Modeling Language (UML) can be
successfully used to represent both structural and dynamic properties of these systems at
the conceptual level. The structure of the system is specified by means of a UML class
diagram that considers the main properties of MD modeling with minimal use of
constraints and extensions of the UML. If the system to be modeled is too complex, thereby
leading us to a considerable number of classes and relationships, we describe how to use
the package grouping mechanism provided by the UML to simplify the final model.
Furthermore, we provide a UML-compliant class notation (called cube class) to represent
OLAP users' initial requirements. We also describe how we can use the UML state and
interaction diagrams to model the behavior of a data warehouse system. To facilitate the
interchange of conceptual MD models, we provide a Document Type Definition (DTD)
which allows us to represent the same MD modeling properties that can be considered by
using our approach. From this DTD, we can directly generate valid eXtensible Markup
Language (XML) documents that represent MD models at the conceptual level. We believe
that our innovative approach provides a theoretical foundation for simplifying the
conceptual design of MD systems and the examples included in this paper clearly illustrate
the use of our approach
A conceptual framework and a risk management approach for interoperability between geospatial datacubes
Today, we observe wide use of geospatial databases that are implemented in many forms (e.g., transactional centralized systems, distributed databases, multidimensional datacubes). Among those possibilities, the multidimensional datacube is more appropriate to support interactive analysis and to guide the organization's strategic decisions, especially when different epochs and levels of information granularity are involved. However, one may need to use several geospatial multidimensional datacubes, which may be semantically heterogeneous and have different degrees of appropriateness to the context of use. Overcoming the semantic problems related to this heterogeneity and to the differing appropriateness to the context of use, in a manner that is transparent to users, has been the principal aim of interoperability for the last fifteen years. However, in spite of successful initiatives, today's solutions have evolved in a non-systematic way. Moreover, no solution has been found to address specific semantic problems related to interoperability between geospatial datacubes. In this thesis, we suppose that it is possible to define an approach that addresses these semantic problems to support interoperability between geospatial datacubes. For that, we first describe interoperability between geospatial datacubes. 
Then, we define and categorize the semantic heterogeneity problems that may occur during the interoperability process of different geospatial datacubes. In order to resolve semantic heterogeneity between geospatial datacubes, we propose a conceptual framework that is essentially based on human communication. In this framework, software agents representing geospatial datacubes involved in the interoperability process communicate together. Such communication aims at exchanging information about the content of geospatial datacubes. Then, in order to help agents to make appropriate decisions during the interoperability process, we evaluate a set of indicators of the external quality (fitness-for-use) of geospatial datacube schemas and of production context (e.g., metadata). Finally, we implement the proposed approach to show its feasibility
Constructing data marts from web sources using a graph common model
At a time when humans and devices are generating more information than ever, activities such as data mining and machine learning become crucial. These activities enable us to understand and interpret the information we have and to predict, or better prepare ourselves for, future events. However, activities such as data mining cannot be performed without a layer of data management to clean, integrate, process and make available the necessary datasets. To that end, large and costly data flow processes such as Extract-Transform-Load are necessary to extract data from disparate information sources and generate analysis-ready datasets. These datasets are generally in the form of multi-dimensional cubes from which different data views can be extracted for the purpose of different analyses. Creating a multi-dimensional cube from integrated data sources is a significant undertaking. In this research, we present a methodology to generate these cubes automatically or, in some cases, almost automatically, requiring very little user interaction. A construct called a StarGraph acts as a canonical model for our system, to which imported data sources are transformed. An ontology-driven process controls the integration of StarGraph schemas, and simple OLAP-style functions generate the cubes or datasets. An extensive evaluation is carried out using a large number of agri data sources with user-defined case studies to identify sources for integration and the types of analyses required for the final data cubes
Method for Reusing and Re-engineering Non-ontological Resources for Building Ontologies
This thesis focuses on the reuse and possible subsequent re-engineering of knowledge resources, as opposed to custom-building new ontologies from scratch. A thorough analysis of the state of the art has revealed that the literature offers some methods and tools for transforming non-ontological resources into ontologies, but with some limitations:
- Most of the methods presented are based on ad-hoc transformations for the resource type and the resource implementation.
- Only a few take advantage of the resource data model, an important artifact for the re-engineering process [GGPSFVT08].
- There is no integrated framework, method or corresponding tool that considers the identified resource types, data models and implementations in a unified way.
- With regard to the transformation approach, the majority of the methods perform a TBox transformation, many others perform an ABox transformation and some perform a population. However, no method includes the possibility of performing all three transformation approaches.
- Regarding the degree of automation, almost all the methods perform a semi-automatic transformation of the resource.
- Regarding how the hidden semantics in the relations of the resource components are made explicit, the methods that perform a TBox transformation make these semantics explicit. Most of those methods identify subClassOf relations, others identify ad-hoc relations, and some identify partOf relations. However, only a few methods make all three types of relations explicit.
- With respect to how the methods make explicit the hidden semantics in the relations of the resource terms, three methods rely on the domain expert, and two rely on an external resource, e.g., the DOLCE ontology. Moreover, two methods rely on external resources not to make the hidden semantics explicit but to find a suitable ontology to populate.
- Regarding the provision of methodological guidelines, almost all the methods provide guidelines for the transformation. However, these guidelines are not finely detailed; for instance, they do not state who is in charge of performing a particular activity or task, nor when that activity or task has to be carried out.
- With regard to the techniques employed, most of the methods do not mention them at all. Only a few methods specify techniques such as transformation rules, lexico-syntactic patterns, mapping rules and natural language techniques.
In this thesis we have provided a method and its technological support that rely on re-engineering patterns in order to speed up the ontology development process by reusing and re-engineering available non-ontological resources as much as possible. To achieve this overall goal, we have decomposed it into the following objectives: (1) the definition of methodological aspects related to the reuse of non-ontological resources for building ontologies; (2) the definition of methodological aspects related to the re-engineering of non-ontological resources for building ontologies; (3) the creation of a library of patterns for re-engineering non-ontological resources into ontologies; and (4) the development of a software library that implements the suggestions given by the re-engineering patterns. With these goals in mind, in this chapter we present how the open research problems identified in Chapter 2 are solved by the main thesis contributions. Then, we discuss the verification of our hypotheses, and finally we provide an outlook on future work in those topics
Analyse OLAP d'un entrepôt de documents XML (OLAP analysis of an XML document warehouse)
OLAP systems based on data warehouses are now well established in organizations, where they facilitate the processing and analysis of information for decision making. The growth of the Web has increased the volume of data processed and diversified the sources of information. This diversification problem has been partly solved by XML, which supports the processing and exchange of complex, heterogeneous data. However, XML is a format that fits poorly with conventional OLAP and warehouse systems, and to date no standard addresses this issue. We have therefore developed a multidimensional model that uses the object-oriented UML formalism to describe a warehouse of document-oriented XML documents. The schema of this warehouse (called StarCD) represents the structure of the documents to be analysed, as it is known to the decision maker. In this article we present a new OLAP analysis language intended for decision makers, which makes it possible to express complex queries over an XML document warehouse described by a StarCD