
    Using Ontologies for the Design of Data Warehouses

    Obtaining an implementation of a data warehouse is a complex task that forces designers to acquire broad knowledge of the domain, thus requiring a high level of expertise and making it a failure-prone task. Based on our experience, we have identified a set of situations encountered in real-world projects in which we believe that the use of ontologies would improve several aspects of data warehouse design. The aim of this article is to describe several shortcomings of current data warehouse design approaches and to discuss the benefits of using ontologies to overcome them. This work is a starting point for discussing the suitability of using ontologies in data warehouse design.

    A requirements engineering approach based on the alignment of data warehouses and business strategy

    Ensuring that data warehouses are aligned with the business strategy is critical to their success, since business managers use them to analyze the organization's strategic data. In this work we present a business-oriented requirements engineering approach that aligns the data warehouse with the organization's strategic plan. The process is described by a set of guidelines that include: VMOST analysis to elicit objectives from the users, the BMM model to verify that the defined objectives are aligned with the strategy, goal modeling through i* to obtain the information requirements of the data warehouse, and multidimensional modeling by means of a UML profile. A case study is presented to illustrate the complete process. This work was supported by the Dirección de Investigación, Vicerrectoría de Investigación y Postgrado of the Universidad de La Frontera, through research project DIUFRO DI11-0044.
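
    The alignment check at the core of this process can be pictured with a small sketch. The following Python fragment is purely illustrative and not taken from the paper: it models traceability links between strategic objectives and DW information requirements, and flags requirements that support no objective; all class and variable names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Objective:
    """A strategic objective, e.g. one elicited through VMOST analysis."""
    name: str

@dataclass
class InformationRequirement:
    """A DW information requirement, traced to the objectives it supports."""
    description: str
    supports: list = field(default_factory=list)

def unaligned(requirements):
    # A requirement with no traceability link is not aligned with the strategy.
    return [r for r in requirements if not r.supports]

grow_south = Objective("Increase sales in the southern region")
reqs = [
    InformationRequirement("sales by region and quarter", [grow_south]),
    InformationRequirement("employee birthdays"),  # no strategic justification
]
for r in unaligned(reqs):
    print("Not aligned with the strategy:", r.description)
```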

    Automating the Generation of Tabular Applications

    The paper considers automation problems of forming the interface between a table and a relational database. The task statement is formalized, and existing approaches to forming data representations are described using widely used CASE tools as an example. An intermediate data representation called a "join table" is defined; it is used to ensure the correctness of the data representation and is also necessary for the direct and inverse data transformations. Based on the lossless-join property and the realized dependencies, the notion of application contexts and data restrictions, and a way to form them, are introduced. This material is then used to construct the inverse transformation of data from the tabular representation into the relational one. Using the properties of relationships in the database schema, a partial order over the relations is established, and an acyclicity restriction on admissible database schemas is introduced. These results are then used to analyze the principles of the inverse data transformation, and the main details of such a transformation algorithm are considered.
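
    The two structural notions mentioned above lend themselves to a compact sketch. The following Python fragment is an illustration only, not the paper's algorithm: it models hypothetical foreign-key relationships as a directed graph (whose edges also induce the partial order over relations), checks the acyclicity restriction, and materializes a "join table" by joining along the foreign keys, assuming pandas and networkx are available.

```python
import pandas as pd
import networkx as nx

# Hypothetical foreign-key relationships of the schema, as directed edges
# (referencing relation -> referenced relation).
schema = nx.DiGraph()
schema.add_edges_from([("order_item", "order"), ("order", "customer")])

# The approach admits only acyclic schemas; the same edges also induce
# a partial order over the relations.
assert nx.is_directed_acyclic_graph(schema), "schema must be acyclic"

customer = pd.DataFrame({"customer_id": [1], "name": ["ACME"]})
order = pd.DataFrame({"order_id": [10], "customer_id": [1]})
order_item = pd.DataFrame({"item_id": [100], "order_id": [10], "qty": [5]})

# Join along the foreign keys. With key-preserving joins this is lossless,
# so the tabular view can later be decomposed back into the base relations.
join_table = order_item.merge(order, on="order_id").merge(customer, on="customer_id")
print(join_table)
```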

    Enrichment of the Phenotypic and Genotypic Data Warehouse analysis using Question Answering systems to facilitate the decision making process in cereal breeding programs

    Currently there is an overwhelming number of scientific publications in the Life Sciences, especially in Genetics and Biotechnology. This huge amount of information is structured in corporate Data Warehouses (DW) or in Biological Databases (e.g. UniProt, RCSB Protein Data Bank, CEREALAB or GenBank), whose main drawback is the cost of keeping them up to date, which quickly makes them obsolete. However, these databases are the main tool for enterprises when they want to update their internal information, for example when a plant-breeding enterprise needs to enrich its genetic information (internal structured database) with recently discovered genes related to specific phenotypic traits (external unstructured data) in order to choose the desired parentals for breeding programs. In this paper, we propose to complement the internal information with external data from the Web using Question Answering (QA) techniques. We go a step further by providing a complete framework for integrating unstructured and structured information, combining traditional database and DW architectures with QA systems. The great advantage of our framework is that decision makers can instantaneously compare internal data with external data from competitors, thereby allowing quick strategic decisions based on richer data. This paper has been partially supported by the MESOLAP (TIN2010-14860) and GEODAS-BI (TIN2012-37493-C03-03) projects from the Spanish Ministry of Education and Competitiveness. Alejandro Maté is funded by the Generalitat Valenciana under an ACIF grant (ACIF/2010/298).
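
    The integration idea can be sketched as follows. This Python fragment is a hedged illustration of the architecture, not the paper's framework: the internal side is a toy SQLite table, the external QA system is stubbed out with a placeholder function, and the table, trait, and gene names are all hypothetical.

```python
import sqlite3

def query_internal(conn, trait):
    """Structured side: genes associated with a trait in the internal warehouse."""
    cur = conn.execute("SELECT gene FROM gene_trait WHERE trait = ?", (trait,))
    return [row[0] for row in cur]

def ask_external(question):
    """Unstructured side: stand-in for a QA system over web and literature sources."""
    return [("Gene XYZ reported as linked to drought tolerance", "example.org/paper")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene_trait (gene TEXT, trait TEXT)")
conn.execute("INSERT INTO gene_trait VALUES ('Dhn1', 'drought tolerance')")

# Decision makers see both views side by side: curated internal data plus
# freshly retrieved external evidence.
internal = query_internal(conn, "drought tolerance")
external = ask_external("Which genes are related to drought tolerance in barley?")
print("Internal genes:", internal)
print("External evidence:", external)
```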

    A family of experiments to validate measures for UML activity diagrams of ETL processes in data warehouses

    In data warehousing, Extract, Transform, and Load (ETL) processes are in charge of extracting, from the data sources, the data that will be contained in the data warehouse. Their design and maintenance is thus a cornerstone in any data warehouse development project. Due to their relevance, the quality of these processes should be formally assessed early in the development in order to avoid populating the data warehouse with incorrect data. To this end, this paper presents a set of measures with which to evaluate the structural complexity of ETL process models at the conceptual level. This study is, moreover, accompanied by the application of formal frameworks and a family of experiments whose aim is to theoretically and empirically validate the proposed measures, respectively. Our experiments show that the use of these measures can aid designers to predict the effort associated with the maintenance tasks of ETL processes and to make ETL process models more usable. Our work is based on Unified Modeling Language (UML) activity diagrams for modeling ETL processes, and on the Framework for the Modeling and Evaluation of Software Processes (FMESP) for the definition and validation of the measures.
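
    As a rough illustration of what structural-complexity measures over an activity diagram look like (the paper's concrete measures are not reproduced here), the following Python sketch counts activities, control flows, and decision nodes in a toy ETL process modeled as a directed graph, and derives a McCabe-style branching indicator; the activity names are hypothetical.

```python
import networkx as nx

# A toy ETL process as a directed graph of activities and control flows.
etl = nx.DiGraph()
etl.add_edges_from([
    ("extract_sales", "clean_nulls"),
    ("clean_nulls", "check_quality"),
    ("check_quality", "transform_currency"),  # branch: row passes the check
    ("check_quality", "discard_row"),         # branch: row fails the check
    ("transform_currency", "load_fact_table"),
])

n_activities = etl.number_of_nodes()                        # 6
n_flows = etl.number_of_edges()                             # 5
n_decisions = sum(1 for n in etl if etl.out_degree(n) > 1)  # 1

# McCabe-style indicator for structured flows: decisions + 1.
# More branching tends to mean harder-to-maintain ETL models.
branching_complexity = n_decisions + 1
print(n_activities, n_flows, n_decisions, branching_complexity)
```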

    Development of new data partitioning and allocation algorithms for query optimization of distributed data warehouse systems

    Distributed databases, and in particular distributed data warehouses, are becoming an increasingly important technology for information integration and data analysis. Data Warehouse (DW) systems are used by decision makers for performance measurement and decision support. Although data warehousing and on-line analytical processing (OLAP) are essential elements of decision support, OLAP query response time is strongly affected by the volume of data that needs to be accessed from storage disks. Data partitioning is one of the physical design techniques that may be used to optimize query processing cost in DWs. It is a non-redundant optimization technique because it does not replicate data, in contrast to redundant techniques such as materialized views and indexes. The warehouse partitioning problem is concerned with determining the set of dimension tables to be partitioned and using them to generate the fact table fragments. In this work an enhanced grouping algorithm that avoids the limitations of some existing vertical partitioning algorithms is proposed. Furthermore, a static partitioning algorithm that allows fragmentation at early stages of schema design is presented. The thesis also investigates the performance of the data warehouse after implementing a combination of Genetic Algorithm (GA) and Simulated Annealing (SA) techniques to horizontally partition the data warehouse star schema, and then presents the experimentation and implementation results of the proposed algorithm. This research presents different approaches to optimize the data fragment allocation cost, using a greedy mathematical model and a combination of simulated annealing and a genetic algorithm to determine the site-by-site allocation, leading to optimal solutions for fragment distribution. Throughout this thesis, the terms fragmentation and partitioning are used interchangeably.
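
    The simulated-annealing side of such an allocation search can be sketched in a few lines. The following Python fragment is an illustration under assumed inputs, not the thesis's algorithm: it anneals a fragment-to-site assignment against a made-up remote-access cost model, and the workload matrix, cooling schedule, and cost function are all hypothetical.

```python
import math
import random

random.seed(0)
n_fragments, n_sites = 6, 3
# access[f][s]: how often site s queries fragment f (hypothetical workload).
access = [[random.randint(0, 10) for _ in range(n_sites)]
          for _ in range(n_fragments)]

def cost(assign):
    """Total remote-access cost: local reads are free, remote reads cost 1 each."""
    return sum(access[f][s]
               for f in range(n_fragments)
               for s in range(n_sites) if s != assign[f])

assign = [random.randrange(n_sites) for _ in range(n_fragments)]
current = cost(assign)
temp = 10.0
while temp > 0.01:
    f = random.randrange(n_fragments)      # perturb: move one fragment
    old = assign[f]
    assign[f] = random.randrange(n_sites)
    new = cost(assign)
    # Accept improvements; accept worse moves with Boltzmann probability.
    if new <= current or random.random() < math.exp((current - new) / temp):
        current = new
    else:
        assign[f] = old                    # reject: undo the move
    temp *= 0.95                           # geometric cooling

print("allocation:", assign, "cost:", current)
```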

    Automating the multidimensional design of data warehouses

    Previous experiences in the data warehouse field have shown that the data warehouse multidimensional conceptual schema must be derived from a hybrid approach: i.e., by considering both the end-user requirements and the data sources as first-class citizens. As in any other system, requirements guarantee that the system devised meets the end-user needs. In addition, since the data warehouse design task is a reengineering process, it must consider the underlying data sources of the organization: (i) to guarantee that the data warehouse can be populated from data available within the organization, and (ii) to allow the end user to discover unknown additional analysis capabilities.

    Several methods for supporting the data warehouse modeling task have been proposed, but they suffer from significant drawbacks. In short, requirement-driven approaches assume that requirements are exhaustive (and therefore do not consider the data sources to contain alternative interesting evidence for analysis), whereas data-driven approaches (i.e., those leading the design task from a thorough analysis of the data sources) rely on discovering as much multidimensional knowledge as possible from the data sources. As a consequence, data-driven approaches generate too many results, which mislead the user. Furthermore, automating the design task is essential in this scenario, as it removes the dependency on an expert's ability to properly apply the chosen method, as well as the need to analyze the data sources, which is a tedious and time-consuming task (and can be unfeasible when working with large databases). Current automatable methods follow a data-driven approach, whereas current requirement-driven approaches overlook process automation, since they tend to work with requirements at a high level of abstraction. The same situation is repeated in the data-driven and requirement-driven stages of current hybrid approaches, which suffer from the same drawbacks as pure data-driven or requirement-driven approaches.

    In this thesis we introduce two different approaches for automating the multidimensional design of the data warehouse: MDBE (Multidimensional Design Based on Examples) and AMDO (Automating the Multidimensional Design from Ontologies). Both approaches were devised to overcome the limitations from which current approaches suffer. Importantly, they start from opposite initial assumptions, but both consider the end-user requirements and the data sources as first-class citizens.

    1. MDBE follows a classical approach, in which the end-user requirements are well known beforehand. This approach benefits from the knowledge captured in the data sources, but guides the design task according to the requirements and, consequently, is able to handle semantically poorer data sources. In other words, given high-quality end-user requirements, we can guide the process from the knowledge they contain and overcome data sources of poor semantic quality.

    2. AMDO, as its counterpart, assumes a scenario in which the available data sources are semantically richer. Thus, the proposed approach is guided by a thorough analysis of the data sources, which is then adapted to shape the output according to the end-user requirements. In this context, having high-quality data sources compensates for the lack of expressive end-user requirements.

    Importantly, our methods establish a combined and comprehensive framework that can be used to decide, according to the inputs provided in each scenario, which approach to follow; a toy sketch of this hybrid intuition is given below. For example, we cannot follow the same approach in a scenario where the end-user requirements are clear and well known as in one where they are not evident or cannot be easily elicited (e.g., when users are not aware of the analysis capabilities of their own sources). Interestingly, the need to have requirements beforehand is smoothed by having semantically rich data sources; in their absence, requirements gain relevance for extracting the multidimensional knowledge from the sources. Thus, we provide two approaches whose combination covers the scenarios discussed in the literature exhaustively.
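
    The following Python fragment is neither MDBE nor AMDO; it is a toy illustration under assumed inputs of the hybrid intuition referenced above: a data-driven step proposes fact-table candidates from a hypothetical source schema (tables with several foreign keys and numeric measures), and a requirement-driven step keeps only the candidates the end user actually asked about.

```python
schema = {
    # table: (foreign keys, numeric columns)
    "sales":     (["customer_id", "product_id", "date_id"], ["amount", "qty"]),
    "customer":  ([], []),
    "product":   (["category_id"], []),
    "audit_log": (["user_id"], ["duration_ms"]),
}

def candidate_facts(schema):
    """Data-driven step: tables with several FKs and numeric measures."""
    return [t for t, (fks, nums) in schema.items()
            if len(fks) >= 2 and nums]

def filter_by_requirements(candidates, requirements):
    """Requirement-driven step: keep candidates the end user asked about."""
    return [t for t in candidates
            if any(t in r.lower() for r in requirements)]

requirements = ["Analyze sales by customer and product"]
facts = filter_by_requirements(candidate_facts(schema), requirements)
print(facts)  # -> ['sales']; dimension candidates follow the FK references
```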