303 research outputs found

    Bridging the Knowledge Gap between Transactional Databases and Data Warehouses

    The data warehouse is widely recognized in industry as the principal decision support system architecture and an integral part of the corporate information system. However, the majority of academic institutions in the US and worldwide have been slow to develop curricula that reflect this reality. This paper examines the issues that have contributed to the lag in the coverage of data warehousing topics at universities.

    Data Warehouse Design: An Investigation of Star Schema


    Specifications of view services for GMES Core_003 VHR2 coverage

    For the so-called Data Warehouse (DWH) concept within the GMES Initial Operations period 2011-2014, data access management is funded through a Delegation Agreement between the EC and ESA. The Core_003 VHR2 dataset is one of the satellite coverages defined as CORE datasets within the DWH, with fixed specifications, which will be offered to a broad range of users and activities. JRC was asked by DG Enterprise to provide technical specifications for the implementation of a view service for the Core_003 datasets as part of the Administrative Arrangement n. 5 between DG Enterprise and JRC. This report provides an overview of different view service types with their specific characteristics and use cases. Since compliance with INSPIRE implementing rules is a goal to be achieved by GMES services, the specific requirements of INSPIRE for view services have been taken into account. The Core_003 datasets have been analysed with regard to the parameters that are important for their inclusion in view services. Based on the results of the analyses, recommendations are given for the implementation of the view services as well as for the data processing and configuration of the Core_003 datasets.
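    INSPIRE view services are realized as Web Map Services (WMS). As a rough sketch of what a request to such a view service could look like, the following Python snippet builds a WMS 1.3.0 GetMap URL using only the standard library; the endpoint and layer name are hypothetical placeholders, not parameters taken from this report.

    # Minimal sketch of a WMS 1.3.0 GetMap request, the interface used by
    # INSPIRE-compliant view services. Endpoint and layer name are
    # hypothetical placeholders.
    from urllib.parse import urlencode

    WMS_ENDPOINT = "https://example.org/gmes/wms"   # hypothetical endpoint

    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": "core_003_vhr2",     # hypothetical layer name
        "STYLES": "",
        "CRS": "EPSG:4326",            # WMS 1.3.0 uses CRS (not SRS)
        "BBOX": "47.0,5.0,55.0,15.0",  # lat/lon axis order for EPSG:4326
        "WIDTH": "800",
        "HEIGHT": "600",
        "FORMAT": "image/png",
    }

    print(f"{WMS_ENDPOINT}?{urlencode(params)}")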

    Open archival information systems for database preservation

    Integrated Master's thesis. Informatics and Computing Engineering. Universidade do Porto. Faculdade de Engenharia. 201

    A framework for integrating DNA sequenced data

    The Human Genome Project generated vast amounts of DNA sequence data scattered across disparate data sources in a variety of formats. Integrating biological data and extracting the information held in DNA sequences are major ongoing tasks for biologists and software professionals. This thesis explored issues of finding, extracting, merging and synthesizing information from multiple disparate data sources containing DNA sequence data, which is composed of 3 billion chemical building blocks (bases). We proposed a biological data integration framework based on typical usage patterns to simplify these issues for biologists. The framework uses a relational database management system at the backend and provides techniques to extract, store, and manage the data. This framework was implemented, evaluated, and compared with existing biological data integration solutions.
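    The framework's actual schema is not given in this abstract; as a toy sketch of the general approach it describes, a relational backend for storing and querying sequence records, the following snippet loads two invented records into SQLite. Table and column names are assumptions for illustration only.

    # Toy relational backend for DNA sequence records (invented schema;
    # not the thesis's actual framework).
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE sequence (
            accession TEXT PRIMARY KEY,  -- source record identifier
            organism  TEXT,              -- species of origin
            bases     TEXT               -- raw nucleotide string
        )
    """)
    conn.executemany(
        "INSERT INTO sequence VALUES (?, ?, ?)",
        [("SEQ001", "Homo sapiens", "ATGGCGTACGTTAGC"),
         ("SEQ002", "Homo sapiens", "TTGACCGGTA")],
    )

    # An integration-style query: all sequences for one organism,
    # longest first.
    for accession, length in conn.execute(
        "SELECT accession, LENGTH(bases) FROM sequence "
        "WHERE organism = ? ORDER BY LENGTH(bases) DESC",
        ("Homo sapiens",),
    ):
        print(accession, length)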

    Data driven decision support systems as a critical success factor for IT-Governance: an application in the financial sector

    IT-Governance has a major impact not only on IT management but also, and foremost, on enterprise performance and control. Business relies on IT agility, flexibility and innovation to pursue its objectives and to sustain its strategy. However, as IT becomes more critical to the business, compliance pushes IT in the opposite direction, towards predictability, stability and regulation. Adding the current economic environment and the fact that IT departments are most often considered cost centres, IT-Governance decisions become more important and critical. Current IT-Governance research and practice is mainly based on management techniques and principles, leaving a gap for the contribution of information systems to IT-Governance enhancement. This research intends to answer IT-Governance requirements using Data Driven Decision Support Systems based on dimensional models, which appears to be a key factor in improving the IT-Governance decision-making process. To address this research opportunity we considered IT-Governance research (Peter Weill), best practices (ITIL), a body of knowledge (PMBOK) and frameworks (COBIT). Key IT-Governance processes (Change Management, Incident Management, Project Development and Service Desk Management) were studied and key process stakeholders were interviewed. Based on the facts gathered, dimensional models (data marts) were designed and developed to answer key improvement requirements of each IT-Governance process, and a Unified Dimensional Model (an IT-Governance data warehouse) was materialized. To assess the Unified Dimensional Model, it was applied in a bank under real working conditions. The resulting implementation was then assessed against Peter Weill's IT-Governance principles. Assessment results revealed that the model satisfies all of them. The research project leads to the conclusion that the success of IT-Governance implementation may be fostered by Data Driven Decision Support Systems implemented using Unified Dimensional Model concepts and based on best practices, frameworks and bodies of knowledge that enable process-oriented, data-driven decision support.
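    The dissertation's data marts are not reproduced in the abstract; as a hedged sketch of the dimensional-model idea it builds on, the snippet below defines a minimal star schema for the Incident Management process, one fact table joined to two dimensions. All table and column names are invented for illustration.

    # Minimal star-schema sketch for an incident-management data mart
    # (invented names; not the dissertation's actual model).
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,   -- e.g. 20240131
            year INTEGER, month INTEGER, day INTEGER
        );
        CREATE TABLE dim_service (
            service_key INTEGER PRIMARY KEY,
            service_name TEXT               -- e.g. 'Online Banking'
        );
        -- Fact table: one row per resolved incident.
        CREATE TABLE fact_incident (
            date_key    INTEGER REFERENCES dim_date(date_key),
            service_key INTEGER REFERENCES dim_service(service_key),
            resolution_hours REAL           -- additive measure
        );
    """)

    # A typical governance question: average resolution time per service.
    query = """
        SELECT s.service_name, AVG(f.resolution_hours)
        FROM fact_incident f
        JOIN dim_service s USING (service_key)
        GROUP BY s.service_name
    """
    print(conn.execute(query).fetchall())   # empty until facts are loaded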

    Continuous Delivery in Data Warehousing

    Continuous delivery is an activity in the field of continuous software engineering. Data warehousing, on the other hand, lies within information systems research. This dissertation combines these two traditionally separate concerns. Its motivation stems from a practical problem: how to shorten the time from a reporting idea until it is available to users. Data warehousing has traditionally been considered tedious and delicate: distinct steps take place one after another in a predefined, unalterable sequence. Another traditional aspect of data warehousing is bringing everything to a production environment at once, where all the pieces of a data warehouse are in place before production use. If development follows agile iterations, why do releases to production not follow the same iterations? This dissertation introduces how reporting and data warehouse teams can synchronously build business intelligence solutions in increments. Joint working enhances communication between developers and shortens the feedback cycle from end-user to developers, making the feedback more direct. Continuous delivery practices support releasing frequently to a production environment. A two-layer data warehouse architecture separates analytical and transactional processing; separating the two enables better testing and, thus, continuous delivery. When deploying frequently with continuous delivery practices, automating transformation creation in data warehousing reduces development time. This dissertation introduces an information model for automating the implementation of transformations, both getting data into a data warehouse and getting data out of it.
The research evaluation followed the design science guidelines and was carried out in collaboration with industry and universities. These ideas have been tested on real projects with promising results, and thus they have been proven to work.
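    The information model itself is not shown in the abstract; the following toy snippet sketches the underlying technique, generating a data-warehouse load transformation from declarative metadata, under an invented mapping format that should not be mistaken for the dissertation's model.

    # Toy metadata-driven transformation generator (invented mapping
    # format; not the dissertation's actual information model).
    mapping = {
        "target": "dw.customer",
        "source": "staging.crm_customer",
        "columns": {                    # target column -> source expression
            "customer_id": "id",
            "full_name": "first_name || ' ' || last_name",
            "created_date": "DATE(created_at)",
        },
    }

    def generate_load_sql(m: dict) -> str:
        """Render one INSERT ... SELECT transformation from a mapping."""
        targets = ", ".join(m["columns"])
        sources = ", ".join(m["columns"].values())
        return (f"INSERT INTO {m['target']} ({targets})\n"
                f"SELECT {sources}\nFROM {m['source']};")

    print(generate_load_sql(mapping))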

    LEAN DATA ENGINEERING. COMBINING STATE OF THE ART PRINCIPLES TO PROCESS DATA EFFICIENTLY

    The present work was developed during an internship, under the Erasmus+ Traineeship programme, at Fieldwork Robotics, a Cambridge-based company that develops robots to operate in agricultural fields. The robots collect data from commercial greenhouses with sensors and RealSense cameras, as well as with gripper cameras placed on the robotic arms. These data are recorded mainly in bag files, consisting of unstructured data, such as images, and semi-structured data, such as metadata associated both with the conditions in which the images were taken and with the robot itself. Data was uploaded, extracted, cleaned and labelled manually before being used to train Artificial Intelligence (AI) algorithms to identify raspberries during the harvesting process. The amount of available data quickly escalates with every trip to the fields, creating an ever-growing need for an automated process. This problem was addressed by creating a data engineering platform encompassing a data lake, a data warehouse and the processing capabilities they require. The platform was built following a series of principles entitled Lean Data Engineering Principles (LDEP); systems that follow them are called Lean Data Engineering Systems (LDES). These principles urge practitioners to start with the end in mind: process incoming batch or real-time data without wasting resources, limiting costs to what is absolutely necessary to complete the job, in other words, to be as lean as possible. The LDEP are a combination of state-of-the-art ideas stemming from several fields, such as data engineering, software engineering and DevOps, with cloud technologies at their core. The proposed custom-made solution enabled the company to scale its data operations, labelling images almost ten times faster while reducing the associated costs by over 99.9% in comparison to the previous process. In addition, the data lifecycle time has been reduced from weeks to hours while maintaining coherent data quality, correctly identifying, for instance, 94% of the labels in comparison to a human counterpart.
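    The platform itself is not described at code level in the abstract; as a loose sketch of the lean ingestion idea, moving raw field data into a partitioned data-lake layout without repeating work, the snippet below uses invented paths and a date-based partitioning scheme.

    # Toy lean batch-ingestion step (invented layout; not Fieldwork
    # Robotics' actual platform).
    import shutil
    from datetime import date
    from pathlib import Path

    LANDING = Path("landing")    # raw uploads, e.g. bag files
    LAKE = Path("lake/raw")      # partitioned data-lake root

    def ingest_batch() -> int:
        """Move new files into a date partition; skip files already there."""
        partition = LAKE / f"ingest_date={date.today():%Y-%m-%d}"
        partition.mkdir(parents=True, exist_ok=True)
        moved = 0
        for src in LANDING.glob("*.bag"):
            dst = partition / src.name
            if dst.exists():     # lean principle: never redo finished work
                continue
            shutil.move(str(src), str(dst))
            moved += 1
        return moved

    if __name__ == "__main__":
        print(f"ingested {ingest_batch()} new files")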

    An elastic software architecture for extreme-scale big data analytics

    This chapter describes a software architecture for processing big-data analytics across the complete compute continuum, from the edge to the cloud. The new generation of smart systems requires processing a vast amount of diverse information from distributed data sources. The software architecture presented in this chapter addresses two main challenges. On the one hand, a new elasticity concept enables smart systems to satisfy the performance requirements of extreme-scale analytics workloads. By extending the elasticity concept (known on the cloud side) across the compute continuum in a fog computing environment, combined with the use of advanced heterogeneous hardware architectures at the edge, the capabilities of extreme-scale analytics can increase significantly, integrating both responsive data-in-motion and latent data-at-rest analytics into a single solution. On the other hand, the software architecture also focuses on the fulfilment of the non-functional properties inherited from smart systems, such as real-time behaviour, energy efficiency, communication quality and security, which are of paramount importance for many application domains such as smart cities, smart mobility and smart manufacturing. The research leading to these results has received funding from the European Union's Horizon 2020 Programme under the ELASTIC Project (www.elastic-project.eu), grant agreement No 825473.
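    As a loose illustration of the elasticity concept outlined above, and not the ELASTIC project's actual mechanism, the toy policy below decides where along the edge-to-cloud continuum to place an analytics task, based on its deadline, its input volume and the current edge load; all thresholds and names are invented.

    # Toy edge/fog/cloud placement policy (invented thresholds; not the
    # ELASTIC project's implementation).
    from dataclasses import dataclass

    @dataclass
    class Task:
        name: str
        deadline_ms: float   # real-time requirement
        input_mb: float      # data to move if offloaded

    def place(task: Task, edge_load: float) -> str:
        """Tight deadlines stay at the edge unless it is saturated;
        bulky, latency-tolerant work goes to the cloud."""
        if task.deadline_ms < 50 and edge_load < 0.8:
            return "edge"    # responsive data-in-motion analytics
        if task.input_mb > 100 or task.deadline_ms >= 500:
            return "cloud"   # latent data-at-rest analytics
        return "fog"         # intermediate tier of the continuum

    print(place(Task("obstacle-detect", 20, 5), edge_load=0.4))    # edge
    print(place(Task("fleet-report", 60_000, 900), edge_load=0.4)) # cloud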