11 research outputs found

    Data generator for evaluating ETL process quality

    Obtaining the right set of data for evaluating the fulfillment of different quality factors in the extract-transform-load (ETL) process design is rather challenging. First, the real data might be out of reach due to privacy constraints, while manually providing a synthetic set of data is known to be a labor-intensive task that needs to take various combinations of process parameters into account. More importantly, a single dataset usually does not represent the evolution of data throughout the complete process lifespan, and hence misses a plethora of possible test cases. To facilitate such a demanding task, in this paper we propose an automatic data generator (i.e., Bijoux). Starting from a given ETL process model, Bijoux extracts the semantics of data transformations, analyzes the constraints they imply over input data, and automatically generates testing datasets. Bijoux is highly modular and configurable, enabling end-users to generate datasets for a variety of interesting test scenarios (e.g., evaluating specific parts of an input ETL process design, with different input dataset sizes, different distributions of data, and different operation selectivities). We have developed a running prototype that implements the functionality of our data generation framework, and here we report our experimental findings showing the effectiveness and scalability of our approach.
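
    The abstract describes the core mechanism: derive the constraints an operation implies over its input and generate data that covers them at a configurable size and selectivity. The sketch below illustrates that idea for a single hypothetical filter operation; it is not the actual Bijoux implementation, and all names and thresholds are invented.

```python
import random

def generate_dataset(size, selectivity, lo=0, hi=1000, threshold=100):
    """Generate `size` tuples so that roughly `selectivity` of them
    satisfy the filter's assumed constraint `amount > threshold`."""
    rows = []
    for _ in range(size):
        if random.random() < selectivity:
            amount = random.uniform(threshold + 1, hi)   # tuple passes the filter
        else:
            amount = random.uniform(lo, threshold)       # tuple is filtered out
        rows.append({"amount": round(amount, 2)})
    return rows

# Generate a test dataset and verify the achieved selectivity.
data = generate_dataset(size=1000, selectivity=0.3)
observed = sum(r["amount"] > 100 for r in data) / len(data)
print(f"observed selectivity: {observed:.2f}")
```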

    Typification of Incorrect Event Data in Supply Chain Event Management

    Due to shorter product life cycles and the increasing internationalization of competition, companies are confronted with growing complexity in supply chain management. Event-based systems are used to reduce this complexity and to support employees' decisions. Such systems include tracking & tracing systems on the one hand and supply chain event management on the other. Tracking & tracing systems only provide monitoring and deviation-reporting functions, whereas supply chain event management systems additionally offer simulation, control, and measurement. The central element connecting these systems is the event: it forms the information basis for mapping and matching the process sequences in the event-based systems. The events received from supply chain partners form the basis for all downstream steps and must, therefore, contain correct data. Since data quality is insufficient in numerous use cases and incorrect data in supply chain event management has not been considered in the literature, this paper deals with the description and typification of incorrect event data. Based on a systematic literature review, typical sources of errors in the acquisition and transmission of event data are discussed. The results are then applied to event data so that a typification of incorrect event types becomes possible. The results help to significantly improve event-based systems for use in practice by preventing incorrect reactions through the detection of incorrect event data.
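
    As a rough illustration of how such a typification could be operationalized, the sketch below checks incoming events against a few of the typical acquisition and transmission error types the abstract alludes to (missing fields, duplicates, malformed or implausible timestamps). Field names and error labels are assumptions, not taken from the paper.

```python
from datetime import datetime, timezone

# Assumed minimal schema for a supply-chain event record.
REQUIRED_FIELDS = {"event_id", "object_id", "timestamp", "location", "event_type"}

def classify_event_errors(event, seen_ids):
    """Return a list of error labels for one incoming event record."""
    errors = []
    missing = REQUIRED_FIELDS - event.keys()
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    if event.get("event_id") in seen_ids:
        errors.append("duplicate transmission")
    ts = event.get("timestamp")
    if ts is not None:
        try:
            # Timestamps are assumed to be timezone-aware ISO 8601 strings.
            when = datetime.fromisoformat(ts)
            if when > datetime.now(timezone.utc):
                errors.append("implausible timestamp (in the future)")
        except ValueError:
            errors.append("malformed timestamp")
    return errors

seen = {"EV-001"}
event = {"event_id": "EV-001", "object_id": "PKG-42",
         "timestamp": "2999-01-01T00:00:00+00:00", "event_type": "arrival"}
print(classify_event_errors(event, seen))
# Flags the missing location, the duplicate ID, and the future timestamp.
```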

    Feature Influence Based ETL for Efficient Big Data Management

    The increasing volume of big data introduces various challenges for its maintenance and analysis. Various approaches to the problem exist, but they fail to achieve the expected results. To improve big data management performance, an efficient real-time feature-influence-analysis-based Extract, Transform, and Load (ETL) framework is presented in this article. The model fetches the big data and, by preprocessing the dataset, analyzes the features to find noisy records. The method then performs feature extraction and applies feature influence analysis to the various data nodes and to the data they hold. It estimates Feature Specific Informative Influence (FSII) and Feature Specific Supportive Influence (FSSI), both measured with the support of a data dictionary whose class ontology covers the various classes of data. The value of FSII is measured according to the presence of a concrete feature in a tuple towards any data node, whereas the value of FSSI is measured based on the appearance of supportive features at any data point towards the data node. Using these measures, the method computes the Node Centric Transformation Score (NCTS), based on which it performs map reduction and merging of data nodes. The NCTS_FIA method achieves higher performance in the ETL process: by adopting feature influence analysis in big data management, ETL performance is improved with minimal time overhead.
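
    The abstract does not give closed-form definitions of FSII, FSSI, or NCTS, so the following is a heavily hedged sketch of one plausible reading: FSII as the frequency of a concrete feature value among a node's tuples, FSSI as the average presence of supportive features, and NCTS as a weighted blend of the two used to drive node merging. All formulas, weights, and names here are assumptions.

```python
def fsii(node_tuples, feature, value):
    """Fraction of the node's tuples carrying the concrete feature value."""
    hits = sum(1 for t in node_tuples if t.get(feature) == value)
    return hits / len(node_tuples) if node_tuples else 0.0

def fssi(node_tuples, supportive_features):
    """Average presence rate of the supportive features across tuples."""
    if not node_tuples or not supportive_features:
        return 0.0
    present = sum(sum(f in t for f in supportive_features) for t in node_tuples)
    return present / (len(node_tuples) * len(supportive_features))

def ncts(node_tuples, feature, value, supportive, w=0.5):
    """Node Centric Transformation Score as a weighted blend (assumed form)."""
    return w * fsii(node_tuples, feature, value) + (1 - w) * fssi(node_tuples, supportive)

node = [{"region": "EU", "currency": "EUR"}, {"region": "EU"}]
print(ncts(node, "region", "EU", {"currency"}))  # 0.75: FSII=1.0, FSSI=0.5
```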

    The success of a business intelligence system (Business intelligence -järjestelmän onnistuminen)

    Business intelligence systems have been developing rapidly. The amount and significance of data in companies are growing, and modern technology enables real-time computation over large volumes of data. This Master's thesis examines the selection and success of a business intelligence system in a manufacturing company. The study describes the operating environment of the case company, a market review of candidate technologies, and the rationale behind the system selection. The success of the system was evaluated according to the DeLone and McLean information systems success model, using measures of information quality, system quality, service quality, use, user satisfaction, and net benefits. Observation and a survey directed at users were used as research methods. Qlikview was selected as the system because of its usability. Despite areas identified for improvement, the system is, by the defined measures, a success in the case company.
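
    As a minimal illustration of the evaluation step, the sketch below tabulates Likert-scale survey answers by the six DeLone and McLean success dimensions; the sample scores are invented and the thesis's actual questionnaire is not reproduced.

```python
from statistics import mean

# Invented Likert-scale answers (1-5), grouped by D&M success dimension.
responses = {
    "information quality": [4, 5, 4, 3],
    "system quality":      [5, 4, 4, 4],
    "service quality":     [3, 4, 3, 4],
    "use":                 [4, 4, 5, 4],
    "user satisfaction":   [4, 5, 4, 4],
    "net benefits":        [4, 4, 4, 5],
}

# Report the mean score per dimension.
for dimension, scores in responses.items():
    print(f"{dimension:20s} mean = {mean(scores):.2f}")
```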

    A business intelligence solution to improve decision-making in the production section of a hydrocarbons company (Solución business intelligence para mejorar la toma de decisiones en la sección de producción de una empresa de hidrocarburos)

    My thesis, titled "Business Intelligence Solution to Improve Decision-Making in the Production Section of a Hydrocarbons Company," was developed at CNPC Perú S.A. An applied methodology with a quantitative approach was used in order to measure the dependent variable, under an experimental, specifically pre-experimental, design. The methodology framed the general objective of improving decision-making; data were collected with observation sheets for the descriptive and inferential analysis of the pretest and posttest, with normality tests and hypothesis testing. As results, the average decision-making time decreased by 32.2627 minutes, equivalent to 50.73%; the average cost was reduced by 173,791.20 dollars, equivalent to 63.23%; the average performance of employees increased by 9.5334 points, equivalent to 11.25%; and the average time for information to become available was reduced by 85.6667 minutes, equivalent to 57.86%. It was concluded that the Business Intelligence solution improved decision-making in the production section of CNPC Perú S.A.
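
    A minimal sketch of the statistical procedure described (a normality check followed by a pretest/posttest hypothesis test) could look as follows, using SciPy; the sample values are invented, not the thesis data.

```python
from scipy import stats

# Invented decision-making times (minutes) before and after the BI solution.
pretest  = [62.1, 65.4, 60.8, 70.2, 63.5, 66.0, 61.9, 64.3]
posttest = [30.5, 33.1, 29.8, 36.0, 31.2, 32.4, 30.1, 31.7]

# Shapiro-Wilk normality test on the paired differences.
diffs = [a - b for a, b in zip(pretest, posttest)]
normal = stats.shapiro(diffs).pvalue > 0.05

# Paired t-test if differences look normal, otherwise Wilcoxon signed-rank.
if normal:
    result = stats.ttest_rel(pretest, posttest)
else:
    result = stats.wilcoxon(pretest, posttest)

print(f"normality assumed: {normal}, p-value = {result.pvalue:.4f}")
```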

    Automating User-Centered Design of Data-Intensive Processes

    Business Intelligence (BI) enables organizations to collect and analyze internal and external business data to generate knowledge and business value, and to provide decision support at the strategic, tactical, and operational levels. The consolidation of data coming from many sources as a result of managerial and operational business processes, usually referred to as Extract-Transform-Load (ETL), is itself a statically defined process, and knowledge workers have little to no control over the characteristics of the presentable data to which they have access. Two main reasons dictate the reassessment of this rigid approach in the context of modern business environments. First, the service-oriented nature of today's business, combined with the increasing volume of available data, makes it impossible for an organization to proactively design efficient data management processes. Second, enterprises can benefit significantly from analyzing the behavior of their business processes, fostering their optimization. Hence, we took a first step towards quality-aware ETL process design automation by defining, through a systematic literature review, a set of ETL process quality characteristics and the relationships between them, as well as by providing quantitative measures for each characteristic. Subsequently, we produced a model that represents ETL process quality characteristics and the dependencies among them, and we showcased, through the application of a Goal Model with quantitative components (i.e., indicators), how our model can provide the basis for subsequent analysis to reason about and make informed ETL design decisions. In addition, we introduced our holistic view of quality-aware ETL process design by presenting a framework for user-centered declarative ETL. This included the definition of an architecture and a methodology for the rapid, incremental, qualitative improvement of ETL process models, promoting automation and reducing complexity, as well as a clear separation of business-user and IT roles in which each user is presented with appropriate views and assigned fitting tasks. In this direction, we built a tool, POIESIS, which facilitates incremental, quantitative improvement of ETL process models, with users as the key participants through well-defined collaborative interfaces. For evaluating different quality characteristics of an ETL process design, we proposed an automated data generation framework (i.e., Bijoux). To this end, we classified the operations based on the part of the input data they access for processing, which helps Bijoux during data generation both to identify the constraints that specific operation semantics imply over input data and to decide at which level the data should be generated (e.g., single field, single tuple, complete dataset). Bijoux offers data generation capabilities in a modular and configurable manner, which can be used to evaluate the quality of different parts of an ETL process. Moreover, we introduced a methodology that can be applied in concrete contexts to build a repository of patterns and rules. The generated knowledge base can be used during the design and maintenance phases of ETL processes, automatically exposing understandable conceptual representations of the processes and providing useful insight for design decisions.
    Collectively, these contributions have raised the level of abstraction of ETL process components, revealing their quality characteristics at a granular level and allowing for evaluation and automated (re-)design that takes business users' quality goals into consideration.
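
    As a rough illustration of the goal-model-with-indicators idea, the sketch below evaluates quality goals whose satisfaction depends on quantitative indicators and on other goals. The characteristics, thresholds, and values are invented; the thesis's actual model and the POIESIS tool are not reproduced here.

```python
from dataclasses import dataclass, field

@dataclass
class QualityGoal:
    name: str
    indicator: float            # measured value of the quality characteristic
    threshold: float            # target the ETL design should reach
    depends_on: list = field(default_factory=list)

    def satisfied(self, goals):
        """A goal is met if its indicator reaches the threshold and all
        goals it depends on are met as well."""
        ok = self.indicator >= self.threshold
        return ok and all(goals[d].satisfied(goals) for d in self.depends_on)

goals = {
    "data freshness":  QualityGoal("data freshness", 0.92, 0.90),
    "reliability":     QualityGoal("reliability", 0.97, 0.95),
    "overall quality": QualityGoal("overall quality", 0.90, 0.85,
                                   depends_on=["data freshness", "reliability"]),
}

for name, goal in goals.items():
    print(f"{name}: {'met' if goal.satisfied(goals) else 'not met'}")
```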

    A proposal of a business intelligence model for decision support from the Data Science perspective (Proposta de um modelo de business intelligence para o apoio à decisão através da perspectiva da Data Science)

    Decision-making by managers is a recurring topic in today's organizations and companies, whether private or publicly managed. The latter maintain a bureaucratic system with difficult access to fast, accurate information. With the massive amount of data available within organizations, and also coming from external sources, organizations have sought new technologies and methods within Information Systems to obtain higher-quality information. Business Intelligence (BI) systems are one of the means that contribute to gathering, analyzing, and disseminating data, resulting in various administrative products and reports that facilitate decision-making; and Data Science (DS), an emerging field within Information Systems, carries the characteristics of transforming and analyzing data in ways that also help the organization in the decision process. Both concepts have their own methods, processes, models, and life cycles for reaching a common objective. However, the literature lacks models that combine the two concepts concurrently, or that apply concepts from one field to the other. Starting from this gap, the objective of this research is to propose a model that applies the concepts of BI and DS together, conceptualizing them and identifying the points where they converge and diverge in order to obtain an efficient model. The methodology used is Modeling, to develop the proposed model, which goes through Conceptualization, Modeling, and Solution and Implementation, where the concepts and steps of the BI and DS processes are explored and combined, with their cycles and phases. Finally, the developed model was applied in a computational tool capable of incorporating it, as a way of testing and validating it, generating computational products for use. The result was applied at the Universidade Federal de Itajubá, specifically in the accounting and finance department, helping managers in decision-making and also serving transparency purposes by publishing the reports produced by the model. The outputs of the developed model are dashboards and data visualization products made available on the University's website, while providing staff and management with a fast and efficient source of information, validating the created model. It can thus be concluded that the concepts are applicable, including in public administration, to generate models that support decision-making, and that they can also be developed in other sectors and organizations.

    Using business intelligence to support decision-making in a public pharmacy in southern Minas Gerais (Utilização do business intelligence para apoio em tomada de decisões em farmácia pública no Sul de Minas Gerais)

    Efficient management of the pharmaceutical assistance service contributes to promoting the rational use of medicines, bringing countless benefits to users of the Unified Health System. In pursuit of this efficiency, it is necessary to access reliable information in a timely manner, so that decision-making takes place based on knowledge, with speed and quality. The purpose of this research was the use of Business Intelligence, through one of its analysis tools, OLAP, to enable the contextualization of data and the presentation of cause-and-effect relationships through the development of a computational tool, allowing the improvement of operational, tactical, and strategic decision-making in the pharmaceutical assistance service of a municipality in the south of Minas Gerais. The action research methodology was used, with data collected through documentary research, interviews, a questionnaire, and observation. The analyses performed related to the selected health units, the dispensing of the medication under study, the profile of the patients, and the main prescribers. Finally, an action plan was drawn up for each analysis performed and a dashboard was created, with the objective of facilitating the generation of knowledge in an intuitive way. As a result, a tool was obtained that can be replicated to other lists of drugs, areas, and periods to reveal cause-and-effect relationships, allowing actions to be targeted at groups of patients, professionals, and population areas in order to maintain a regular supply of medicines and promote their rational use.
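
    As a small illustration of the OLAP-style slicing such a dashboard rests on, the sketch below pivots invented dispensing records by health unit and prescriber using pandas; column names and values are assumptions, not the thesis's actual data cube.

```python
import pandas as pd

# Invented dispensing records: one row per dispensed medication event.
records = pd.DataFrame({
    "health_unit": ["Centro", "Centro", "Norte", "Norte", "Centro"],
    "prescriber":  ["Dr. A", "Dr. B", "Dr. A", "Dr. A", "Dr. B"],
    "quantity":    [30, 10, 20, 15, 25],
})

# OLAP-style aggregation: total quantity by health unit and prescriber.
cube = records.pivot_table(index="health_unit", columns="prescriber",
                           values="quantity", aggfunc="sum", fill_value=0)
print(cube)
```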
