14 research outputs found

    A BPMN-Based Design and Maintenance Framework for ETL Processes

    Business Intelligence (BI) applications require the design, implementation, and maintenance of processes that extract, transform, and load suitable data for analysis. The development of these processes (known as ETL) is an inherently complex problem that is typically costly and time-consuming. In a previous work, we proposed a vendor-independent language for reducing the design complexity caused by disparate ETL languages tailored to specific design tools with steep learning curves. Nevertheless, the designer still faces two major issues during the development of ETL processes: (i) how to implement the designed processes in an executable language, and (ii) how to maintain the implementation when the organization's data infrastructure evolves. In this paper, we propose a model-driven framework that adds automatic code generation and improved maintenance support to our ETL language. We present a set of model-to-text transformations able to produce code for different commercial ETL tools, as well as model-to-model transformations that automatically update the ETL models in order to keep the generated code consistent with evolving data sources. A demonstration using an example is conducted as an initial validation to show that the framework, covering modeling, code generation, and maintenance, could be used in practice.
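The model-to-text idea can be sketched in a few lines. The model classes and the SQL template below are hypothetical illustrations, not the paper's actual metamodel or target language:

```python
from dataclasses import dataclass

@dataclass
class ExtractStep:          # abstract "extract" element of the ETL model
    source_table: str
    columns: list

@dataclass
class LoadStep:             # abstract "load" element of the ETL model
    target_table: str

def generate_sql(extract: ExtractStep, load: LoadStep) -> str:
    """Model-to-text: render the abstract ETL model as vendor code (here, SQL)."""
    cols = ", ".join(extract.columns)
    return (f"INSERT INTO {load.target_table} ({cols})\n"
            f"SELECT {cols} FROM {extract.source_table};")

print(generate_sql(ExtractStep("sales_raw", ["id", "amount"]),
                   LoadStep("sales_dw")))
```

Swapping the template while keeping the same abstract model is what makes this style of generation vendor-independent.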

    The Specification of ETL Transformation Operations based on Weaving Models

    In the ETL process, the transformation of data is achieved through the execution of a set of transformation operations. The realization of this process (the order in which the transformation operations must be executed) should be preceded by a specification of the transformation process at a higher level of abstraction. The specification is given through mappings representing abstract operations specific to the transformation process. These mappings are defined through weaving models and metamodels. A generated weaving metamodel (GWMM) is proposed that gives the complete mapping semantics through specific link types (representing the abstract operations) and appropriate OCL constraints. Weaving models specifying the actual mappings must conform to this proposed GWMM.
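A weaving model of this kind can be pictured as a set of typed links between source and target elements; the link types and the constraint below are invented for illustration, standing in for the GWMM's link types and OCL constraints:

```python
# Each link carries an abstract operation type plus the source and target
# elements it connects (hypothetical names, not the paper's metamodel).
links = [
    {"type": "Rename", "source": ["cust_name"],      "target": ["customer_name"]},
    {"type": "Merge",  "source": ["first", "last"],  "target": ["full_name"]},
]

def check_links(links):
    """Mimic an OCL well-formedness constraint: a Merge link must
    connect at least two source elements to its target."""
    for link in links:
        if link["type"] == "Merge" and len(link["source"]) < 2:
            return False
    return True

print(check_links(links))  # -> True
```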

    Methodological challenges and analytic opportunities for modeling and interpreting Big Healthcare Data

    Managing, processing, and understanding big healthcare data is challenging, costly, and demanding. Without a robust fundamental theory for representation, analysis, and inference, a roadmap for the uniform handling and analysis of such complex data remains elusive. In this article, we outline various big data challenges, opportunities, modeling methods, and software techniques for blending complex healthcare data, advanced analytic tools, and distributed scientific computing. Using imaging, genetic, and healthcare data, we provide examples of processing heterogeneous datasets using distributed cloud services, automated and semi-automated classification techniques, and open-science protocols. Despite substantial advances, innovative new technologies need to be developed that enhance, scale, and optimize the management and processing of large, complex, and heterogeneous data. Stakeholder investments in data acquisition, research and development, computational infrastructure, and education will be critical to realize the huge potential of big data, to reap the expected information benefits, and to build lasting knowledge assets. Multi-faceted proprietary, open-source, and community developments will be essential to enable broad, reliable, sustainable, and efficient data-driven discovery and analytics. Big data will affect every sector of the economy, and its hallmark will be ‘team science’.

    Revisión sistemática de la integración de modelos de desarrollo de software dirigido por modelos y metodologías ágiles

    Currently, parts of the software development industry still rely on manual activities and/or heavyweight methodologies that are often cumbersome and inefficient. This situation raises several issues related to the difficulty of producing software in a timely, agile manner, at low cost, and with a high level of quality. One way to improve this situation is to incorporate into the software development process the formalism and abstraction needed to automate and optimize the most critical tasks defined by the methodologies used in software companies, starting from an agile approach. This would add value to the business and would significantly improve the software process. In this sense, in order to make known the benefits of agile approaches and model-driven development environments, a systematic literature review has been conducted on the projects worldwide where these approaches have been integrated. In addition, it has been possible to identify some benefits reported by different studies.

    DSS from an RE perspective: A systematic mapping

    Decision support systems (DSS) provide a unified analytical view of business data to better support decision-making processes. Such systems have shown a high level of user satisfaction and return on investment. However, several surveys stress the high failure rate of DSS projects. This problem results from setting the wrong requirements by approaching DSS in the same way as operational systems, whereas a specific approach is needed. Although this is well known, there is still a surprising gap on how to address requirements engineering (RE) for DSS. To overcome this problem, we conducted a systematic mapping study to identify and classify the literature on DSS from an RE perspective. Twenty-seven primary studies that addressed the main stages of RE were selected, mapped, and classified into 39 models, 27 techniques, and 54 items of guidance. We have also identified a gap in the literature on how to design the main DSS constructs (typically, the data warehouse and data flows) in a methodological manner from the business needs. We believe this study will help practitioners better address the RE stages of DSS projects.

    TPVS: treasury product valoration system

    This document presents the TPVS project, its conception and development. TPVS is a system that emerged from a specific need raised by the company Management Solutions Colombia: to update and automate its data analysis process, which is well defined at the enterprise level but not at the technological level. This is mainly because the treasury area lacks software solutions that fit the company's specific needs. A system has therefore been implemented that meets the user's requirements while also serving as the basis of a software product that can be explored, built upon, and extended by those who wish to make use of this project.

    I2ECR: Integrated and Intelligent Environment for Clinical Research

    Clinical trials are designed to produce new knowledge about a certain disease, drug, or treatment. During these studies, a huge amount of data is collected about participants, therapies, clinical procedures, outcomes, adverse events, and so on. A multicenter, randomized, phase III clinical trial in hematology enrolls up to hundreds of subjects and evaluates post-treatment outcomes on stratified subgroups of subjects for a period of many years. Data collection in clinical trials is therefore becoming complex, with a huge number of clinical and biological variables. Outside the medical field, data warehouses (DWs) are widely employed. A data warehouse is a “collection of integrated, subject-oriented databases designed to support the decision-making process”. To verify whether DWs might be useful for data quality and association analysis, a team of biomedical engineers, clinicians, biologists, and statisticians developed the “I2ECR” project. I2ECR is an Integrated and Intelligent Environment for Clinical Research where clinical and omics data stand together for clinical use (reporting) and for the generation of new clinical knowledge. I2ECR has been built from the “MCL0208” phase III, prospective clinical trial, sponsored by the Fondazione Italiana Linfomi (FIL); this is a translational study, accounting for many clinical data, along with several clinical prognostic indexes (e.g. MIPI - Mantle Cell Lymphoma International Prognostic Index), pathological information, treatment and outcome data, biological assessments of disease (MRD - Minimal Residual Disease), as well as many ancillary biological studies, such as mutational analysis, Gene Expression Profiling (GEP), and pharmacogenomics. Forty-eight Italian medical centers were actively involved in this trial, for a total of 300 enrolled subjects. The main objectives of I2ECR are:
    • to propose an integration project based on clinical and molecular data quality concepts, applying clear raw-data analysis as well as clinical trial monitoring strategies to implement a digital platform where clinical, biological, and “omics” data are imported from different sources and well integrated in a data warehouse;
    • to be a dynamic repository of data congruency quality rules. I2ECR makes it possible to monitor, in a semi-automatic manner, the quality of the data, in relation to the clinical data imported from eCRFs (electronic Case Report Forms) and to the biological and mutational datasets edited internally by local laboratories. I2ECR is therefore able to detect missing data and mistakes arising from non-conventional data-entry activities at the centers;
    • to provide clinical stakeholders with a platform where they can easily design statistical and data mining analyses. The term Data Mining (DM) identifies a set of tools for searching for hidden patterns of interest in large and multivariate datasets. The applications of DM techniques in the medical field range from outcome prediction and patient classification to genomic medicine and molecular biology. I2ECR allows clinical stakeholders to propose innovative methods of supervised and unsupervised feature extraction, data classification, and statistical analysis on heterogeneous datasets associated with the MCL0208 clinical trial.
    Although the MCL0208 study is the first example of data population of I2ECR, the environment will also be able to import data from clinical studies designed for other onco-hematologic diseases.
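Data congruency rules of the kind described above can be pictured as simple record-level checks; the field names, value range, and rules below are hypothetical, not I2ECR's actual rule set:

```python
# Hypothetical eCRF-style records; None stands for a missing entry.
records = [
    {"patient_id": "P01", "mipi_score": 5.8,  "mrd_status": "negative"},
    {"patient_id": "P02", "mipi_score": None, "mrd_status": "positive"},
]

def find_violations(records):
    """Semi-automatic quality check: flag missing values and
    out-of-range scores, one issue per offending record."""
    issues = []
    for rec in records:
        if rec["mipi_score"] is None:
            issues.append((rec["patient_id"], "missing mipi_score"))
        elif not (0 <= rec["mipi_score"] <= 12):   # illustrative range
            issues.append((rec["patient_id"], "mipi_score out of range"))
    return issues

print(find_violations(records))  # -> [('P02', 'missing mipi_score')]
```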

    Geração de esqueletos para sistemas de ETL a partir de redes de Petri colorida

    Coloured Petri Nets are a graphical language with a well-defined semantics that allows the design, specification, simulation, and validation of systems whose processes require specific characteristics of communication, concurrency, and synchronization. At the application level, Coloured Petri Nets are used in widely different areas, such as the specification of communication protocols, control systems, hardware systems, or software systems. Due to their characteristics, Coloured Petri Nets were also adopted for modeling ETL (Extract-Transformation-Load) systems. Meta-tasks such as Change Data Capture or Surrogate Key Pipelining, frequently found in conventional ETL systems, have been modeled and validated using Coloured Petri Nets. This supports, quite effectively, the main purpose of this dissertation: to develop and implement a system for generating skeletons for ETL systems from the corresponding Coloured Petri Net.
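The skeleton-generation idea can be sketched as a traversal of the net that emits one stub per transition; the net structure below is a simplified stand-in for a real Coloured Petri Net model, not the dissertation's actual implementation:

```python
cpn = {  # simplified stand-in for a CPN: just the transition names
    "transitions": ["change_data_capture", "surrogate_key_pipelining", "load"],
}

def generate_skeleton(cpn):
    """Emit one Python stub per CPN transition, to be filled in by hand."""
    lines = []
    for name in cpn["transitions"]:
        lines += [f"def {name}(tokens):",
                  f"    # TODO: implement the '{name}' meta-task",
                  "    return tokens",
                  ""]
    return "\n".join(lines)

print(generate_skeleton(cpn))
```

Each stub corresponds to a meta-task of the net, so the generated skeleton mirrors the validated model's structure while leaving the task bodies to the developer.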

    Descubrimiento de conocimientos en la base de datos académica de la Universidad Autónoma de Manizales aplicando redes neuronales

    Context: Higher education in Colombia is a right of all citizens, and the Ministerio de Educación Nacional is responsible for guaranteeing it. However, multiple problems make this right hard to realize in practice. Beyond the education system's own problems, such as low quality, lack of relevance, and low coverage rates, there are others, such as student dropout and lack of vocation, generated both by factors within the higher education system and by external factors related to the students and their social environment. Objective: This project aims to generate useful knowledge for finding possible causes of the student dropout problem at the Universidad Autónoma de Manizales, starting from the large amounts of academic information generated by the university's transactional systems. Methodology: The first phase of the project reviews previous research on the problem of academic dropout and other problems associated with higher education at the national and international levels. The second phase carries out the extraction of academic information from the transactional systems of the Universidad Autónoma de Manizales. In the final phase, the information is analyzed with data mining techniques, applied according to the analysis performed and the techniques defined after the extraction process. Results: This project intends to produce a consolidated and normalized data source of academic information of the Universidad Autónoma de Manizales, usable both during this project and in future data mining and business intelligence projects; a data mining framework with a basic implementation for this project but extensible to a wide variety of new problems and techniques; and, finally, a set of conclusions about the dropout problem drawn from the academic information and the data mining techniques applied.
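A deliberately tiny sketch of the kind of model such a project might apply is a single-neuron (logistic regression) classifier; the student features, data, and learning settings below are invented for illustration, not the project's actual dataset or network:

```python
import math
import random

random.seed(0)
# features: [avg_grade, attendance_rate]; label: 1 = dropped out (synthetic)
data = [([2.0, 0.4], 1), ([4.5, 0.9], 0), ([2.5, 0.5], 1), ([4.0, 0.8], 0)]

w, b, lr = [0.0, 0.0], 0.0, 0.5

def predict(x):
    """Sigmoid of the weighted sum: probability of dropout."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 / (1 + math.exp(-z))

# plain gradient descent on the log loss, one sample at a time
for _ in range(2000):
    for x, y in data:
        err = predict(x) - y
        for i in range(len(w)):
            w[i] -= lr * err * x[i]
        b -= lr * err

print(round(predict([2.2, 0.45])))  # at-risk profile -> 1
```

A real neural-network study would use a multi-layer model and far richer academic features, but the training loop above shows the core mechanism.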