6 research outputs found

    Development of pattern-based models for time series forecasting in Big Data environments

    Doctoral Programme in Biotechnology, Engineering and Chemical Technology. Research line: Engineering, Data Science and Bioinformatics. Programme code: DBI. Line code: 111.
    This Doctoral Thesis is presented as a compendium of publications and contributes several scientific works to international conferences and to journals with a high impact factor in the Journal Citation Reports (JCR). Over five years of part-time research, the work has been directed at the study, analysis and forecasting of large collections of time series, mainly energy-related ones. To that end, the latest technological trends in distributed computing were followed: the experimentation was developed entirely in Scala, the native language of the Apache Spark framework, and the experimental tests were run in real environments such as Amazon Web Services and Open Telekom Cloud. The first phase of the Doctoral Thesis focuses on the development and application of a methodology for efficiently analysing datasets of electricity-consumption time series generated by the network of smart meters installed at Universidad Pablo de Olavide. The proposed methodology centres on the correct application, in distributed environments, of the K-means clustering algorithm to large datasets, segmenting sets of n observations into k distinct groups with similar characteristics. This task is carried out with a parallelised version of the algorithm, K-means++, included in the Machine Learning Library of Apache Spark. To choose the optimal number of clusters, a strategy is adopted in which several cluster validity indices are evaluated, namely the Within Set Sum of Squared Errors, Davies-Bouldin, Dunn and Silhouette, all of them implemented for distributed environments.
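As a minimal single-machine illustration of the cluster-count selection described above (the thesis itself uses Spark MLlib's distributed K-means++ in Scala), the sketch below runs a plain k-means on toy one-dimensional consumption values and compares the Within Set Sum of Squared Errors (WSSSE) across candidate values of k. All names and data are hypothetical:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on 1-D points; the thesis uses Spark MLlib's
    parallel K-means++ instead -- this is a single-machine sketch."""
    rnd = random.Random(seed)
    centers = rnd.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: abs(p - centers[c]))
            clusters[nearest].append(p)
        # Recompute each center as its cluster mean; keep it if the cluster emptied.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

def wssse(points, centers):
    """Within Set Sum of Squared Errors, one of the validity indices above."""
    return sum(min((p - c) ** 2 for c in centers) for p in points)

# Toy hourly-consumption values with two obvious regimes.
data = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
scores = {k: wssse(data, kmeans(data, k)) for k in (1, 2, 3)}
```

In practice the k at which the WSSSE stops improving markedly (the "elbow") is chosen, cross-checked against the other indices mentioned above.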
    The results of this experimentation were presented at the 13th International Conference on Distributed Computing and Artificial Intelligence. The experimentation and the methodology were subsequently extended, resulting in an article published in the journal Energies, indexed in the JCR with a Q3 ranking. The second part of the work consists of designing a methodology and developing an algorithm capable of effectively forecasting time series in Big Data environments. To this end, the well-known Pattern Sequence-based Forecasting (PSF) algorithm was analysed with two main objectives: on the one hand, adapting it to scalable, distributed environments and, on the other, improving its predictions while targeting the efficient exploitation of large datasets. In this context, an algorithm called bigPSF was developed in Scala and integrated into a complete methodology designed to forecast the energy consumption of a Smart City. Finally, a variant of bigPSF called MV-bigPSF was developed, capable of forecasting multivariate time series. This experimentation resulted in two scientific articles published in the journals Information Sciences (for the article on the bigPSF algorithm) and Applied Energy (for the study of its multivariate version), both ranked Q1 in the JCR.
    Universidad Pablo de Olavide de Sevilla. Escuela de Doctorado
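The core idea behind PSF, which bigPSF adapts to distributed environments, can be sketched in a few lines: each day is replaced by its cluster label, the most recent window of labels is searched for in the historical label sequence, and the values that followed past occurrences are averaged. This is a hedged single-machine sketch with toy data, not the thesis implementation:

```python
def psf_forecast(labels, series, w=2):
    """Pattern Sequence-based Forecasting, minimal sketch: find past
    positions where the last `w` cluster labels repeat and average the
    values that followed them. bigPSF / MV-bigPSF are distributed
    Scala implementations of this idea."""
    window = labels[-w:]
    matches = [i + w for i in range(len(labels) - w)
               if labels[i:i + w] == window]
    if not matches:
        # No past occurrence of the pattern: shrink the window, as PSF does.
        return psf_forecast(labels, series, w - 1)
    following = [series[i] for i in matches]
    return sum(following) / len(following)

labels = [0, 1, 0, 1, 0, 1]        # cluster label assigned to each day
series = [10, 20, 11, 21, 12, 21]  # daily consumption, same indexing
print(psf_forecast(labels, series, w=2))
```

Here the last window of labels is [0, 1]; it occurred twice before, followed by the values 11 and 12, so the forecast is their average, 11.5.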

    Study of the temporal behaviour of virtualized distributed environments

    The main objectives of this Final Degree Project are the following: a study of virtualization software, cloud platforms, communications middleware and the Linux task scheduler; development of the test code and installation of the tools needed to run the tests; execution of the tests and drawing of conclusions; development of the application code and its integration in OpenStack; and conclusions together with possible lines of improvement.
    Ingeniería Telemática

    Big data techniques for real-time processing of massive data streams

    Doctoral Programme in Biotechnology, Engineering and Chemical Technology. Research line: Engineering, Data Science and Bioinformatics. Programme code: DBI. Line code: 111.
    Machine learning techniques have become one of the resources most demanded by companies due to the large volume of data that surrounds us nowadays. The main objective of these technologies is to solve complex problems in an automated way using data. One of the current perspectives of machine learning is the analysis of continuous flows of data, or data streaming. This approach is increasingly requested by enterprises as a result of the large number of information sources producing time-indexed data at high frequency, such as sensors, Internet of Things devices, social networks, etc. However, research nowadays focuses more on the study of historical data than on data received in streaming. One of the main reasons for this is the enormous challenge that this type of data poses for the modeling of machine learning algorithms. This Doctoral Thesis is presented in the form of a compendium of publications with a total of 10 scientific contributions to international conferences and to journals with a high impact factor in the Journal Citation Reports (JCR). The research developed during the PhD programme focuses on the study and analysis of real-time or streaming data through the development of new machine learning algorithms. Machine learning algorithms for real-time data require a different type of modeling from the traditional one: the model is updated online to provide accurate responses in the shortest possible time. The main objective of this Doctoral Thesis is to contribute research value to the scientific community through three new machine learning algorithms. These algorithms are big data techniques, and two of them work with online or streaming data. In this way, contributions are made to the development of one of the current trends in Artificial Intelligence.
With this purpose, algorithms are developed for descriptive and predictive tasks, i.e., unsupervised and supervised learning, respectively. Their common idea is the discovery of patterns in the data. The first technique developed during the dissertation is a triclustering algorithm that produces three-dimensional data clusters in offline or batch mode. This big data algorithm is called bigTriGen. Broadly, an evolutionary metaheuristic is used to search for groups of data with similar patterns, applying genetic operators such as selection, crossover, mutation and evaluation at each iteration. The goal of bigTriGen is to optimize the evaluation function so as to achieve triclusters of the highest possible quality. It serves as the basis for the second technique implemented during the Doctoral Thesis. The second algorithm, called STriGen, focuses on creating groups over three-dimensional data received in real time, i.e., in streaming. Streaming modeling starts from an offline or batch model built on historical data. As soon as this model is created, it starts receiving data in real time, and the model is updated in an online or streaming manner to adapt to new streaming patterns. In this way, STriGen is able to detect concept drifts and incorporate them into the model as quickly as possible, thus producing good-quality triclusters in real time. The last algorithm developed in this dissertation follows a supervised learning approach for time series forecasting in real time. It is called StreamWNN. A model is created with historical data based on the k-nearest neighbors (KNN) algorithm. Once the model is created, data starts to be received in real time. The algorithm provides real-time predictions of future data, keeping the model always updated in an incremental way and incorporating streaming patterns identified as novelties.
StreamWNN also identifies anomalous data in real time, a feature that can be used as a security measure during its application. The developed algorithms have been evaluated with real data from devices and sensors. These new techniques have proven very useful, providing meaningful triclusters and accurate predictions in real time.
    Universidad Pablo de Olavide de Sevilla. Departamento de Deporte e Informática
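The incremental KNN-based forecasting scheme that StreamWNN follows (build a model on historical windows, then predict and absorb each new observation) can be sketched as below. This is a hedged illustration under assumed design choices (a sliding memory, Euclidean distance, an unweighted average of neighbours), not the thesis code; all names are hypothetical:

```python
from collections import deque
import math

class StreamKNNForecaster:
    """Sketch of a KNN-based streaming forecaster in the spirit of
    StreamWNN: keep a sliding memory of (window -> next value) pairs,
    predict by averaging the k nearest windows, and absorb each new
    observation incrementally."""
    def __init__(self, k=2, w=3, memory=500):
        self.k, self.w = k, w
        self.pairs = deque(maxlen=memory)   # (window, value that followed it)
        self.recent = deque(maxlen=w)       # most recent window of observations

    def update(self, x):
        """Incremental update: the previous window now has a known successor."""
        if len(self.recent) == self.w:
            self.pairs.append((tuple(self.recent), x))
        self.recent.append(x)

    def predict(self):
        """Average the successors of the k windows closest to the current one."""
        q = tuple(self.recent)
        nearest = sorted(self.pairs, key=lambda p: math.dist(p[0], q))[:self.k]
        return sum(v for _, v in nearest) / len(nearest)

f = StreamKNNForecaster(k=1, w=3)
for x in [1, 2, 3, 1, 2, 3, 1, 2]:
    f.update(x)
print(f.predict())  # the window (3, 1, 2) was previously followed by 3
```

A bounded memory keeps the model adaptive: old windows fall out of the deque, so patterns from before a concept drift gradually stop influencing predictions.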

    Clinical decision support: Knowledge representation and uncertainty management

    Doctoral Programme in Biomedical Engineering.
    Decision-making in clinical practice faces many challenges due to the inherent risks of being a health care professional. From medical error to undesired variations in clinical practice, the mitigation of these issues appears to be tightly connected to adherence to Clinical Practice Guidelines as evidence-based recommendations. The deployment of Clinical Practice Guidelines in computational systems for clinical decision support has the potential to positively impact health care. However, current approaches to Computer-Interpretable Guidelines exhibit a set of issues that leaves them wanting. These issues are related to the lack of expressiveness of their underlying models, the complexity of knowledge acquisition with their tools, the absence of support for the clinical decision-making process, and the style of communication of Clinical Decision Support Systems implementing Computer-Interpretable Guidelines. Such issues are obstacles that prevent these systems from showing properties like modularity, flexibility, adaptability, and interactivity. All these properties reflect the concept of living guidelines. The purpose of this doctoral thesis is, thus, to provide a framework that enables the expression of these properties. The modularity property is conferred by the ontological definition of Computer-Interpretable Guidelines and the assistance in guideline acquisition provided by an editing tool, allowing for the management of multiple knowledge patterns that can be reused. Flexibility is provided by the representation primitives defined in the ontology, meaning that the model is adjustable to guidelines from different categories and specialities. As for adaptability, this property is conferred by mechanisms of Speculative Computation, which allow the Decision Support System not only to reason with incomplete information but also to adapt to changes of state, such as suddenly learning the missing information.
The solution proposed for interactivity consists in embedding Computer-Interpretable Guideline advice directly into the daily routine of health care professionals and providing a set of reminders and notifications that help them keep track of their tasks and responsibilities. All these solutions make up the CompGuide framework for the expression of Clinical Decision Support Systems based on Computer-Interpretable Guidelines.
    The work of the PhD candidate Tiago José Martins Oliveira is supported by a grant from FCT - Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) with the reference SFRH/BD/85291/2012.
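The Speculative Computation mechanism described above (answering with default assumptions and revising when the missing information arrives) can be illustrated with a small sketch. The rule, queries and defaults below are hypothetical examples, not taken from CompGuide:

```python
class SpeculativeAdvisor:
    """Minimal sketch of Speculative Computation for clinical decision
    support: unanswered queries are filled with default assumptions so a
    tentative recommendation is always available, and the recommendation
    is revised when a real answer replaces a default."""
    def __init__(self, defaults):
        self.defaults = defaults   # assumed values for unanswered queries
        self.facts = {}            # confirmed answers received so far

    def value(self, query):
        # Prefer a confirmed fact; fall back to the speculative default.
        return self.facts.get(query, self.defaults[query])

    def recommend(self):
        # Toy guideline rule (hypothetical): an allergy switches the drug.
        if self.value("penicillin_allergy"):
            return "prescribe erythromycin"
        return "prescribe penicillin"

    def tell(self, query, answer):
        """A change of state: a real answer supersedes the default."""
        self.facts[query] = answer

advisor = SpeculativeAdvisor(defaults={"penicillin_allergy": False})
tentative = advisor.recommend()           # computed from the default assumption
advisor.tell("penicillin_allergy", True)  # the missing information arrives
revised = advisor.recommend()             # the advice is revised accordingly
```

The point of the mechanism is that reasoning never blocks on an unanswered query; the system commits tentatively and repairs its conclusion when reality contradicts the assumption.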

    On the Combination of Game-Theoretic Learning and Multi Model Adaptive Filters

    This paper casts the coordination of a team of robots within the framework of game-theoretic learning algorithms. In particular, a novel variant of fictitious play is proposed in which multi-model adaptive filters are used to estimate the other players' strategies. The proposed algorithm can serve as a coordination mechanism between players when they must make decisions under uncertainty. Each player chooses an action after taking into account the actions of the other players as well as the uncertainty, which can arise either from noisy observations or from the various possible types of the other players. In addition, in contrast to other game-theoretic and heuristic algorithms for distributed optimisation, it is not necessary to find the optimal parameters a priori: various parameter values can be used initially as inputs to different models, so the resulting decisions are aggregate results over all the parameter values. Simulations are used to test the performance of the proposed methodology against other game-theoretic learning algorithms.
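For context, classical fictitious play has each player best-respond to an empirical estimate of the other players' strategies; the paper's variant replaces that estimate with multi-model adaptive filters. A minimal sketch of the classical dynamic on a toy 2x2 coordination game (payoffs and horizon are hypothetical, not from the paper) looks like this:

```python
# Toy coordination game: both players earn 1 by choosing the same action,
# 0 otherwise. payoff[a][b] is the payoff for playing a against b.
payoff = [[1.0, 0.0],
          [0.0, 1.0]]

# counts[p][a]: how many times player p has played action a (1s avoid
# a zero-count start, a common initialisation in fictitious play).
counts = [[1, 1], [1, 1]]

def best_response(opp_counts):
    """Best-respond to the empirical frequency of the opponent's actions.
    The paper's variant would replace this frequency estimate with
    multi-model adaptive filters."""
    total = sum(opp_counts)
    beliefs = [c / total for c in opp_counts]      # estimated mixed strategy
    expected = [sum(payoff[a][b] * beliefs[b] for b in range(2))
                for a in range(2)]
    return max(range(2), key=lambda a: expected[a])

for _ in range(50):
    a0 = best_response(counts[1])   # player 0 best-responds to player 1
    a1 = best_response(counts[0])   # and vice versa
    counts[0][a0] += 1
    counts[1][a1] += 1

# After repeated play, both players settle on one coordinated equilibrium.
print(counts)
```

In the multi-model variant, several candidate estimates of the opponent's strategy would be maintained in parallel and weighted by how well each one predicts the observed actions, which is why no single set of parameters has to be fixed a priori.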