
    Normative model for the diagnosis of neuropsychiatric disorders using deep learning methods

    Integrated master's thesis, Biomedical Engineering and Biophysics (Clinical Engineering and Medical Instrumentation), Universidade de Lisboa, Faculdade de Ciências, 2021.
    The diagnosis of neuropsychiatric disorders (NPDs) still depends exclusively on the analysis of patients' signs and symptoms, since no biomarkers are yet useful for clinical practice. Because several signs and symptoms are shared among different NPDs, the diagnosis is sometimes incorrect. Therapeutic approaches therefore do not always succeed, which affects the quality of life of neuropsychiatric patients. Furthermore, NPDs have a global economic and demographic impact. For this reason, technological solutions such as deep learning (DL) have been researched to optimise diagnosis in neuropsychiatry. However, the most promising studies on the diagnosis of NPDs with DL are based on binary classification, which may not be the most adequate approach to deal with the continuous spectrum of NPDs. Here, a DL-based normative model was developed to investigate functional connectivity abnormalities that may contribute to the development of a novel diagnostic procedure. The method evaluates how patients deviate from a normal pattern learned from a group of healthy people. To create and evaluate the normative model, resting-state functional magnetic resonance imaging (rs-fMRI) data from three different databases were used. To maximise the balance between the amount and the quality of the data, conditions were defined to restrict the variability of the scan parameters. Subsequently, rs-fMRI data were trimmed to the lowest number of time points present in the sample (150). Standard preprocessing steps were then performed, including removal of the first 4 volumes of functional data, motion correction, spatial smoothing, and high-pass filtering.
Single-session independent component analysis (ICA) was run, and the FSL-FIX tool was used to clean noise and artefacts. The functional images were then registered to the T1-weighted brain-extracted structural images, and finally to the Montreal Neurological Institute 152 standard space. Dual regression was applied using fourteen resting-state functional brain networks (FBN) previously identified in the literature. The Pearson correlation coefficient between the extracted blood oxygen level-dependent (BOLD) time series of each FBN was calculated, and a 14x14 network connectivity matrix was generated for each subject. The second part of the project consisted of the creation and optimization of a normative model: an autoencoder (AE) with three hidden layers. The AE was trained only on healthy subjects and was tested on both healthy subjects and neuropsychiatric patients, including schizophrenia (SCZ), bipolar disorder (BD), and attention deficit hyperactivity disorder (ADHD) patients. The hypothesis was that the model would "fail" to reconstruct data from neuropsychiatric patients. To evaluate the model performance, graph theory metrics were applied. In addition, the mean squared error was calculated for each feature (the correlation between a pair of FBNs) to evaluate which connections were reconstructed worst for each group of subjects. The pipeline for NPDs was tested on a SCZ case study, with the addition of a clustering algorithm. The results of this dissertation revealed that the proposed pipeline was able to identify patterns of functional connectivity abnormality that characterize different NPDs. Moreover, the results for the two SCZ patient groups were similar, which demonstrated that the normative model presented here is also generalizable. However, the group-level differences did not withstand individual-level analysis, implying that NPDs are highly heterogeneous.
These findings support the idea that a precision-based medical approach, focusing on the specific functional network changes of individual patients, may be more beneficial than the traditional group-based diagnostic classification. A personalised diagnosis would allow for personalised therapy, improving the quality of life of neuropsychiatric patients.
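The normative-model idea described above can be sketched in a few lines: train an autoencoder only on "healthy" feature vectors and use the per-feature reconstruction error to flag abnormal connections. The sketch below is a deliberately minimal stand-in with synthetic data, a single linear hidden layer instead of the thesis's three, and invented dimensions and offsets; it is not the author's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for vectorised connectivity features: 91 values per
# subject (the upper triangle of a 14x14 matrix). "Healthy" subjects lie
# near a low-rank subspace; "patients" carry an offset on 10 features.
n_feat, n_lat = 91, 5
basis = rng.normal(0, 0.3, (n_lat, n_feat))
healthy = rng.normal(size=(200, n_lat)) @ basis + rng.normal(0, 0.1, (200, n_feat))
patients = rng.normal(size=(50, n_lat)) @ basis + rng.normal(0, 0.1, (50, n_feat))
patients[:, :10] += 1.0          # hypothetical connectivity abnormality

def train_autoencoder(X, n_hidden=5, lr=0.02, epochs=1000):
    """Plain gradient descent on a linear autoencoder with one hidden layer."""
    n, d = X.shape
    r = np.random.default_rng(1)
    W1 = r.normal(0, 0.1, (d, n_hidden))   # encoder
    W2 = r.normal(0, 0.1, (n_hidden, d))   # decoder
    for _ in range(epochs):
        H = X @ W1
        E = H @ W2 - X                     # reconstruction residual
        gW1 = X.T @ (E @ W2.T) / n
        gW2 = H.T @ E / n
        W1 -= lr * gW1
        W2 -= lr * gW2
    return W1, W2

W1, W2 = train_autoencoder(healthy)

def per_feature_mse(X):
    """Mean squared reconstruction error per feature (per 'connection')."""
    return ((X @ W1 @ W2 - X) ** 2).mean(axis=0)

healthy_mse = per_feature_mse(healthy)
patient_mse = per_feature_mse(patients)
```

Because the AE only learns the healthy subspace, the patient offset falls largely outside it and shows up as reconstruction error concentrated on the affected features, mirroring the per-feature MSE analysis in the abstract.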

    Understanding predictive uncertainty in hydrologic modeling: The challenge of identifying input and structural errors

    Meaningful quantification of data and structural uncertainties in conceptual rainfall-runoff modeling is a major scientific and engineering challenge. This paper focuses on the total predictive uncertainty and its decomposition into input and structural components under different inference scenarios. Several Bayesian inference schemes are investigated, differing in the treatment of rainfall and structural uncertainties and in the precision of the priors describing rainfall uncertainty. Compared with traditional lumped additive error approaches, the quantification of the total predictive uncertainty in the runoff is improved when rainfall and/or structural errors are characterized explicitly. However, the decomposition of the total uncertainty into individual sources is more challenging. In particular, poor identifiability may arise when the inference scheme represents rainfall and structural errors using separate probabilistic models. The inference becomes ill-posed unless sufficiently precise prior knowledge of data uncertainty is supplied; this ill-posedness can often be detected from the behavior of the Monte Carlo sampling algorithm. Moreover, the priors on the data quality must also be sufficiently accurate if the inference is to be reliable and support meaningful uncertainty decomposition. Our findings highlight the inherent limitations of inferring inaccurate hydrologic models using rainfall-runoff data with large unknown errors. Bayesian total error analysis can overcome these problems using independent prior information. The need for deriving independent descriptions of the uncertainties in the input and output data is clearly demonstrated.
Benjamin Renard, Dmitri Kavetski, George Kuczera, Mark Thyer, and Stewart W. Frank
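The decomposition the paper studies can be illustrated with a toy forward simulation, not the paper's Bayesian inference machinery: propagate latent rainfall multipliers through a one-parameter linear-reservoir model, add a lumped structural error term, and split the predictive variance by the law of total variance. The reservoir model, the multiplier prior and the error standard deviation below are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

def linear_reservoir(rain, k=0.3):
    """Toy rainfall-runoff model: a single store draining at rate k per step."""
    s, q = 0.0, []
    for r in rain:
        s += r
        out = k * s
        s -= out
        q.append(out)
    return np.array(q)

rain_obs = rng.gamma(2.0, 2.0, 60)       # "observed" (uncertain) rainfall series
n_mc = 2000

# Input uncertainty: latent storm multipliers with a lognormal prior.
mults = rng.lognormal(0.0, 0.2, (n_mc, rain_obs.size))
# Structural uncertainty: lumped additive error on the simulated runoff.
sigma_struct = 0.5

runs = np.array([linear_reservoir(rain_obs * m) for m in mults])
pred = runs + rng.normal(0.0, sigma_struct, runs.shape)

var_total = pred.var(axis=0)             # total predictive variance per step
var_input = runs.var(axis=0)             # part induced by rainfall multipliers
var_struct = var_total - var_input       # remainder ~ sigma_struct**2
```

With the two error sources sampled independently, the remainder recovers the structural variance; the paper's point is that in real inference, where both components must be learned from the same rainfall-runoff data, this clean split is lost unless sufficiently precise priors pin down the input errors.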

    Deep Learning-Based Machinery Fault Diagnostics

    This book offers a compilation for experts, scholars, and researchers presenting the most recent advancements, from theoretical methods to the applications of sophisticated fault diagnosis techniques. Deep learning methods for analyzing and testing complex mechanical systems are of particular interest. Special attention is given to the representation and analysis of system information, operating condition monitoring, the establishment of technical standards, and scientific support of machinery fault diagnosis.

    A fuzzy taxonomy for e-Health projects

    Evaluating the impact of Information Technology (IT) projects is a problematic task for policy and decision makers aiming to define roadmaps based on previous experiences. Especially in the healthcare sector, IT can support a wide range of processes, and it is difficult to analyze in a comparative way the benefits and results of e-Health practices in order to define strategies and assign priorities to potential investments. A first step towards an evaluation framework for comparing e-Health initiatives is the definition of clusters of homogeneous projects that can be further analyzed through multiple case studies. However, imprecision and subjectivity affect the classification of e-Health projects, which touch multiple aspects of the complex healthcare system scenario. In this paper we apply a method, based on advanced clustering techniques and fuzzy theories, for validating a project taxonomy in the e-Health sector. An empirical test of the method has been performed on a set of European good practices in order to define a taxonomy for classifying e-Health projects.
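As a sketch of the kind of fuzzy clustering such a method builds on, the following is a minimal fuzzy c-means implementation run on invented two-dimensional project "feature vectors"; the paper's actual feature coding, clustering method and validation procedure are not reproduced here:

```python
import numpy as np

def fuzzy_cmeans(X, c=2, m=2.0, n_iter=100):
    """Basic fuzzy c-means: returns the membership matrix U (n x c) and the
    cluster centroids. m > 1 is the fuzzifier."""
    # Deterministic init from spread-out samples avoids the degenerate start
    # where all centroids coincide near the data mean.
    centroids = X[np.linspace(0, X.shape[0] - 1, c).astype(int)].astype(float)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centroids[None]) ** 2).sum(axis=2) + 1e-12
        U = d2 ** (-1.0 / (m - 1.0))
        U /= U.sum(axis=1, keepdims=True)      # memberships sum to 1 per project
        Um = U ** m
        centroids = (Um.T @ X) / Um.sum(axis=0)[:, None]
    return U, centroids

# Two hypothetical groups of e-Health projects in a 2-D feature space.
rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(10, 0.5, (20, 2))])
U, centroids = fuzzy_cmeans(X, c=2)
labels = U.argmax(axis=1)
```

Unlike hard clustering, U records partial memberships, which is exactly what lets a taxonomy tolerate the imprecision and subjectivity mentioned above: a project can belong 0.6/0.4 to two classes instead of being forced into one.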

    Chemometric Approaches for Systems Biology

    The present Ph.D. thesis is devoted to studying, developing and applying approaches commonly used in chemometrics in the emerging field of systems biology. Existing procedures and new methods are applied to solve research and industrial questions in different multidisciplinary teams. The methodologies developed in this document will enrich the plethora of procedures employed within the omic sciences to understand biological organisms, and will improve processes in biotechnological industries by integrating biological knowledge at different levels and exploiting the software packages derived from the thesis. This dissertation is structured in four parts. The first block describes the framework on which the contributions presented here are based. The objectives of the two research projects related to this thesis are highlighted, and the specific topics addressed in this document via conference presentations and research articles are introduced. A comprehensive description of the omic sciences and their relationships within the systems biology paradigm is given in this part, jointly with a review of the multivariate methods most applied in chemometrics, on which the novel approaches proposed here are founded. The second part addresses many problems of data understanding within metabolomics, fluxomics, proteomics and genomics. Different alternatives are proposed in this block to understand flux data in steady-state conditions. Some are based on applications of multivariate methods previously applied in other chemometrics areas; others are novel approaches based on a bilinear decomposition using elemental metabolic pathways, from which a GNU-licensed toolbox is made freely available to the scientific community. As well, a framework for metabolic data understanding is proposed for non-steady-state data, using the same bilinear decomposition proposed for steady-state data but modelling the dynamics of the experiments using novel two- and three-way data analysis procedures. Also, the relationships between different omic levels are assessed in this part by integrating different sources of information on plant viruses in data fusion models. Finally, an example of interaction between organisms, oranges and fungi, is studied via multivariate image analysis techniques, with future application in food industries. The third block of this thesis is a thorough study of different missing-data problems related to chemometrics, systems biology and industrial bioprocesses. In the theoretical chapters of this part, new algorithms are proposed to obtain multivariate exploratory and regression models in the presence of missing data; these also serve as preprocessing steps for any other methodology used by practitioners. Regarding applications, this block explores the reconstruction of networks in the omic sciences when missing and faulty measurements appear in databases, and how calibration models can be transferred between near-infrared instruments, avoiding costly and time-consuming full recalibrations in bioindustries and research laboratories. Finally, another software package, including a graphical user interface, is made freely available for missing-data imputation purposes. The last part discusses the relevance of this dissertation for research and biotechnology, including proposals deserving future research.
Folch Fortuny, A. (2016). Chemometric Approaches for Systems Biology [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/77148
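One of the recurring themes above, fitting multivariate models in the presence of missing data, can be illustrated with the classic iterative-PCA imputation scheme. This is a generic textbook sketch on synthetic low-rank data, not the specific algorithms or the toolbox developed in the thesis:

```python
import numpy as np

def pca_impute(X, rank=2, n_iter=50):
    """Iterative PCA imputation: NaN entries are filled with column means and
    then repeatedly re-estimated from a rank-limited SVD reconstruction."""
    X = X.copy()
    miss = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[miss] = col_means[np.where(miss)[1]]
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X_hat = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        X[miss] = X_hat[miss]                  # update only the missing cells
    return X

# Synthetic rank-2 data (e.g., a small spectral matrix) with ~10% missing.
rng = np.random.default_rng(1)
truth = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 8))
X = truth.copy()
miss = rng.random(X.shape) < 0.10
X[miss] = np.nan

imputed = pca_impute(X, rank=2)
rmse_pca = np.sqrt(np.mean((imputed[miss] - truth[miss]) ** 2))

# Baseline: plain column-mean imputation, for comparison.
mean_filled = np.where(np.isnan(X), np.nanmean(X, axis=0), X)
rmse_mean = np.sqrt(np.mean((mean_filled[miss] - truth[miss]) ** 2))
```

Because the low-rank structure links the columns, re-estimating the missing cells from the SVD reconstruction recovers them far better than column means, which is the intuition behind using such schemes as preprocessing for any downstream multivariate model.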

    Testing Bayesian and clustering methods in maintenance fault detection

    Data-driven condition monitoring of cut-to-length forest harvesters has developed to a state where substantial amounts of high-quality data are available from the harvesting process and especially from the harvester head, the main functional part of the harvester. However, the methods capable of extracting the essential information from these data are relatively immature. Methods from the field of industrial process monitoring have been applied to the forest harvesting process, but so far with little success. The problem with these methods is that variation in environmental conditions and the contribution of the human operator greatly influence both process performance and efficiency, and to date the means for measuring these factors have not reached the desired level. This thesis introduces three methods for data-driven condition monitoring, previously unapplied to forest harvester head performance index data but used earlier in the process industry. One of the introduced methods is a density-based clustering method and the other two are probabilistic methods: the Gaussian mixture and the Bayesian network models. The starting point of the analysis involves determining the distribution of the data, finding patterns in the data and identifying dependencies between the index variables. Further, based on these observations, the process in-control and out-of-control states, including the fault states and the related variables, are explored. The theoretical part of this thesis introduces forest harvester operation and the collected data, basic concepts of data-driven condition monitoring, and the data-driven condition monitoring methods with the related multivariate statistics. The experimental part applies the introduced condition monitoring models to the index data, followed by an analysis of the models' suitability.
The final conclusions present the findings, which contain qualitative observations and recommendations about the models and the data. The main result is that the data are not sufficient to be used with the condition monitoring methods examined in this thesis. Finally, the main findings are listed and recommendations for overcoming the shortcomings are proposed. These results can be utilized in future research on maintenance fault detection of forest harvesters.
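Of the three methods named above, the Gaussian mixture model is the easiest to sketch. The toy below fits a two-component one-dimensional mixture to a synthetic "performance index" by EM and flags readings whose likelihood falls below a control limit estimated from the in-control data. All data, dimensions and thresholds are invented; the thesis data were multivariate and, as it concludes, not sufficient for this kind of monitoring:

```python
import numpy as np

def gmm_em_1d(x, n_iter=200):
    """EM for a two-component one-dimensional Gaussian mixture."""
    mu = np.array([x.min(), x.max()], dtype=float)        # spread-out init
    var = np.array([x.var(), x.var()]) + 1e-6
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each point
        dens = pi / np.sqrt(2 * np.pi * var) * np.exp(-(x[:, None] - mu) ** 2 / (2 * var))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        pi = nk / x.size
    return pi, mu, var

def mixture_pdf(t, pi, mu, var):
    """Mixture density at a scalar point t."""
    return float((pi / np.sqrt(2 * np.pi * var) * np.exp(-(t - mu) ** 2 / (2 * var))).sum())

# Two in-control operating modes of a hypothetical performance index.
rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0, 0.5, 200), rng.normal(5, 0.5, 200)])
pi, mu, var = gmm_em_1d(x)

# Control limit: 1st percentile of the in-control likelihoods.
dens_train = np.array([mixture_pdf(t, pi, mu, var) for t in x])
limit = np.quantile(dens_train, 0.01)
is_fault = mixture_pdf(20.0, pi, mu, var) < limit   # an out-of-control reading
```

The mixture captures multi-modal "normal" operation (here, two operating modes), so a fault is declared not by distance from a single mean but by low likelihood under the whole fitted model.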

    Exploring variability in medical imaging

    Although recent successes of deep learning and novel machine learning techniques improved the performance of classification and (anomaly) detection in computer vision problems, the application of these methods in medical imaging pipelines remains a very challenging task. One of the main reasons for this is the amount of variability that is encountered and encapsulated in human anatomy and subsequently reflected in medical images. This fundamental factor impacts most stages in modern medical imaging processing pipelines. The variability of human anatomy makes it virtually impossible to build large datasets for each disease with labels and annotations for fully supervised machine learning. An efficient way to cope with this is to learn only from normal samples, since such data are much easier to collect. A case study of such an automatic anomaly detection system based on normative learning is presented in this work. We present a framework for detecting fetal cardiac anomalies during ultrasound screening using generative models trained only on normal/healthy subjects. However, despite significant improvements in automatic abnormality detection systems, clinical routine continues to rely exclusively on overburdened medical experts to diagnose and localise abnormalities. Integrating human expert knowledge into the medical imaging processing pipeline entails uncertainty, which is mainly correlated with inter-observer variability. From the perspective of building an automated medical imaging system, it is still an open issue to what extent this kind of variability and the resulting uncertainty are introduced during the training of a model and how they affect the final performance of the task. Consequently, it is very important to explore the effect of inter-observer variability both on the reliable estimation of a model's uncertainty and on the model's performance in a specific machine learning task.
A thorough investigation of this issue is presented in this work by leveraging automated estimates of machine learning model uncertainty, inter-observer variability and segmentation task performance in lung CT scan images. Finally, an overview of existing anomaly detection methods in medical imaging is presented. This state-of-the-art survey includes both conventional pattern recognition methods and deep learning based methods, and is one of the first literature surveys attempted in this specific research area.
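Inter-observer variability of the kind discussed above is commonly summarized by pairwise overlap between annotators' segmentations. The following is a minimal sketch using mean pairwise Dice on invented toy masks; the uncertainty-estimation machinery actually used in this work is not reproduced:

```python
import numpy as np

def dice(a, b):
    """Dice overlap between two boolean masks."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

def interobserver_agreement(masks):
    """Mean pairwise Dice across annotators; lower values indicate higher
    inter-observer variability."""
    n = len(masks)
    scores = [dice(masks[i], masks[j]) for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(scores))

# Three hypothetical annotators outlining the same 20x20 structure on a
# 64x64 slice, each with a slightly shifted border.
masks = []
for shift in (0, 1, 2):
    m = np.zeros((64, 64), dtype=bool)
    m[20 + shift:40 + shift, 20:40] = True
    masks.append(m)

agreement = interobserver_agreement(masks)
```

Scores like this one can then be compared against a model's per-case uncertainty estimates to test whether the model is more uncertain exactly where the annotators disagree.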

    Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain

    The present paper explores the technical efficiency of four hotels from the Teixeira Duarte Group, a renowned Portuguese hotel chain. An efficiency ranking of these four hotel units located in Portugal is established using Stochastic Frontier Analysis. This methodology makes it possible to discriminate between measurement error and systematic inefficiencies in the estimation process, enabling investigation of the main causes of inefficiency. Several suggestions concerning efficiency improvement are made for each hotel studied.
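For readers unfamiliar with how SFA separates noise from inefficiency: after estimating the frontier, the composed residual eps = v - u (symmetric noise minus one-sided inefficiency) is decomposed with the Jondrow et al. (1982) conditional estimator, and technical efficiency is exp(-E[u | eps]). The sketch below uses invented residuals and error scales, not the paper's estimates:

```python
import math

def npdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def ncdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def technical_efficiency(eps, sigma_u, sigma_v):
    """Jondrow et al. point estimate of inefficiency u from a composed
    production-frontier residual eps = v - u, with v ~ N(0, sigma_v^2) and
    u half-normal(sigma_u^2), returned as technical efficiency exp(-E[u|eps])."""
    sigma = math.sqrt(sigma_u ** 2 + sigma_v ** 2)
    lam = sigma_u / sigma_v
    sstar = sigma_u * sigma_v / sigma
    z = eps * lam / sigma
    u_hat = sstar * (npdf(z) / (1.0 - ncdf(z)) - z)
    return math.exp(-u_hat)

# Hypothetical residuals for two hotels under invented error scales.
te_low = technical_efficiency(-1.0, sigma_u=0.3, sigma_v=0.2)   # far below frontier
te_high = technical_efficiency(0.5, sigma_u=0.3, sigma_v=0.2)   # near the frontier
```

A strongly negative residual is attributed partly to inefficiency and partly to noise; this split is what lets SFA discriminate between measurement error and systematic inefficiency rather than labelling every shortfall as inefficiency.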

    Dynamic segmentation techniques applied to load profiles of electric energy consumption from domestic users

    The electricity sector is currently undergoing a process of liberalization and separation of roles, which is being implemented under the regulatory auspices of each Member State of the European Union and, therefore, with different speeds, perspectives and objectives that must converge on a common horizon, where Europe will benefit from an interconnected energy market in which producers and consumers can participate in free competition. This process entails one major consequence, from which a second follows as a necessity. The main consequence is the increased complexity in the management and supervision of the electrical system, which is increasingly interconnected and participatory, with distributed energy sources, many of them renewable, connected at different voltage levels and with different generation capacities at any point in the network. From this follows the second consequence: the need to communicate information between agents reliably, safely and quickly, and to analyse this information as effectively as possible, so that it feeds the decision-making processes that improve the observability and controllability of a system that keeps growing in complexity and in the number of agents involved. With the evolution of Information and Communication Technologies (ICT), and with investments both in improving the existing measurement and communications infrastructure and in extending measurement and actuation capabilities to a greater number of points in medium- and low-voltage networks, the data available on the state of the network are increasingly abundant and complete. All these systems are part of the so-called Smart Grids, the intelligent networks of a future that is not so far away.
One such source of information is the energy consumption of customers, measured on a regular basis (every hour, half hour or quarter-hour) and sent to the Distribution System Operators from Smart Meters via the Advanced Metering Infrastructure (AMI). As a result, an increasing amount of information on the energy consumption of customers is stored in Big Data systems. This growing source of information demands specialized techniques that can exploit it by extracting useful, summarized knowledge. This thesis deals with the use of this Smart Meter energy consumption information, in particular the application of data mining techniques to obtain temporal patterns that characterize the users of electrical energy, grouping them according to these patterns into a small number of clusters. This grouping makes it possible to evaluate how users consume energy, both during the day and over a sequence of days, to assess trends and to predict future scenarios. To this end, the current techniques are studied and, after establishing that existing works do not cover this objective, dynamic clustering or segmentation techniques applied to load profiles of electric energy consumption from domestic users are developed. These techniques are tested and validated on a database of hourly energy consumption values for a sample of residential customers in Spain during the years 2008 and 2009.
The results make it possible to observe both the characteristic consumption patterns of the different types of residential energy consumers and their evolution over time, and to assess, for example, how the regulatory changes that occurred in the Spanish electricity sector during those years influenced the temporal patterns of energy consumption.

Benítez Sánchez, IJ. (2015). Dynamic segmentation techniques applied to load profiles of electric energy consumption from domestic users [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/59236