204 research outputs found

    Dynamic segmentation techniques applied to load profiles of electric energy consumption from domestic users

    [EN] The electricity sector is undergoing a process of liberalization and separation of roles, implemented under the regulatory auspices of each Member State of the European Union and therefore at different speeds and with different perspectives and objectives. These must converge on a common horizon in which Europe benefits from an interconnected energy market where producers and consumers can participate in free competition. This process has one main consequence, from which a second follows by necessity. The main consequence is the increased complexity of managing and supervising an electrical system that is increasingly interconnected and participatory, with distributed energy sources, many of them renewable, connected at different voltage levels and with different generation capacities at any point in the network. The second consequence is the need to communicate information between agents reliably, safely and quickly, and to analyze that information as effectively as possible, so that it feeds the decision-making processes that improve the observability and controllability of a system growing in complexity and in number of agents. With the evolution of Information and Communication Technologies (ICT), and with investment both in improving existing measurement and communications infrastructure and in extending measurement and actuation capacity to more points in medium- and low-voltage networks, the data available on the state of the network is increasingly abundant and complete. All these systems are part of the so-called Smart Grids, the intelligent networks of a future that is not so far away. 
One such source of information is the energy consumption of customers, measured at regular intervals (every hour, half hour or quarter hour) and sent from Smart Meters to the Distribution System Operators through Advanced Metering Infrastructure (AMI). As a result, an increasing amount of information on customers' energy consumption is being stored in Big Data systems. This growing source of information demands specialized techniques that can exploit it and extract useful, summarized knowledge. This thesis deals with the use of energy consumption data from Smart Meters, and in particular with the application of data mining techniques to obtain temporal patterns that characterize electricity users. Users are grouped according to these patterns into a small number of clusters, which makes it possible to evaluate how they consume energy, both within a day and over a sequence of days, and so to assess trends and predict future scenarios. To this end, current techniques are studied and, after showing that existing work does not cover this objective, clustering or dynamic segmentation techniques applied to load profiles of electric energy consumption from domestic users are developed. These techniques are tested and validated on a database of hourly energy consumption values for a sample of residential customers in Spain during 2008 and 2009. 
The results show both the characteristic consumption patterns of the different types of residential energy consumers and their evolution over time, and make it possible to assess, for example, how the regulatory changes in the Spanish electricity sector during those years influenced the temporal patterns of energy consumption.
Benítez Sánchez, IJ. (2015). Dynamic segmentation techniques applied to load profiles of electric energy consumption from domestic users [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/59236

    Cooperative Clustering Model and Its Applications

    Data clustering plays an important role in many disciplines, including data mining, machine learning, bioinformatics, pattern recognition, and other fields where there is a need to learn the inherent grouping structure of data in an unsupervised manner. Many clustering approaches have been proposed in the literature, with different quality/complexity tradeoffs. Each clustering algorithm works on its own domain space, and none is optimal for all datasets of different properties, sizes, structures, and distributions. Challenges in data clustering include identifying the proper number of clusters, scalability of the clustering approach, robustness to noise, tackling distributed datasets, and handling clusters of different configurations. This thesis addresses some of these challenges through cooperation between multiple clustering approaches. We introduce a Cooperative Clustering (CC) model that involves multiple clustering techniques; the goal of the cooperative model is to increase the homogeneity of objects within clusters through cooperation, by developing two data structures: a cooperative contingency graph and a histogram representation of pair-wise similarities. The two data structures are designed to find the matching sub-clusters between different clusterings and to obtain the final set of cooperative clusters through a merging process. Obtaining the co-occurring objects from the different clusterings enables the cooperative model to group objects based on agreement among the invoked clustering techniques. In addition, merging this set of sub-clusters using histograms offers a new way of grouping objects into more homogeneous clusters. The cooperative model is consistent, reusable, and scalable in terms of the number of adopted clustering approaches. 
In order to deal with noisy data, a novel Cooperative Clustering Outliers Detection (CCOD) algorithm is implemented by applying the cooperation methodology to improve the detection of outliers in data. The new detection approach is designed in four phases: (1) Global Non-cooperative Clustering, (2) Cooperative Clustering, (3) Possible Outliers Detection, and finally (4) Candidate Outliers Detection. The detection of outliers proceeds bottom-up. The thesis also addresses cooperative clustering in distributed Peer-to-Peer (P2P) networks. Mining large and inherently distributed datasets poses many challenges, one of which is the extraction of a global model that summarizes the clustering solutions generated at all nodes, so that the clustering quality of the distributed dataset can be interpreted as if it were located at one node. We developed a distributed cooperative model and architecture that work on a two-tier super-peer P2P network. The model is called Distributed Cooperative Clustering in Super-peer P2P Networks (DCCP2P). This model aims at producing one clustering solution across the whole network. It specifically addresses scalability with network size, and consequently the complexity of distributed clustering, by modeling the distributed clustering problem as two layers of peer neighborhoods and super-peers. Summarization of the global distributed clusters is achieved through a distributed version of the cooperative clustering model. Three clustering algorithms, k-means (KM), Bisecting k-means (BKM) and Partitioning Around Medoids (PAM), are invoked in the cooperative model. 
Results on various gene expression and text document datasets with different properties, configurations and degrees of outliers reveal that: (i) the cooperative clustering model achieves significant improvement in the quality of the clustering solutions compared to that of the non-cooperative individual approaches; (ii) the cooperative detection algorithm discovers the nonconforming objects in data with better accuracy than contemporary approaches; and (iii) the distributed cooperative model attains the same quality as the centralized approach, or even better, and achieves decent speedup as the number of nodes increases. The distributed model offers a high degree of flexibility, scalability, and interpretability for large distributed repositories. Achieving the same results using current methodologies requires pooling the data first at one central location, which is sometimes not feasible.
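
The idea of grouping objects by agreement between clusterings can be illustrated with a minimal sketch. It is not the thesis's cooperative contingency graph or its histogram-based merging; it only assumes that each clustering is given as a list of labels, and it collects the objects that share the same label in every clustering into one sub-cluster. The function name `agreement_subclusters` and the sample labelings are hypothetical.

```python
from collections import defaultdict


def agreement_subclusters(*labelings):
    """Group object indices that receive the same label in every clustering.

    Each labeling is a sequence where position i holds the cluster label
    assigned to object i by one clustering algorithm (hypothetical input).
    """
    n = len(labelings[0])
    if any(len(labels) != n for labels in labelings):
        raise ValueError("all labelings must cover the same objects")

    groups = defaultdict(list)
    # The tuple of labels an object gets across all clusterings is its key:
    # objects with identical keys co-occur in every clustering.
    for index, key in enumerate(zip(*labelings)):
        groups[key].append(index)
    return sorted(groups.values())


if __name__ == "__main__":
    km_labels = [0, 0, 0, 1, 1, 1]   # e.g. labels from k-means
    pam_labels = [0, 0, 1, 1, 1, 1]  # e.g. labels from PAM
    print(agreement_subclusters(km_labels, pam_labels))
    # [[0, 1], [2], [3, 4, 5]]
```

Object 2 is the only one on which the two sample clusterings disagree, so it ends up in a sub-cluster of its own; a merging step such as the one the abstract describes would then decide which larger cluster it joins.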

    Recent Developments in Document Clustering

    This report gives a brief overview of the current state of document clustering research and presents recent developments in an organized manner. Clustering algorithms are considered with two hypothetical scenarios in mind: online query clustering with tight efficiency constraints, and offline clustering with an emphasis on accuracy. A comparative analysis of the algorithms is performed, along with a table summarizing their important properties, and open problems and directions for future research are discussed.

    Flow time series clustering for demand pattern recognition in drinking water distribution systems: New insights about the most adequate methods

    This study proposes clustering methodologies for demand pattern recognition using network flow data collected from a large set of drinking water distribution networks in Portugal. Most existing studies on clustering of flow time series rely on hierarchical or k-Means clustering algorithms with inelastic distance measures. This study explores alternative clustering algorithms, distance measures, comparison time windows, internal index metrics and clustering prototypes. The performance of the alternative clustering methodologies was assessed in terms of multiple internal index metrics and the characterization of the cluster centroids. The best-performing methods were the Partition Algorithm with DTW distance, PAM prototype and a 15-minute time window, and the Partition Algorithm with GAK distance, PAM prototype and a 15-minute time window, because they allow a clear partition of the flow time series into three clusters. The first method identifies a night consumption pattern, a typical weekend pattern and a typical working day pattern, whereas the second identifies a pattern with small variability between night and daytime consumption. To improve knowledge extraction about typical and anomalous patterns, additional clustering operations were performed on the flow data belonging to the cluster with small variability between night and daytime consumption. New clusters were identified and characterized by weekday, geographical location, and dry and wet months, showing that patterns associated with garden irrigation are independent of the period of the day and the season of the year, which indicates inefficient water use.
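
To show why an elastic measure such as DTW can group flow profiles that an inelastic, point-by-point distance would separate, here is a minimal sketch of the classic dynamic-programming DTW distance. It is a textbook formulation with an absolute-difference local cost and no warping-window constraint, not the configuration used in the study; the function name `dtw_distance` and the two sample profiles are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Uses absolute difference as the local cost and allows the three
    standard steps (match, insertion, deletion) without a window limit.
    """
    n, m = len(a), len(b)
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] aligned with an already used b[j-1]
                cost[i][j - 1],      # b[j-1] aligned with an already used a[i-1]
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


if __name__ == "__main__":
    # Two made-up flow profiles with the same peak, shifted by one step.
    early_peak = [0, 1, 2, 1, 0, 0]
    late_peak = [0, 0, 1, 2, 1, 0]
    pointwise = sum(abs(x - y) for x, y in zip(early_peak, late_peak))
    print(pointwise)                            # 4
    print(dtw_distance(early_peak, late_peak))  # 0.0
```

The point-by-point distance penalizes the time shift, while DTW aligns the two peaks and reports no difference, which is the behaviour that makes elastic measures attractive for demand patterns whose timing varies slightly from day to day.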

    Learning in Dynamic Data-Streams with a Scarcity of Labels

    Analysing data in real time is a natural and necessary progression from traditional data mining. However, real-time analysis presents challenges beyond those of batch analysis: along with strict time and memory constraints, change is a major consideration. In a dynamic stream it is assumed that the underlying process generating the stream is non-stationary and that concepts within the stream will drift and change over time. Wrongly assuming that a stream is stationary results in non-adaptive models that degrade and eventually become obsolete. The challenge of recognising and reacting to change in a stream is compounded by the scarcity-of-labels problem. This refers to the very realistic situation in which the true class label of an incoming point is not immediately available (or will never be available), or in which manually labelling incoming points is prohibitively expensive. The goal of this thesis is to evaluate unsupervised learning as the basis for online classification in dynamic data streams with a scarcity of labels. To realise this goal, a novel stream-clustering algorithm based on the collective behaviour of ants, Ant Colony Stream Clustering (ACSC), is proposed. This algorithm is shown to be faster and more accurate than comparable peer stream-clustering algorithms while requiring fewer sensitive parameters. The principles of ACSC are extended in a second stream-clustering algorithm named Multi-Density Stream Clustering (MDSC). This algorithm has adaptive parameters and, crucially, can track clusters and monitor their dynamic behaviour over time. A novel technique called a Dynamic Feature Mask (DFM) is proposed to "sit on top" of these stream-clustering algorithms; it can be used to observe and track change at the feature level in a data stream. This Feature Mask acts as an unsupervised feature selection method, allowing high-dimensional streams to be clustered. 
Finally, data-stream clustering is evaluated as an approach to one-class classification, and a novel framework (named COCEL: Clustering and One class Classification Ensemble Learning) for classification in dynamic streams with a scarcity of labels is described. The proposed framework can identify and react to change in a stream and greatly reduces the number of required labels (typically less than 0.05% of the entire stream).

    Smart meter based profiling for load forecasting and demand side management in smart grids

    The smart grid incorporates an integrated system of smart meters and communication networks that enable two-way communication between utilities and consumers. The granular information from smart meters can be used to improve load forecasts and to influence consumers' energy consumption patterns through demand side management (DSM). However, for localized studies of the power system, the large quantity of smart meter data and its high level of noise preclude the use of computationally intensive techniques. Reducing smart meter data to extract load profiles, and obtaining smoother load profiles at lower aggregation levels (individual consumers or small groups of consumers), are highly desirable for use in linear techniques for power system studies. Therefore, this thesis addresses the challenges of smart meter data size, complexity, variability and volatility for efficient use in load forecasting and DSM. This thesis presents a novel clustering-based approach for the analysis of smart meter data, aimed at more accurate and detailed load profiling, reduced profile complexity, and improved load forecast accuracy and DSM solutions. The approach uses an innovative clustering algorithm, together with newly proposed cluster validity indices, to reduce the data size. The extremely volatile profiles, with high levels of noise and complexity, are linearized using a Taylor series linearization process to alleviate their non-linearity and complexity. Finally, particle swarm optimization is applied for energy optimization on the linearized profiles. The approach is demonstrated on the Irish smart meter dataset and simulated PV data, achieving improved load forecast accuracy using an artificial neural network and improved DSM solutions using linear optimization with reduced computational burden. 
Investigations suggest that the proposed clustering algorithm can produce clusters with high intra-cluster pattern similarity as a result of new stopping criteria specifically tailored for load forecasting applications. A comparison between the proposed alternative profiles and raw profiles further suggests that the alternative profiles follow the underlying energy consumption with reduced complexity, making them computationally efficient. Use of the alternative profiles suggests that load forecasting accuracy can potentially be higher than with raw profiles. The alternative profiles, in combination with the novel cluster selection approach, provide higher peak reduction by shifting load from peak hours to off-peak hours, and higher monetary benefits for the participating consumers. The proposed clustering algorithm and the alternative profiles represent an advancement of the conventional load profiling approach, benefiting system operators through more accurate forecasting and efficient DSM. The novel mathematical framework suggested in this thesis advances knowledge in the area of smart metering and smart power grids.

    Mining Extremes through Fuzzy Clustering

    Archetypes are extreme points that synthesize data representing "pure" individual types. Archetypes are assigned by the most discriminating features of data points, and are almost always useful in applications where one is interested in extremes rather than commonalities. Recent applications include talent analysis in sports and science, fraud detection, profiling of users and products in recommendation systems, climate extremes, and other machine learning applications. The furthest-sum Archetypal Analysis (FS-AA) (Mørup and Hansen, 2012) and the Fuzzy Clustering with Proportional Membership (FCPM) (Nascimento, 2005) propose distinct models to find clusters with extreme prototypes. Even though the FCPM model does not constrain its prototypes to lie in the convex hull of the data, it belongs to the framework of data recovery from clustering (Mirkin, 2005), a powerful property for unsupervised cluster analysis. The baseline version of FCPM, FCPM-0, provides central prototypes, whereas its smooth version, FCPM-2, provides extreme prototypes like AA archetypes. The comparative study between the FS-AA and FCPM algorithms conducted in this dissertation covers the following aspects. First, the analysis of FS-AA on data recovery from clustering, using a collection of 100 data sets of diverse dimensionalities generated with a proper data generator (FCPM-DG), as well as 14 real-world data sets. Second, testing the robustness of the clustering algorithms in the presence of outliers, noting the peculiar behaviour of FCPM-0 in removing the proper number of prototypes from data. Third, a collection of five popular fuzzy validation indices is explored for assessing the quality of clustering results. Fourth, the algorithms undergo a study to evaluate how different initializations affect their convergence as well as the quality of the clustering partitions. 
The Iterative Anomalous Pattern (IAP) algorithm improves the convergence of the FCPM algorithm and makes it possible to fine-tune the level of resolution at which clustering results are examined, which is an advantage over FS-AA. Proper visualization functionalities for FS-AA and FCPM support easy interpretation of the clustering results.

    Relational clustering models for knowledge discovery and recommender systems

    Cluster analysis is a fundamental research field in Knowledge Discovery and Data Mining (KDD). It aims at partitioning a given dataset into homogeneous clusters so as to reflect the natural hidden structure of the data. Various heuristic and statistical approaches have been developed for analyzing propositional datasets. In relational clustering, however, the existence of multi-type relationships greatly degrades the performance of traditional clustering algorithms. This issue motivates us to find more effective algorithms for cluster analysis on relational datasets. In this thesis we comprehensively study the idea of Representative Objects for approximating the data distribution and then design a multi-phase clustering framework for analyzing relational datasets with high effectiveness and efficiency. The second task considered in this thesis is to provide better data models for people as well as machines to browse and navigate a dataset. The hierarchical taxonomy is widely used for this purpose. Compared with manually created taxonomies, automatically derived ones are more appealing because of their low creation and maintenance cost and high scalability. Up to now, taxonomy generation techniques have mainly been used to organize document corpora. We investigate the possibility of applying them to relational datasets and then propose some algorithmic improvements. Another non-trivial problem is how to assign suitable labels to the taxonomic nodes so as to credibly summarize the content of each node. To the best of our knowledge, this field has not been investigated sufficiently, so we attempt to fill the gap by proposing some novel approaches. The final goal of our cluster analysis and taxonomy generation techniques is to improve the scalability of recommender systems that are developed to tackle the problem of information overload. 
Recent research in recommender systems exploits domain knowledge to improve recommendation quality, which however reduces the scalability of the whole system at the same time. We address this issue by applying the automatically derived taxonomy to preserve the pair-wise similarities between items, and then modeling user visits with another hierarchical structure. Experimental results show that the computational complexity of the recommendation procedure can be greatly reduced and the system scalability thereby improved.