
    A survey of kernel and spectral methods for clustering

    Clustering algorithms are a useful tool for exploring the structure of data and have been employed in many disciplines. The focus of this paper is the partitioning clustering problem, with special interest in two recent approaches: kernel and spectral methods. The aim of this paper is to present a survey of kernel and spectral clustering methods, two approaches able to produce nonlinear separating hypersurfaces between clusters. The kernel clustering methods presented are kernel versions of many classical clustering algorithms, e.g., K-means, SOM and neural gas. Spectral clustering arises from concepts in spectral graph theory: the clustering problem is formulated as a graph cut problem in which an appropriate objective function has to be optimized. Since these two seemingly different approaches have been shown to share the same mathematical foundation, an explicit proof that they have the same objective is reported. In addition, fuzzy kernel clustering methods are presented as extensions of the kernel K-means clustering algorithm. (C) 2007 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
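    The kernel trick behind kernel K-means can be sketched as follows. This is a minimal illustration, not the survey's code: the RBF kernel, its `gamma` value, and the round-robin initialization are illustrative assumptions. Each point is assigned to the cluster whose feature-space mean is nearest, with that distance expanded entirely in terms of kernel evaluations, so the nonlinear feature map is never computed explicitly.

```python
import math

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel; gamma=1.0 is an illustrative default."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def kernel_kmeans(points, k, kernel=rbf_kernel, max_iter=100):
    """Minimal kernel K-means returning one cluster label per point."""
    n = len(points)
    # Gram matrix: K[i][j] = <phi(x_i), phi(x_j)>, without ever computing phi
    K = [[kernel(points[i], points[j]) for j in range(n)] for i in range(n)]
    labels = [i % k for i in range(n)]  # naive round-robin initialization (assumption)
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Third distance term per cluster: (1/|C|^2) * sum_{p,q in C} K[p][q]
        within = [
            sum(K[p][q] for p in m for q in m) / len(m) ** 2 if m else 0.0
            for m in members
        ]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], math.inf
            for c, m in enumerate(members):
                if not m:
                    continue  # an empty cluster has no feature-space mean
                # ||phi(x_i) - mean_C||^2 = K_ii - (2/|C|) sum_j K_ij + within_C
                d = K[i][i] - 2.0 * sum(K[i][j] for j in m) / len(m) + within[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels
```

    Because cluster means exist only implicitly in feature space, the loop tracks memberships rather than centroids; this is what allows kernel K-means to separate clusters with nonlinear boundaries in the input space.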

    Clustering of multiple instance data.

    Multiple Instance Learning (MIL) is an emerging area of machine learning research that aims to develop tools for analyzing data in which objects have multiple representations. In MIL, each object is represented by a bag that contains a collection of feature vectors called instances. A bag is positive if it contains at least one positive instance, and negative if none of its instances are positive. One of the main objectives in MIL is to identify a region in the instance feature space with high correlation to instances from positive bags and low correlation to instances from negative bags; this region is referred to as a target concept (TC). Existing methods either identify only a single target concept, provide no mechanism for selecting the appropriate number of target concepts, or lack a flexible representation for target concept memberships. Thus, they are not suitable for data with large intra-class variation. In this dissertation we propose new algorithms that learn multiple target concepts simultaneously by combining concepts from data clustering and multiple instance learning. In particular, we propose crisp, fuzzy, and possibilistic variations of the Multi-target concept Diverse Density (MDD) metric, along with three algorithms to optimize them. Each algorithm relies on an alternating optimization strategy that iteratively refines concept assignments, locations, and scales until it converges to an optimal set of target concepts. We also demonstrate how the possibilistic MDD metric can be used to select the appropriate number of target concepts for a dataset. Lastly, we propose constructing classifiers based on embedded feature space theory that use our target concepts to predict the labels of new MIL data. The proposed algorithms are implemented, tested, and validated through the analysis of multiple synthetic and real-world datasets.
We first demonstrate that our algorithms detect multiple target concepts reliably and are robust to many parameters of the data-generating process. We then show how our approach can be applied to Buried Explosive Object (BEO) detection to locate distinct target concepts corresponding to the signatures of different BEO types. We also demonstrate that our classifier strategies perform competitively with other well-established embedded space approaches in the classification of benchmark MIL data.
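    The standard MIL bag-labeling rule, together with the classic single-concept Diverse Density score whose name the MDD metric builds on, can be sketched as follows. This is an illustrative assumption, not the dissertation's crisp, fuzzy, or possibilistic MDD: it uses a noisy-OR model with a Gaussian-like instance similarity and per-feature scales, and all function names are hypothetical. A good target concept scores high when every positive bag has at least one nearby instance and every negative bag has none.

```python
import math

def bag_label(instance_labels):
    """Standard MIL assumption: a bag is positive iff at least one instance is positive."""
    return any(instance_labels)

def instance_prob(instance, concept, scale):
    """Similarity of an instance to concept t, decaying with scaled squared distance."""
    d2 = sum(s * (x - t) ** 2 for x, t, s in zip(instance, concept, scale))
    return math.exp(-d2)

def diverse_density(concept, scale, pos_bags, neg_bags):
    """Single-concept noisy-OR Diverse Density in log form; higher is better."""
    log_dd = 0.0
    for bag in pos_bags:
        # Pr(t | positive bag) = 1 - prod_j (1 - Pr(instance j is in t))
        p = 1.0 - math.prod(1.0 - instance_prob(x, concept, scale) for x in bag)
        log_dd += math.log(max(p, 1e-12))  # clamp to avoid log(0)
    for bag in neg_bags:
        # Pr(t | negative bag) = prod_j (1 - Pr(instance j is in t))
        p = math.prod(1.0 - instance_prob(x, concept, scale) for x in bag)
        log_dd += math.log(max(p, 1e-12))
    return log_dd
```

    Learning multiple target concepts, as the abstract describes, would replace the single `concept` with several concepts plus assignments, refined by alternating optimization; that extension is not shown here.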

    Explainable parts-based concept modeling and reasoning

    State-of-the-art artificial intelligence (AI) learning algorithms rely heavily on deep learning methods that exploit correlations between inputs and outputs. While effective, these methods typically provide little insight into the machine's reasoning process, which makes it difficult for human users to understand that process, trust the decisions made by the system, and control its emergent behaviors. One approach to this problem is eXplainable AI (XAI), which aims to create algorithms that perform well while also explaining their reasoning to users, mitigating the problems outlined above. In this thesis, I advance research on XAI techniques by introducing systems that provide explanations through parts-based concept modeling and reasoning. Instead of correlating input to output, I correlate input to sub-parts or features of the overall concept being learned. These features are used to model and reason about a concept using an explicitly defined structure, and that structure provides explanations to the user by virtue of how it is defined. Specifically, I introduce shallow and deep Adaptive Neuro-Fuzzy Inference Systems (ANFIS) that can reason in noisy and uncertain contexts. ANFIS provides explanations in the form of learned rules that combine features to determine the overall output concept. I apply this system to real geospatial parts-based reasoning problems and evaluate the performance and explainability of the algorithm. I identify drawbacks of ANFIS as traditionally defined, caused by dead and diminishing gradients. This leads me to model parts-based concepts and their inherent uncertainty in other ways, namely through Spatially Attributed Relation Graphs (SARGs), and I incorporate human feedback to refine the concepts learned with SARGs.
Finally, I present future directions for research that build on the progress presented in this thesis. Includes bibliographical references.
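    The rule-based explanations that ANFIS provides come from its forward pass, which computes Takagi-Sugeno fuzzy inference. The sketch below assumes a zeroth-order Sugeno system with Gaussian membership functions and a product t-norm; it is not the thesis's shallow or deep ANFIS, and training of the membership parameters is omitted. Each rule reads as "IF x1 is near c1 AND x2 is near c2 THEN output = y", which is what makes the model inspectable.

```python
import math

def gaussian_mf(x, center, sigma):
    """Gaussian membership degree of x in a fuzzy set."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)

def sugeno_infer(inputs, rules):
    """Zeroth-order Takagi-Sugeno inference (the computation ANFIS layers perform).

    Each rule is (antecedents, consequent): antecedents is a list of
    (center, sigma) pairs, one per input; consequent is a constant output.
    """
    strengths = []
    for antecedents, _ in rules:
        w = 1.0
        for x, (c, s) in zip(inputs, antecedents):
            w *= gaussian_mf(x, c, s)  # product t-norm acts as fuzzy AND
        strengths.append(w)
    total = sum(strengths)
    if total == 0.0:
        raise ValueError("no rule fires for these inputs")
    # Output is the firing-strength-weighted average of rule consequents
    return sum(w * out for w, (_, out) in zip(strengths, rules)) / total
```

    Because firing strengths are products of memberships, they shrink rapidly as inputs move away from rule centers; this offers one intuition for the dead and diminishing gradients the abstract mentions, though the thesis's own analysis may differ.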

    A generic framework for context-dependent fusion with application to landmine detection.

    For complex detection and classification problems involving data with large intra-class variations and noisy inputs, no single source of information can provide a satisfactory solution. As a result, combining multiple classifiers plays an increasing role in solving these complex pattern recognition problems and has proven to be a viable alternative to using a single classifier. Over the past few years, a variety of schemes have been proposed for combining multiple classifiers. Most of these are global: they assign each classifier a degree of worthiness averaged over the entire training data. This may not be the optimal way to combine the different experts, since the behavior of each one may not be uniform across the different regions of the feature space. To overcome this issue, a few local methods have been proposed in recent years. Local fusion methods aim to adapt the classifiers' worthiness to different regions of the feature space. First, they partition the input samples. Then, they identify the best classifier for each partition and designate it as the expert for that partition. Unfortunately, current local methods are computationally expensive, perform these two tasks independently of each other, or both. However, feature space partitioning and algorithm selection are not independent, and they should be optimized simultaneously. In this dissertation, we introduce a new local fusion approach called Context Extraction for Local Fusion (CELF). CELF was designed to adapt the fusion to different regions of the feature space, taking advantage of the strengths of the different experts and overcoming their limitations. First, we describe the baseline CELF algorithm. We formulate a novel objective function that combines context identification and multi-algorithm fusion criteria into a joint objective.
The context identification component strives to partition the input feature space into different clusters (called contexts), while the fusion component strives to learn the optimal fusion parameters within each cluster. Second, we propose several variations of CELF to deal with different application scenarios. In particular, we propose an extension that includes a feature discrimination component (CELF-FD). This version is advantageous when dealing with high-dimensional feature spaces and/or when the number of features extracted by the individual algorithms varies significantly. CELF-CA is another extension of CELF that adds a regularization term to the objective function to introduce competition among the clusters and to find the optimal number of clusters in an unsupervised way. CELF-CA starts by partitioning the data into a large number of small clusters. As the algorithm progresses, adjacent clusters compete for data points, and clusters that lose the competition gradually become depleted and vanish. Third, we propose CELF-M, which generalizes CELF to multi-class data sets. The baseline CELF and its extensions were formulated to use linear aggregation to combine the outputs of the different algorithms within each context. For some applications, this can be too restrictive, and non-linear fusion may be needed. To address this potential drawback, we propose two other variations of CELF that use non-linear aggregation. The first is based on Neural Networks (CELF-NN) and the second on Fuzzy Integrals (CELF-FI). The latter has the desirable property of assigning weights to subsets of classifiers to take into account the interactions between them. To test a new signature using CELF (or its variants), each algorithm extracts its set of features and assigns a confidence value. Then, the features are used to identify the best context, and the fusion parameters of this context are used to fuse the individual confidence values.
For each variation of CELF, we formulate an objective function, derive the necessary conditions to optimize it, and construct an iterative algorithm. We then use examples to illustrate the behavior of the algorithm, compare it to global fusion, and highlight its advantages. We apply our proposed fusion methods to the problem of landmine detection, using data collected with Ground Penetrating Radar (GPR) and Wideband Electromagnetic Induction (WEMI) sensors. We show that CELF (and its variants) can identify meaningful and coherent contexts (e.g., mines of the same type, mines buried at the same site) and that different expert algorithms can be identified for the different contexts. In addition to landmine detection, we apply our approaches to semantic video indexing, image database categorization, and phoneme recognition. In all applications, we compare the performance of CELF with standard fusion methods and show that our approach outperforms all of them.
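    The test-time step described above (identify the best context from the features, then fuse the confidence values with that context's parameters) can be sketched under simplifying assumptions. Here each context is represented by a single prototype vector, the "best" context is simply the nearest prototype by Euclidean distance, and fusion is a plain weighted sum; CELF itself learns fuzzy context memberships and its fusion parameters jointly, so the prototypes, weights, and function names below are hypothetical.

```python
import math

def nearest_context(features, prototypes):
    """Index of the context prototype closest to the feature vector (assumed rule)."""
    return min(range(len(prototypes)), key=lambda c: math.dist(features, prototypes[c]))

def local_linear_fusion(features, confidences, prototypes, weights):
    """Fuse per-algorithm confidences with the linear weights of the selected context.

    features:    feature vector of the test signature
    confidences: one confidence value per expert algorithm
    prototypes:  one prototype vector per context (hypothetical representation)
    weights:     weights[c][a] is the fusion weight of algorithm a in context c
    """
    c = nearest_context(features, prototypes)
    fused = sum(w * y for w, y in zip(weights[c], confidences))
    return c, fused

# Two contexts, each trusting a different expert
prototypes = [(0.0, 0.0), (10.0, 10.0)]
weights = [[1.0, 0.0], [0.0, 1.0]]
print(local_linear_fusion((1.0, 1.0), [0.9, 0.2], prototypes, weights))
```

    The point of the sketch is the design choice the abstract argues for: the same pair of expert outputs is combined differently depending on where the sample falls in feature space, whereas a global scheme would apply one set of weights everywhere.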

    Proceedings of the Third International Workshop on Neural Networks and Fuzzy Logic, volume 2

    This volume includes papers presented at the Neural Networks and Fuzzy Logic Workshop, sponsored by the National Aeronautics and Space Administration and cosponsored by the University of Houston, Clear Lake, held 1-3 June 1992 at the Lyndon B. Johnson Space Center in Houston, Texas. Approximately 50 papers were presented during the three days. Technical topics addressed included adaptive systems; learning algorithms; network architectures; vision; robotics; neurobiological connections; speech recognition and synthesis; fuzzy set theory and application, control and dynamics processing; space applications; fuzzy logic and neural network computers; approximate reasoning; and multiobject decision making.

    Dynamic segmentation techniques applied to load profiles of electric energy consumption from domestic users

    The electricity sector is currently undergoing a process of liberalization and separation of roles, which is being implemented under the regulatory auspices of each Member State of the European Union and, therefore, at different speeds and with different perspectives and objectives that must converge on a common horizon, where Europe will benefit from an interconnected energy market in which producers and consumers can participate in free competition. This process of liberalization and separation of roles has two consequences or, viewed another way, one main consequence from which a second one necessarily follows. The main consequence is the increased complexity of managing and supervising the electrical system, which is increasingly interconnected and participatory, with distributed energy sources, many of them renewable, connected at different voltage levels and with different generation capacities at any point in the network. The second consequence follows from this situation: the need to communicate information between agents reliably, safely and quickly, and to analyze this information as effectively as possible so that it can feed the decision-making processes that improve the observability and controllability of a system of growing complexity with an increasing number of agents. With the evolution of Information and Communication Technologies (ICT), and investments both in improving the existing measurement and communications infrastructure and in extending measurement and actuation capacity to a greater number of points in medium- and low-voltage networks, data on the state of the network are increasingly available and complete. All these systems are part of the so-called Smart Grids, the intelligent networks of the future, a future that is not far off.
One such source of information is customers' energy consumption, measured at regular intervals (every hour, half hour or quarter hour) and sent from Smart Meters to the Distribution System Operators through Advanced Metering Infrastructure (AMI). As a result, an increasing amount of information on customers' energy consumption is being stored in Big Data systems. This growing source of information demands specialized techniques that can exploit it, extracting useful and summarized knowledge. This thesis deals with the use of Smart Meter energy consumption data, in particular with the application of data mining techniques to obtain temporal patterns that characterize users of electrical energy, grouping them according to these patterns into a small number of groups or clusters. These clusters make it possible to evaluate how users consume energy, both during the day and over a sequence of days, and thus to assess trends and predict future scenarios. To this end, current techniques are studied and, since existing work does not cover this objective, clustering or dynamic segmentation techniques applied to the daily load profiles of electric energy consumption of domestic users are developed. These techniques are tested and validated on a database of hourly energy consumption values for a sample of residential customers in Spain during 2008 and 2009.
The results reveal both the consumption patterns that characterize the different types of residential energy consumers and their evolution over time, and make it possible to assess, for example, how the regulatory changes in the Spanish electricity sector during those years influenced temporal patterns of energy consumption.
Benítez Sánchez, IJ. (2015). Dynamic segmentation techniques applied to load profiles of electric energy consumption from domestic users [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/59236
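    As a point of reference for grouping load profiles into a small number of clusters, the sketch below normalizes each 24-value daily curve so that it sums to 1 (comparing shape rather than volume) and runs plain k-means on the result. This is a static baseline under stated assumptions, not the thesis's dynamic segmentation technique: the normalization, the use of the first `k` profiles as initial centroids, and the function names are all illustrative choices.

```python
import math

def normalize_profile(hourly_kwh):
    """Scale a daily load curve so its values sum to 1 (shape, not volume)."""
    total = sum(hourly_kwh)
    return [v / total for v in hourly_kwh] if total > 0 else list(hourly_kwh)

def kmeans_profiles(profiles, k, max_iter=50):
    """Plain k-means on normalized daily profiles; centroids start at the first k profiles."""
    data = [normalize_profile(p) for p in profiles]
    centroids = [list(p) for p in data[:k]]
    labels = [0] * len(data)
    for _ in range(max_iter):
        # Assign each profile to its nearest centroid (Euclidean distance)
        labels = [min(range(k), key=lambda c: math.dist(x, centroids[c])) for x in data]
        new_centroids = []
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:
                # Hour-by-hour mean of the member profiles
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids
```

    A dynamic approach, as the abstract describes, would additionally track how each customer's cluster membership evolves over a sequence of days; that temporal layer is not modeled here.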

    Fuzzy Mathematics

    This book provides an overview of topics in fuzzy mathematics and lays a foundation for further research and applications in a broad range of areas. It analyzes how results for the many variations and extensions of fuzzy set theory can be obtained from known results of traditional fuzzy set theory. Besides theoretical results, the book covers a wide range of applications in areas such as decision analysis, optimal allocation in possibilistic and mixed models, pattern classification, credibility measures, algorithms for modeling uncertain data, and numerical methods for solving fuzzy linear systems. It is intended as a reference for advanced undergraduate and graduate students in applied and theoretical fuzzy mathematics, as well as for researchers and referees in fuzzy set theory.