18 research outputs found

    Adaptive subspace sampling for class imbalance processing

    © 2016 IEEE. This paper presents a novel oversampling technique that addresses highly imbalanced data distributions. At present, imbalanced data with anomalous class distributions and underrepresented data are difficult to handle with many conventional machine learning techniques. To balance class distributions, an adaptive subspace self-organizing map (ASSOM) that combines a local mapping scheme with a globally competitive rule is proposed to generate synthetic samples, focusing on minority class samples. The ASSOM conforms to feature-invariant characteristics, including translation, scaling and rotation, and it retains the independence of the basis vectors in each module. Specifically, the basis vectors generated by each ASSOM module avoid producing repeated representative features that offer nothing but heavy computational load. Several experimental results demonstrate that the proposed ASSOM method, used in a supervised learning manner, is superior to other existing oversampling techniques.

    Event-based feature extraction using adaptive selection thresholds

    Unsupervised feature extraction algorithms form one of the most important building blocks in machine learning systems. These algorithms are often adapted to the event-based domain to perform online learning in neuromorphic hardware. However, because they were not designed for this purpose, such algorithms typically require significant simplification during implementation to meet hardware constraints, creating trade-offs with performance. Furthermore, conventional feature extraction algorithms are not designed to generate useful intermediary signals which are valuable only in the context of neuromorphic hardware limitations. In this work, a novel event-based feature extraction method is proposed that focuses on these issues. The algorithm operates via simple adaptive selection thresholds, which allow a simpler implementation of network homeostasis than previous works by trading off a small amount of information loss in the form of missed events that fall outside the selection thresholds. The behavior of the selection thresholds and the output of the network as a whole are shown to provide uniquely useful signals indicating network weight convergence without the need to access network weights. A novel heuristic method for network size selection is proposed which makes use of noise events and their feature representations. The use of selection thresholds is shown to produce network activation patterns that predict classification accuracy, allowing rapid evaluation and optimization of system parameters without the need to run back-end classifiers. The feature extraction method is tested on both the N-MNIST (Neuromorphic-MNIST) benchmarking dataset and a dataset of airplanes passing through the field of view. Multiple configurations with different classifiers are tested, with the results quantifying the resultant performance gains at each processing stage.

    Nonsmooth optimization models and algorithms for data clustering and visualization

    Cluster analysis deals with the problem of organizing a collection of patterns into clusters based on a similarity measure. Various distance functions can be used to define this measure. Clustering problems with the similarity measure defined by the squared Euclidean distance have been studied extensively over the last five decades. However, problems with other Minkowski norms have attracted significantly less attention. The use of different similarity measures may help to identify different cluster structures of a data set. This in turn may help to significantly improve the decision making process. High dimensional data visualization is another important task in the field of data mining and pattern recognition. To date, principal component analysis and self-organizing maps have been used to solve such problems. In this thesis we develop algorithms for solving clustering problems in large data sets using various similarity measures. Such similarity measures are based on the squared L…
    Doctor of Philosophy
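As a rough illustration of why the choice of Minkowski norm can change the cluster structure that is found, the following Python sketch assigns one point to the nearest of two centers under the L1, L2 and L-infinity distances. It is not taken from the thesis: the helper names `minkowski_distance` and `nearest_center` and the toy data are assumptions made only for this example.

```python
import numpy as np

def minkowski_distance(x, y, p):
    """Minkowski (L_p) distance between two vectors; p = inf gives the Chebyshev distance."""
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    if np.isinf(p):
        return float(diff.max())
    return float((diff ** p).sum() ** (1.0 / p))

def nearest_center(point, centers, p):
    """Index of the center closest to `point` under the L_p distance."""
    distances = [minkowski_distance(point, c, p) for c in centers]
    return int(np.argmin(distances))

# Hypothetical toy data: one point and two candidate cluster centers.
point = [0.0, 0.0]
centers = [[3.0, 3.0], [0.0, 4.0]]

# Assignment of the same point under three different norms.
assignments = {p: nearest_center(point, centers, p) for p in (1, 2, np.inf)}
```

With this toy data the point is assigned to the second center under L1 and L2 but to the first center under L-infinity, so the same data can yield different partitions depending on the norm.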

    Adaptive combinations of classifiers with application to on-line handwritten character recognition

    Classifier combining is an effective way of improving classification performance. User adaptation is clearly another valid approach for improving performance in a user-dependent system, and even though adaptation is usually performed at the classifier level, adaptive committees can also be very effective. Adaptive committees have the distinct ability to perform adaptation without detailed knowledge of the classifiers. Adaptation can therefore be used even with classification systems that are intrinsically not suited for adaptation, whether due to lack of access to the workings of the classifier or simply because the classification scheme is not suitable for continuous learning. This thesis proposes methods for adaptive combination of classifiers in the setting of on-line handwritten character recognition. The focal part of the work introduces adaptive classifier combination schemes, of which the two most prominent are the Dynamically Expanding Context (DEC) committee and the Class-Confidence Critic Combining (CCCC) committee. Both have been shown to be capable of successful adaptation to the user in the task of on-line handwritten character recognition. In particular, the highly modular CCCC framework has also shown impressive performance in a doubly-adaptive setting of combining adaptive classifiers by using an adaptive committee. In support of this main topic of the thesis, some discussion of a methodology for deducing correct character labeling from user actions is presented. Proper labeling is paramount for effective adaptation, and deducing the labels from the user's actions is necessary to perform adaptation transparently to the user. In that way, the user does not need to give explicit feedback on the correctness of the recognition results.
Also, an overview is presented of adaptive classification methods for single-classifier adaptation in handwritten character recognition developed at the Laboratory of Computer and Information Science of the Helsinki University of Technology, CIS-HCR. Classifiers based on the CIS-HCR system have been used in the adaptive committee experiments both as member classifiers and to provide a reference level. Finally, two distinct approaches for further improving the performance of committee classifiers are discussed. Firstly, methods for committee rejection are presented and evaluated. Secondly, measures of classifier diversity for classifier selection, based on the concept of diversity of errors, are presented and evaluated. The topic of this thesis hence covers three important aspects of pattern recognition: on-line adaptation, combining classifiers, and a practical evaluation setting of handwritten character recognition. A novel approach combining these three core ideas has been developed and is presented in the introductory text and the included publications. To reiterate, the main contributions of this thesis are: 1) introduction of novel adaptive committee classification methods, 2) introduction of novel methods for measuring classifier diversity, 3) presentation of some methods for implementing committee rejection, 4) discussion and introduction of a method for effective label deduction from on-line user actions, and as a side-product, 5) an overview of the CIS-HCR adaptive on-line handwritten character recognition system.
Combining classifiers with a committee classifier is an effective way to improve classification accuracy. The continuing growth of computing power also makes the simultaneous use of several classifiers an increasingly viable option. Adapting the system to the user is another good way to improve the accuracy of a user-independent system. Although adaptation is usually implemented at the classifier level, adaptive committee classifiers can also be very effective. Adaptive committees can adapt without detailed knowledge of the member classifiers. Adaptation can thus also be used in classification systems that are not themselves suited to adaptation. This unsuitability may arise, for example, because the implementation of the classifier cannot be changed, or because the classification method used is not suited to continuous learning. This dissertation deals with methods for the adaptive combination of classifiers, using on-line recognition of handwritten characters as the application. The central part of the work presents new adaptive classifier combination methods, the two most notable of which are the Dynamically Expanding Context (DEC) committee and the Class-Confidence Critic Combining (CCCC) committee. Both have proven capable of effective user adaptation in on-line handwritten character recognition. In particular, the highly modular CCCC system has also produced good results in a doubly adaptive setting, in which adaptive member classifiers are combined by an adaptive committee. In support of the main theme of the dissertation, a model and a practical example are also presented of how the correct classes of characters can be inferred from the user's actions. Successfully inferring the true class of the characters is vital for effective adaptation. For adaptation to be performed transparently to the user, the true classes of the characters must be inferable from the user's actions. In this way, the user does not need to give direct feedback on the correctness of the recognition result. The work also presents an overview of the adaptive handwritten character recognition system developed at the Laboratory of Computer and Information Science of the Helsinki University of Technology.
Classifiers based on this system have been used in the adaptive committee experiments both as member classifiers and as a reference level. Finally, two separate methods for further improving the accuracy of the committee classifier are presented. The first is a set of methods for implementing committee rejection. The second is to use measures of classifier diversity for selecting member classifiers. The proposed new diversity measures are based on a concept that we call diversity of errors. The topic of the dissertation covers three important areas of pattern recognition: on-line adaptation, combining classifiers, and, as a practical application area, handwritten character recognition. From these three starting points a new kind of synthesis has been developed, which is presented in the introductory text and in the appended publications. The most essential contributions of this dissertation are thus: 1) presentation of new adaptive committee classifiers, 2) presentation of new methods for measuring classifier diversity, 3) presentation of some committee rejection methods, 4) a discussion and one implementation approach for inferring the true class of input characters from the user's actions, and, as a by-product, 5) a comprehensive overview of the CIS-HCR adaptive on-line handwritten character recognition system.

    Data mining using neural networks

    Data mining is the search for relationships and global patterns in large databases that are increasing in size. Data mining is beneficial for anyone who has a huge amount of data, for example customer and business data, or transaction, marketing, financial, manufacturing and web data. The results of data mining are also referred to as knowledge in the form of rules, regularities and constraints. Rule mining is one of the popular data mining methods, since rules provide concise statements of potentially important information that are easily understood by end users and represent actionable patterns. At present, rule mining has received a good deal of attention and enthusiasm from data mining researchers, since it is capable of solving many data mining problems such as classification, association, customer profiling, summarization, segmentation and many others. This thesis makes several contributions by proposing rule mining methods using genetic algorithms and neural networks. The thesis first proposes rule mining methods using a genetic algorithm. These methods are based on an integrated framework but are capable of mining three major classes of rules. Moreover, the rule mining processes in these methods are controlled by tuning two data mining measures, support and confidence. The thesis shows how to build data mining predictive models using the resultant rules of the proposed methods. Another key contribution of the thesis is the proposal of rule mining methods using supervised neural networks. The thesis mathematically analyses the Widrow-Hoff learning algorithm of a single-layered neural network, which results in a foundation for rule mining algorithms using single-layered neural networks. Three rule mining algorithms using single-layered neural networks are proposed for the three major classes of rules on the basis of the proposed theorems. The thesis also looks at the problem of rule mining where user guidance is absent.
The thesis proposes a guided rule mining system to overcome this problem. The thesis extends this work further by comparing the performance of the algorithm used in the proposed guided rule mining system with the Apriori data mining algorithm. Finally, the thesis studies the Kohonen self-organizing map as an unsupervised neural network for rule mining algorithms. Two approaches are adopted, based on how self-organizing maps are applied in rule mining models. In the first approach, the self-organizing map is used for clustering, which provides class information to the rule mining process. In the second approach, automated rule mining takes the place of trained neurons as it grows in a hierarchical structure.
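The two measures named in the abstract, support and confidence, have standard definitions in association rule mining, sketched below in Python. The sketch is independent of the thesis's genetic-algorithm and neural-network methods; the function names and the toy transactions are assumptions made only for this illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Support of (antecedent and consequent) divided by support of the antecedent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never fires, so report zero confidence
    return support(transactions, set(antecedent) | set(consequent)) / base

# Hypothetical toy transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# Rule under test: bread -> milk
rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
```

Here two of the four transactions contain both items, and two of the three transactions containing bread also contain milk. A rule miner would typically keep only rules whose support and confidence exceed chosen thresholds.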

    Novel neural approaches to data topology analysis and telemedicine

    The abstract is in the attachment. Electrical Engineering (Ingegneria Elettrica). Randazzo, Vincenz

    Deep Clustering and Deep Network Compression

    The use of deep learning has grown increasingly in recent years, thereby becoming a much-discussed topic across a diverse range of fields, especially in computer vision, text mining, and speech recognition. Deep learning methods have proven to be robust in representation learning and attained extraordinary achievement. Their success is primarily due to the ability of deep learning to discover and automatically learn feature representations by mapping input data into abstract and composite representations in a latent space. Deep learning’s ability to deal with high-level representations from data has inspired us to make use of learned representations, aiming to enhance unsupervised clustering and evaluate the characteristic strength of internal representations to compress and accelerate deep neural networks. Traditional clustering algorithms attain a limited performance as the dimensionality increases. Therefore, the ability to extract high-level representations provides beneficial components that can support such clustering algorithms. In this work, we first present DeepCluster, a clustering approach embedded in a deep convolutional auto-encoder. We introduce two clustering methods, namely DCAE-Kmeans and DCAE-GMM. The DeepCluster allows for data points to be grouped into their identical cluster, in the latent space, in a joint-cost function by simultaneously optimizing the clustering objective and the DCAE objective, producing stable representations, which is appropriate for the clustering process. Both qualitative and quantitative evaluations of proposed methods are reported, showing the efficiency of deep clustering on several public datasets in comparison to the previous state-of-the-art methods. Following this, we propose a new version of the DeepCluster model to include varying degrees of discriminative power. This introduces a mechanism which enables the imposition of regularization techniques and the involvement of a supervision component.
The key idea of our approach is to distinguish the discriminatory power of numerous structures when searching for a compact structure to form robust clusters. The effectiveness of injecting various levels of discriminatory powers into the learning process is investigated alongside the exploration and analytical study of the discriminatory power obtained through the use of two discriminative attributes: data-driven discriminative attributes with the support of regularization techniques, and supervision discriminative attributes with the support of the supervision component. An evaluation is provided on four different datasets. The use of neural networks in various applications is accompanied by a dramatic increase in computational costs and memory requirements. Making use of the characteristic strength of learned representations, we propose an iterative pruning method that simultaneously identifies the critical neurons and prunes the model during training without involving any pre-training or fine-tuning procedures. We introduce a majority voting technique to compare the activation values among neurons and assign a voting score to evaluate their importance quantitatively. This mechanism effectively reduces model complexity by eliminating the less influential neurons and aims to determine a subset of the whole model that can represent the reference model with much fewer parameters within the training process. Empirically, we demonstrate that our pruning method is robust across various scenarios, including fully-connected networks (FCNs), sparsely-connected networks (SCNs), and Convolutional neural networks (CNNs), using two public datasets. Moreover, we also propose a novel framework to measure the importance of individual hidden units by computing a measure of relevance to identify the most critical filters and prune them to compress and accelerate CNNs.
Unlike existing methods, we introduce the use of the activation of feature maps to detect valuable information and the essential semantic parts, with the aim of evaluating the importance of feature maps, inspired by novel neural network interpretability. A majority voting technique based on the degree of alignment between a semantic concept and individual hidden unit representations is utilized to evaluate feature maps’ importance quantitatively. We also propose a simple yet effective method to estimate new convolution kernels based on the remaining crucial channels to accomplish effective CNN compression. Experimental results show the effectiveness of our filter selection criteria, which outperforms the state-of-the-art baselines. To conclude, we present a comprehensive, detailed review of time-series data analysis, with emphasis on deep time-series clustering (DTSC), and a founding contribution to the area of applying deep clustering to time-series data by presenting the first case study in the context of movement behavior clustering utilizing the DeepCluster method. The results are promising, showing that the latent space encodes sufficient patterns to facilitate accurate clustering of movement behaviors. Finally, we identify state-of-the-art and present an outlook on this important field of DTSC from five important perspectives.

    Improving sampling, optimization and feature extraction in Boltzmann machines

    Supervised learning of large-scale hierarchical networks is currently enjoying tremendous success. Despite this excitement, unsupervised learning remains, according to many researchers, a key element of Artificial Intelligence, where agents must learn from a potentially limited amount of data. This thesis follows that line of thought and addresses several research topics related to the problem of density estimation with Boltzmann machines (BMs), probabilistic graphical models at the heart of deep learning. Our contributions cover sampling, partition function estimation, optimization and the learning of invariant representations. The thesis begins by presenting a new adaptive sampling algorithm that automatically adjusts the temperatures of the simulated Markov chains in order to maintain a high convergence speed throughout learning. When used in the context of stochastic maximum likelihood (SML) learning, our algorithm yields increased robustness to the choice of learning rate, as well as faster convergence. Our results are presented for BMs, but the method is general and applies to learning any probabilistic model that relies on Markov chain sampling. While the maximum likelihood gradient can be approximated by sampling, evaluating the log-likelihood requires an estimate of the partition function. Unlike traditional approaches, which treat a given model as a black box, we propose instead to exploit the dynamics of learning by estimating the successive changes in the log-partition function incurred at each parameter update.
The estimation problem is reformulated as an inference problem similar to the Kalman filter, but on a two-dimensional graph whose dimensions correspond to time and to the temperature parameter. On the topic of optimization, we also present an algorithm for efficiently applying the natural gradient to Boltzmann machines with thousands of units. Until now, its adoption was limited by its high computational cost and memory requirements. Our algorithm, Metric-Free Natural Gradient (MFNG), avoids explicitly computing the Fisher information matrix (and its inverse) by using a linear solver combined with an efficient matrix-vector product. The algorithm is promising: in terms of the number of function evaluations, MFNG converges faster than SML. Unfortunately, its implementation remains inefficient in computation time. This work also explores the mechanisms underlying the learning of invariant representations. To this end, we use the family of “spike & slab” restricted Boltzmann machines (ssRBM), which we modify so that they can model binary and sparse distributions. The binary latent variables of the ssRBM can be made invariant to a vector subspace by associating each of them with a vector of continuous latent variables (called “slabs”). This results in increased invariance in the representation and a better classification rate when few labeled data are available. We conclude the thesis with an ambitious topic: learning representations that can separate the factors of variation present in the input signal.
We propose a solution based on a bilinear ssRBM (with two groups of latent factors) and formulate the problem as one of “pooling” in complementary vector subspaces.
Despite the current wide-scale success of deep learning in training large scale hierarchical models through supervised learning, unsupervised learning promises to play a crucial role towards solving general Artificial Intelligence, where agents are expected to learn with little to no supervision. The work presented in this thesis tackles the problem of unsupervised feature learning and density estimation, using a model family at the heart of the deep learning phenomenon: the Boltzmann Machine (BM). We present contributions in the areas of sampling, partition function estimation, optimization and the more general topic of invariant feature learning. With regard to sampling, we present a novel adaptive parallel tempering method which dynamically adjusts the temperatures under simulation to maintain good mixing in the presence of complex multi-modal distributions. When used in the context of stochastic maximum likelihood (SML) training, the improved ergodicity of our sampler translates to increased robustness to learning rates and faster per epoch convergence. Though our application is limited to BM, our method is general and is applicable to sampling from arbitrary probabilistic models using Markov Chain Monte Carlo (MCMC) techniques. While SML gradients can be estimated via sampling, computing data likelihoods requires an estimate of the partition function. Contrary to previous approaches which consider the model as a black box, we provide an efficient algorithm which instead tracks the change in the log partition function incurred by successive parameter updates. Our algorithm frames this estimation problem as one of filtering performed over a 2D lattice, with one dimension representing time and the other temperature.
On the topic of optimization, our thesis presents a novel algorithm for applying the natural gradient to large scale Boltzmann Machines. Up until now, its application had been constrained by the computational and memory requirements of computing the Fisher Information Matrix (FIM), which is square in the number of parameters. The Metric-Free Natural Gradient algorithm (MFNG) avoids computing the FIM altogether by combining a linear solver with an efficient matrix-vector operation. The method shows promise in that the resulting updates yield faster per-epoch convergence, despite being slower in terms of wall clock time. Finally, we explore how invariant features can be learnt through modifications to the BM energy function. We study the problem in the context of the spike & slab Restricted Boltzmann Machine (ssRBM), which we extend to handle both binary and sparse input distributions. By associating each spike with several slab variables, latent variables can be made invariant to a rich, high dimensional subspace resulting in increased invariance in the learnt representation. When using the expected model posterior as input to a classifier, increased invariance translates to improved classification accuracy in the low-label data regime. We conclude by showing a connection between invariance and the more powerful concept of disentangling factors of variation. While invariance can be achieved by pooling over subspaces, disentangling can be achieved by learning multiple complementary views of the same subspace. In particular, we show how this can be achieved using third-order BMs featuring multiplicative interactions between pairs of random variables
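The general idea of pairing a linear solver with a matrix-vector product, so that the Fisher matrix is never formed, can be sketched in Python as follows. This is not the thesis's MFNG implementation. It assumes, for illustration only, that the Fisher matrix is estimated as the sample covariance of per-sample statistics, adds a small damping term, and uses a plain conjugate gradient solver. All names and the random data are hypothetical.

```python
import numpy as np

def fisher_vector_product(stats, v, damping=1e-3):
    """Compute (F + damping * I) v, where F is the covariance of the rows of `stats`, without building F."""
    centered = stats - stats.mean(axis=0)
    return centered.T @ (centered @ v) / stats.shape[0] + damping * v

def conjugate_gradient(matvec, b, iterations=50, tol=1e-10):
    """Solve A x = b for symmetric positive definite A, given only the product v -> A v."""
    x = np.zeros_like(b)
    r = b.copy()   # residual b - A x, with x = 0
    d = r.copy()   # current search direction
    rs = r @ r
    for _ in range(iterations):
        if rs < tol:
            break
        Ad = matvec(d)
        alpha = rs / (d @ Ad)
        x += alpha * d
        r -= alpha * Ad
        rs_new = r @ r
        d = r + (rs_new / rs) * d
        rs = rs_new
    return x

rng = np.random.default_rng(0)
stats = rng.normal(size=(200, 5))   # hypothetical per-sample statistics
grad = rng.normal(size=5)           # hypothetical gradient

# Natural-gradient-style direction: solves (F + damping * I) step = grad.
step = conjugate_gradient(lambda v: fisher_vector_product(stats, v), grad)
```

Each solver iteration costs two matrix-vector products with the statistics matrix, so memory grows linearly rather than quadratically in the number of parameters.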