90 research outputs found

    A study on model selection of binary and non-Gaussian factor analysis.

An, Yujia. Thesis (M.Phil.)--Chinese University of Hong Kong, 2005. Includes bibliographical references (leaves 71-76). Abstracts in English and Chinese.

    Abstract --- p.ii
    Acknowledgement --- p.iv
    Chapter 1 --- Introduction --- p.1
    Chapter 1.1 --- Background --- p.1
    Chapter 1.1.1 --- Review on BFA --- p.2
    Chapter 1.1.2 --- Review on NFA --- p.3
    Chapter 1.1.3 --- Typical model selection criteria --- p.5
    Chapter 1.1.4 --- New model selection criterion and automatic model selection --- p.6
    Chapter 1.2 --- Our contributions --- p.7
    Chapter 1.3 --- Thesis outline --- p.8
    Chapter 2 --- Combination of B and BI architectures for BFA with automatic model selection --- p.10
    Chapter 2.1 --- Implementation of BFA using BYY harmony learning with automatic model selection --- p.11
    Chapter 2.1.1 --- Basic issues of BFA --- p.11
    Chapter 2.1.2 --- B-architecture for BFA with automatic model selection --- p.12
    Chapter 2.1.3 --- BI-architecture for BFA with automatic model selection --- p.14
    Chapter 2.2 --- Local minima in B-architecture and BI-architecture --- p.16
    Chapter 2.2.1 --- Local minima in B-architecture --- p.16
    Chapter 2.2.2 --- One unstable result in BI-architecture --- p.21
    Chapter 2.3 --- Combination of B- and BI-architecture for BFA with automatic model selection --- p.23
    Chapter 2.3.1 --- Combine B-architecture and BI-architecture --- p.23
    Chapter 2.3.2 --- Limitations of BI-architecture --- p.24
    Chapter 2.4 --- Experiments --- p.25
    Chapter 2.4.1 --- Frequency of local minima occurring in B-architecture --- p.25
    Chapter 2.4.2 --- Performance comparison for several methods in B-architecture --- p.26
    Chapter 2.4.3 --- Comparison of local minima in B-architecture and BI-architecture --- p.26
    Chapter 2.4.4 --- Frequency of unstable cases occurring in BI-architecture --- p.27
    Chapter 2.4.5 --- Comparison of performance of three strategies --- p.27
    Chapter 2.4.6 --- Limitations of BI-architecture --- p.28
    Chapter 2.5 --- Summary --- p.29
    Chapter 3 --- A Comparative Investigation on Model Selection in Binary Factor Analysis --- p.31
    Chapter 3.1 --- Binary Factor Analysis and ML Learning --- p.32
    Chapter 3.2 --- Hidden Factors Number Determination --- p.33
    Chapter 3.2.1 --- Using Typical Model Selection Criteria --- p.33
    Chapter 3.2.2 --- Using BYY harmony Learning --- p.34
    Chapter 3.3 --- Empirical Comparative Studies --- p.36
    Chapter 3.3.1 --- Effects of Sample Size --- p.37
    Chapter 3.3.2 --- Effects of Data Dimension --- p.37
    Chapter 3.3.3 --- Effects of Noise Variance --- p.39
    Chapter 3.3.4 --- Effects of hidden factor number --- p.43
    Chapter 3.3.5 --- Computing Costs --- p.43
    Chapter 3.4 --- Summary --- p.46
    Chapter 4 --- A Comparative Investigation on Model Selection in Non-Gaussian Factor Analysis --- p.47
    Chapter 4.1 --- Non-Gaussian Factor Analysis and ML Learning --- p.48
    Chapter 4.2 --- Hidden Factor Determination --- p.51
    Chapter 4.2.1 --- Using typical model selection criteria --- p.51
    Chapter 4.2.2 --- BYY harmony Learning --- p.52
    Chapter 4.3 --- Empirical Comparative Studies --- p.55
    Chapter 4.3.1 --- Effects of Sample Size on Model Selection Criteria --- p.56
    Chapter 4.3.2 --- Effects of Data Dimension on Model Selection Criteria --- p.60
    Chapter 4.3.3 --- Effects of Noise Variance on Model Selection Criteria --- p.64
    Chapter 4.3.4 --- Discussion on Computational Cost --- p.64
    Chapter 4.4 --- Summary --- p.68
    Chapter 5 --- Conclusions --- p.69
    Bibliography --- p.7
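The typical model selection criteria the thesis compares (AIC, BIC, and relatives) share one two-phase recipe: fit the model at each candidate hidden-factor number, then penalize the log-likelihood by model complexity. A minimal sketch of that recipe for ordinary (Gaussian) factor analysis using scikit-learn — the free-parameter count is a simplified approximation, and the BYY harmony criterion studied in the thesis is not reproduced here:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n, d, true_k = 1000, 10, 3

# Synthetic data from a linear factor model: x = A y + noise
A = rng.normal(size=(d, true_k))
Y = rng.normal(size=(n, true_k))
X = Y @ A.T + 0.1 * rng.normal(size=(n, d))

def bic(X, k):
    """BIC with a simplified parameter count: loadings (d*k) + noise variances (d)."""
    fa = FactorAnalysis(n_components=k, random_state=0).fit(X)
    avg_ll = fa.score(X)                       # mean log-likelihood per sample
    n_params = X.shape[1] * k + X.shape[1]
    return -2.0 * avg_ll * X.shape[0] + n_params * np.log(X.shape[0])

candidates = range(1, 6)
bics = {k: bic(X, k) for k in candidates}
best_k = min(bics, key=bics.get)
print(best_k, {k: round(v) for k, v in bics.items()})
```

The thesis performs the same scan for binary and non-Gaussian factors, where the likelihood itself is much harder to evaluate; avoiding this exhaustive two-phase scan is precisely what the automatic BYY-based selection is for.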

    Machine Learning methods for long and short term energy demand forecasting

The thesis addresses the problems of long- and short-term electric load demand forecasting by using a mixed approach consisting of statistics and machine learning algorithms. The modelling of the multi-seasonal component of the Italian electric load is investigated by spectral analysis combined with machine learning. In particular, a frequency-domain version of the LASSO is developed in order to enforce sparsity in the parameters and efficiently obtain the main harmonics of the multi-seasonal term. The corresponding model yields one-year-ahead forecasts whose Mean Absolute Percentage Error (MAPE) has the same order of magnitude as that of the one-day-ahead predictor currently used by the Italian Transmission System Operator. Again for the Italian case, two whole-day-ahead predictors are designed. The former applies to normal days, while the latter is specifically designed for the Easter week. Concerning normal days, a predictor is built that relies exclusively on the loads recorded in the previous days, without resorting to exogenous data such as weather forecasts. This approach is viable in view of the highly correlated nature of the demand series, provided that suitable regularization-based strategies are applied in order to reduce the degrees of freedom and hence the parameter variance. The obtained forecasts improve significantly on the Terna benchmark predictor. The Easter week predictor is based on a Gaussian process model whose kernel, differently from standard choices, is statistically designed from historical data. Again, even without using temperatures, a definite improvement is achieved over the Terna predictions. In the last chapter of the thesis, aggregation and enhancement techniques are introduced in order to suitably combine the predictions of different experts.
The results, obtained on German national load data, show that, even in the case of missing experts, the proposed strategies yield more accurate and robust predictions.
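A frequency-domain LASSO can be sketched as an ℓ1-penalized regression on a dictionary of candidate harmonics; the sparsity pattern then picks out the dominant seasonal frequencies. The toy example below (my own construction, not the thesis model or Terna data) recovers two planted harmonics with a plain ISTA solver:

```python
import numpy as np

# Toy "load" series: two seasonal harmonics over one year of daily samples
t = np.arange(365.0)
y = 2.0 * np.sin(2 * np.pi * 3 * t / 365) + 1.0 * np.cos(2 * np.pi * 7 * t / 365)

# Dictionary: sine/cosine pairs for candidate frequencies 1..20 cycles per year
freqs = np.arange(1, 21)
cols = []
for f in freqs:
    cols.append(np.sin(2 * np.pi * f * t / 365))
    cols.append(np.cos(2 * np.pi * f * t / 365))
A = np.column_stack(cols)
A /= np.linalg.norm(A, axis=0)                 # unit-norm columns

def ista(A, y, lam, n_iter=500):
    """Iterative soft-thresholding for min 0.5 ||A x - y||^2 + lam ||x||_1."""
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = x - (A.T @ (A @ x - y)) / L
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)
    return x

x = ista(A, y, lam=1.0)
# Amplitude per frequency = norm of its (sin, cos) coefficient pair
amps = np.hypot(x[0::2], x[1::2])
top2 = set(int(f) for f in freqs[np.argsort(amps)[-2:]])
print(top2)
```

Because integer-cycle sinusoids over a full period are mutually orthogonal, the LASSO here reduces to soft-thresholding the projections, which is why the two planted frequencies stand out cleanly; on real load data the dictionary is not orthogonal and the ℓ1 penalty does real selection work.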

    Boosted Feature Generation for Classification Problems Involving High Numbers of Inputs and Classes

Classification problems involving high numbers of inputs and classes play an important role in the field of machine learning. Image classification, in particular, is a very active field of research with numerous applications. In addition to their high number, the inputs of image classification problems often show significant correlation. Also, in proportion to the number of inputs, the number of available training samples is usually low. Therefore, techniques combining low susceptibility to overfitting with good classification performance have to be found. Since for many tasks data has to be processed in real time, computational efficiency is crucial as well. Boosting is a machine learning technique which is used successfully in a number of application areas, in particular in the field of machine vision. Due to its modular design and flexibility, Boosting can be adapted to new problems easily. In addition, techniques exist for optimizing classifiers produced by Boosting with respect to computational efficiency. Boosting builds linear ensembles of base classifiers in a stage-wise fashion. Sample weights reflect whether training samples are hard to classify or not. Therefore, Boosting is able to adapt to the given classification problem over the course of training. The present work deals with the design of techniques for adapting Boosting to problems involving high numbers of inputs and classes. In the first part, the application of Boosting to multi-class problems is analyzed. After giving an overview of existing approaches, a new formulation for base classifiers solving multi-class problems by splitting them into pairwise binary subproblems is presented. Experimental evaluation shows the good performance and computational efficiency of the proposed technique compared to state-of-the-art techniques. In the second part of the work, techniques that use Boosting for feature generation are presented.
These techniques use the distribution of sample weights produced by Boosting to learn features that are adapted to the problems solved in each Boosting stage. By using smoothing-spline base classifiers, gradient descent schemes can be incorporated to find features that minimize the cost function of the current base classifier. Experimental evaluation shows that Boosting with linear projective features significantly outperforms state-of-the-art approaches such as SVMs and Random Forests. In order to be applicable to image classification problems, the presented feature generation scheme is extended to produce shift-invariant features. The utilized features are inspired by those used in Convolutional Neural Networks and perform a combination of convolution and subsampling. Experimental evaluation on classification of handwritten digits and car side-views shows that the proposed system is competitive with the best published results. The presented scheme has the advantages of being very simple and involving only a small number of design parameters.
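The sample-weight mechanism the feature-generation scheme builds on is the standard one from discrete AdaBoost: misclassified samples are up-weighted so the next base learner focuses on them. A minimal sketch with 1-D decision stumps on a toy interval-labeling problem (illustrative only; it reproduces neither the thesis's pairwise multi-class formulation nor its feature learning):

```python
import numpy as np

# Toy 1-D problem no single stump can solve: +1 on the two outer intervals
x = np.linspace(0.0, 1.0, 101)
y = np.where((x < 0.3) | (x > 0.7), 1.0, -1.0)

def best_stump(x, y, w):
    """Return (threshold, polarity, error) minimizing the weighted 0/1 error."""
    best = (None, None, np.inf)
    for thr in x:
        for pol in (1.0, -1.0):
            pred = np.where(x < thr, pol, -pol)
            err = np.sum(w[pred != y])
            if err < best[2]:
                best = (thr, pol, err)
    return best

def adaboost(x, y, n_rounds=20):
    w = np.full(len(x), 1.0 / len(x))
    stumps = []
    for _ in range(n_rounds):
        thr, pol, err = best_stump(x, y, w)
        err = max(err, 1e-12)                 # guard against a zero-error stump
        alpha = 0.5 * np.log((1.0 - err) / err)
        pred = np.where(x < thr, pol, -pol)
        w *= np.exp(-alpha * y * pred)        # up-weight the mistakes
        w /= w.sum()
        stumps.append((alpha, thr, pol))
    return stumps

def predict(stumps, x):
    score = sum(a * np.where(x < thr, pol, -pol) for a, thr, pol in stumps)
    return np.sign(score)

stumps = adaboost(x, y)
acc = np.mean(predict(stumps, x) == y)
print(acc)
```

The weight vector `w` at each round is exactly the "distribution of sample weights" that the feature-generation stage would consume instead of a fixed stump family.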

    Robust and Scalable Data Representation and Analysis Leveraging Isometric Transformations and Sparsity

The main focus of this doctoral thesis is to study the problem of robust and scalable data representation and analysis. The success of any machine learning and signal processing framework relies on how the data is represented and analyzed. Thus, in this work, we focus on three closely related problems: (i) supervised representation learning, (ii) unsupervised representation learning, and (iii) fault-tolerant data analysis. For the first task, we put forward new theoretical results on why a certain family of neural networks can become extremely deep and how we can improve this scalability property in a mathematically sound manner. We further investigate how we can employ them to generate data representations that are robust to outliers and to retrieve representative subsets of huge datasets. For the second task, we discuss two different methods, namely compressive sensing (CS) and nonnegative matrix factorization (NMF). We show that we can employ prior knowledge, such as slow variation in time, to introduce an unsupervised learning component to the traditional CS framework and to learn better compressed representations. Furthermore, we show that prior knowledge and sparsity constraints can be used in the context of NMF, not to find sparse hidden factors, but to enforce other structures, such as piecewise continuity. Finally, for the third task, we investigate how a data analysis framework can become robust to faulty data and faulty data processors. We employ Bayesian inference and propose a scheme that can solve the CS recovery problem in an asynchronous parallel manner. Furthermore, we show how sparsity can be used to make an optimization problem robust to faulty data measurements. The methods investigated in this work have applications in different practical problems such as resource allocation in wireless networks, source localization, image/video classification, and search engines.
A detailed discussion of these practical applications will be presented for each method
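Of the two unsupervised tools mentioned, NMF is the easier to sketch: the Lee-Seung multiplicative updates keep both factors nonnegative and do not increase the Frobenius reconstruction error. A minimal version below, without the structure-enforcing priors (such as piecewise continuity) that the thesis adds on top:

```python
import numpy as np

rng = np.random.default_rng(1)

# Nonnegative data with a planted rank-3 structure
W_true = rng.uniform(size=(40, 3))
H_true = rng.uniform(size=(3, 25))
V = W_true @ H_true

def nmf(V, rank, n_iter=200, eps=1e-9):
    """Lee-Seung multiplicative updates for min ||V - W H||_F^2 with W, H >= 0."""
    m, n = V.shape
    W = rng.uniform(size=(m, rank)) + eps
    H = rng.uniform(size=(rank, n)) + eps
    errs = [np.linalg.norm(V - W @ H)]
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update H with W fixed
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update W with H fixed
        errs.append(np.linalg.norm(V - W @ H))
    return W, H, errs

W, H, errs = nmf(V, rank=3)
print(errs[0], errs[-1])
```

Structured variants replace the plain Frobenius objective with a penalized one; the multiplicative form of the updates is what makes adding such penalties straightforward while preserving nonnegativity.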

    Uncertainty in Artificial Intelligence: Proceedings of the Thirty-Fourth Conference


    On the Self-Organization of a Hierarchical Memory Structure for Compositional Object Representation in the Visual Cortex

At present, there is a huge gap between artificial and biological information processing systems in terms of their capability to learn. This gap could certainly be reduced by gaining more insight into the higher functions of the brain, such as learning and memory. For instance, the primate visual cortex is thought to provide the long-term memory for visual objects acquired by experience. The visual cortex effortlessly handles arbitrarily complex objects by rapidly decomposing them into constituent components of much lower complexity along hierarchically organized visual pathways. How this processing architecture self-organizes into a memory domain that employs such compositional object representation by learning from experience remains to a large extent a riddle. The study presented here approaches this question by proposing a functional model of a self-organizing hierarchical memory network. The model is based on hypothetical neuronal mechanisms involved in cortical processing and adaptation. The network architecture comprises two consecutive layers of distributed, recurrently interconnected modules. Each module is identified with a localized cortical cluster of fine-scale excitatory subnetworks. A single module performs competitive unsupervised learning on the incoming afferent signals to form a suitable representation of the locally accessible input space. The network employs an operating scheme in which ongoing processing consists of discrete successive fragments termed decision cycles, presumably identifiable with the fast gamma rhythms observed in the cortex. The cycles are synchronized across the distributed modules, which produce highly sparse activity within each cycle by instantiating a local winner-take-all-like operation. Equipped with adaptive mechanisms of bidirectional synaptic plasticity and homeostatic activity regulation, the network is exposed to natural face images of different persons.
The images are presented incrementally, one per cycle, to the lower network layer as a set of Gabor filter responses extracted from local facial landmarks, without any person identity labels. In the course of unsupervised learning, the network simultaneously creates vocabularies of reusable local face appearance elements, captures relations between the elements by associatively linking those parts that encode the same face identity, develops higher-order identity symbols for the memorized compositions, and projects this information back onto the vocabularies in a generative manner. This learning corresponds to the simultaneous formation of bottom-up, lateral, and top-down synaptic connectivity within and between the network layers. In the mature connectivity state, the network thus holds a full compositional description of the experienced faces in the form of sparse memory traces that reside in the feed-forward and recurrent connectivity. Due to the generative nature of the established representation, the network is able to recreate the full compositional description of a memorized face in terms of all its constituent parts given only its higher-order identity symbol or a subset of its parts. In the test phase, the network successfully proves its ability to recognize the identity and gender of persons from alternative face views not shown before. An intriguing feature of the emerging memory network is its ability to self-generate activity spontaneously in the absence of external stimuli. In this sleep-like off-line mode, the network shows a self-sustaining replay of the memory content formed during the previous learning. Remarkably, the recognition performance is tremendously boosted after this off-line memory reprocessing. The performance boost is more pronounced on those face views that deviate more from the original view shown during learning.
This indicates that the off-line memory reprocessing during the sleep-like state specifically improves the generalization capability of the memory network. The positive effect turns out to be surprisingly independent of synapse-specific plasticity, relying completely on the synapse-unspecific, homeostatic activity regulation across the memory network. The developed network thus demonstrates functionality not shown by any previous neuronal modeling approach. It forms and maintains a memory domain for compositional, generative object representation in an unsupervised manner through experience with natural visual images, using both on-line ("wake") and off-line ("sleep") learning regimes. This functionality offers a promising departure point for further studies, aiming for deeper insight into the learning mechanisms employed by the brain and their subsequent implementation in artificial adaptive systems for solving complex tasks not tractable so far.
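The competitive unsupervised learning inside a single module can be illustrated by the classic winner-take-all rule: only the unit closest to the current input moves its weight vector toward that input. The deliberately reduced numpy sketch below (two units, two input clusters) is far simpler than the model's modules with bidirectional plasticity and homeostatic regulation:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two input clusters standing in for two local "appearance elements"
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
data = np.concatenate([c + 0.3 * rng.normal(size=(200, 2)) for c in centers])
rng.shuffle(data)

# Two competing units, initialized on random input samples
W = data[rng.choice(len(data), size=2, replace=False)].copy()

lr = 0.05
for x in data:
    winner = np.argmin(np.linalg.norm(W - x, axis=1))  # WTA: closest unit wins
    W[winner] += lr * (x - W[winner])                  # only the winner adapts

# Each cluster center should end up with a nearby unit
d = np.linalg.norm(W[:, None, :] - centers[None, :, :], axis=2)
print(W, d.min(axis=0))
```

In the full model the winner-take-all operation runs within each decision cycle and the resulting sparse activity gates which synapses are eligible for plasticity; here the gating is collapsed into the single `argmin`.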

    Innovative Wireless Localization Techniques and Applications

Innovative methodologies for the wireless localization of users and related applications are addressed in this thesis. In recent years, the widespread diffusion of pervasive wireless communication (e.g., Wi-Fi) and global localization services (e.g., GPS) has boosted the interest and the research on location information and services. Location-aware applications are becoming fundamental to a growing number of consumers (e.g., navigation, advertising, seamless user interaction with smart places) and to private and public institutions in the fields of energy efficiency, security, safety, fleet management, and emergency response. In this context, the position of the user (where) is often more valuable for deploying services of interest than the identity of the user (who). In detail, opportunistic approaches based on the analysis of electromagnetic field indicators (i.e., received signal strength and channel state information) are proposed and validated in real-world test sites for the presence detection, localization, tracking, and posture recognition of cooperative and non-cooperative (device-free) users in indoor environments. The methodologies are designed to exploit existing wireless infrastructures and commodity devices without any hardware modification. In outdoor environments, where global positioning technologies are already available in commodity devices and vehicles, the research and knowledge-transfer activities focus instead on the design and validation of algorithms and systems that support decision makers and operators in increasing efficiency, operational security, and the management of large fleets, as well as in localizing sensed information in order to gain situation awareness. In this field, a decision support system for emergency response and Civil Defense asset management (i.e., personnel and vehicles equipped with TETRA mobile radios) is described in terms of its architecture and the results of two years of experimental validation.
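A received-signal-strength approach of the kind used for the indoor, opportunistic case can be sketched as weighted k-nearest-neighbors fingerprinting over an offline radio map. Everything below is synthetic: the access-point layout is hypothetical and an idealized log-distance path-loss model stands in for measured RSS:

```python
import numpy as np

# Hypothetical access-point positions (meters) in a 10 m x 10 m room
aps = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 10.0]])

def rss(p, tx_dbm=-30.0, n_exp=2.5):
    """Idealized log-distance path-loss RSS (dBm) from each AP at position p."""
    d = np.maximum(np.linalg.norm(aps - p, axis=1), 0.5)
    return tx_dbm - 10.0 * n_exp * np.log10(d)

# Offline phase: fingerprint a 1 m grid of reference points
grid = np.array([[x, y] for x in range(11) for y in range(11)], dtype=float)
fingerprints = np.array([rss(p) for p in grid])

def locate(measurement, k=3, eps=1e-6):
    """Online phase: weighted k-NN in signal space."""
    dist = np.linalg.norm(fingerprints - measurement, axis=1)
    idx = np.argsort(dist)[:k]
    w = 1.0 / (dist[idx] + eps)
    return (w[:, None] * grid[idx]).sum(axis=0) / w.sum()

true_pos = np.array([4.0, 6.0])        # noise-free query on a reference point
est = locate(rss(true_pos))
print(est, np.linalg.norm(est - true_pos))
```

Real deployments replace the path-loss model with site surveys or opportunistic crowdsourced measurements, and the same signal-space matching idea extends to channel state information for device-free sensing.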

    BNAIC 2008:Proceedings of BNAIC 2008, the twentieth Belgian-Dutch Artificial Intelligence Conference


    Data-driven method for enhanced corrosion assessment of reinforced concrete structures

Corrosion is a major problem affecting the durability of reinforced concrete structures. Corrosion-related maintenance and repair of reinforced concrete structures cost multiple billions of USD per annum globally. Corrosion is often triggered by the ingress of carbon dioxide and/or chloride into the pores of concrete. Estimation of these corrosion-causing factors using the conventional models results in suboptimal assessment, since they are incapable of capturing the complex interaction of parameters. Hygrothermal interaction also plays a role in aggravating the corrosion of reinforcement bars, and this is usually counteracted by applying surface protection systems. These systems have different degrees of protection, and they may even unintentionally cause deterioration to the structure. The overall objective of this dissertation is to provide a framework that enhances the assessment reliability of the corrosion controlling factors. The framework is realized through the development of data-driven carbonation depth, chloride profile, and hygrothermal performance prediction models. The carbonation depth prediction model integrates neural networks, decision trees, and boosted and bagged ensemble decision trees. The ensemble-tree-based chloride profile prediction models evaluate the significance of chloride ingress controlling variables from various perspectives. The hygrothermal interaction prediction models are developed using neural networks to evaluate the status of corrosion and other unexpected deteriorations in surface-treated concrete elements. Long-term data for all models were obtained from three different field experiments. The performance comparison of the developed carbonation depth prediction model with the conventional one confirmed the prediction superiority of the data-driven model. The variable importance measure revealed that plasticizers and air contents are among the top six carbonation-governing parameters out of 25.
The topmost chloride penetration controlling parameters representing the composition of the concrete were found to be the aggregate size distribution, the amount and type of plasticizers, and supplementary cementitious materials. The performance analysis of the developed hygrothermal model revealed its prediction capability with low error. The exploratory data analysis technique integrated with the hygrothermal model identified the surface protection systems that are able to protect from corrosion, chemical, and frost attacks. All the developed corrosion assessment models are valid, reliable, robust, and easily reproducible, which assists in defining a proactive maintenance plan. In addition, the determined influential parameters could help companies to produce an optimized concrete mix that is able to resist carbonation and chloride penetration. Hence, the outcomes of this dissertation enable a reduction of life-cycle costs.
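The variable-importance analysis described above follows the standard ensemble-tree recipe: fit the ensemble, then rank inputs by impurity-based importance. A toy reconstruction with scikit-learn on synthetic data; the square-root-of-time carbonation trend is the textbook form, not the dissertation's fitted model, and the input variables are invented stand-ins:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
n = 2000

# Two informative inputs (exposure time, CO2 level) plus three irrelevant ones
time = rng.uniform(1, 50, n)              # years of exposure
co2 = rng.uniform(300, 800, n)            # ambient CO2 (ppm)
noise_feats = rng.uniform(size=(n, 3))    # irrelevant mix variables
X = np.column_stack([time, co2, noise_feats])

# Carbonation depth ~ k * sqrt(t), with k increasing with CO2 concentration
y = 2.0 * np.sqrt(time) * np.sqrt(co2 / 400.0) + 0.3 * rng.normal(size=n)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
imp = model.feature_importances_
print(dict(zip(["time", "co2", "noise1", "noise2", "noise3"], imp.round(3))))
```

The same fit-then-rank pattern applies to the boosted and bagged variants the dissertation combines; with 25 real mix and exposure parameters the ranking, rather than the fit itself, is the deliverable.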