112 research outputs found

    On the application of reservoir computing networks for noisy image recognition

    Reservoir Computing Networks (RCNs) are a special type of single-layer recurrent neural network in which the input and recurrent connections are randomly generated and only the output weights are trained. Besides the ability to process temporal information, the key strengths of RCNs are easy training and robustness against noise. Recently, we introduced a simple strategy to tune the parameters of RCNs, and evaluation in the domain of noise-robust speech recognition showed that this method is effective. The aim of this work is to extend that study to the field of image processing, showing that the proposed parameter tuning procedure is equally valid there and confirming that RCNs are adept at temporal modeling and robust with respect to noise. In particular, we investigate the potential of RCNs to achieve competitive performance on the well-known MNIST dataset by following the aforementioned parameter optimization strategy. Moreover, we achieve good noise-robust recognition by using such a network to denoise images before supplying them to a recognizer trained solely on clean images. The experiments demonstrate that the proposed RCN-based handwritten digit recognizer achieves an error rate of 0.81 percent on the clean test data of the MNIST benchmark, and that the proposed RCN-based denoiser can effectively reduce the error rate under various types of noise.
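
    The recipe above (random input and recurrent weights, trained readout) can be made concrete with a minimal echo state network sketch in Python/NumPy. The reservoir size, spectral radius, leak rate, and ridge regularization below are illustrative assumptions, not the tuned parameters of the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    n_in, n_res, n_out = 28, 500, 10               # e.g. an MNIST image as 28 column steps
    spectral_radius, leak, ridge = 0.9, 0.3, 1e-6  # assumed, not the paper's values

    # Random, untrained input and recurrent weights: the defining RCN property.
    W_in = rng.uniform(-0.1, 0.1, (n_res, n_in))
    W = rng.standard_normal((n_res, n_res))
    W *= spectral_radius / np.abs(np.linalg.eigvals(W)).max()  # rescale spectral radius

    def reservoir_state(seq):
        """Run one input sequence (T x n_in) through the leaky reservoir."""
        x = np.zeros(n_res)
        for u in seq:
            x = (1 - leak) * x + leak * np.tanh(W_in @ u + W @ x)
        return x  # final state summarizes the sequence

    def train_readout(seqs, labels):
        """Ridge regression on reservoir states; only these weights are learned."""
        X = np.stack([reservoir_state(s) for s in seqs])
        Y = np.eye(n_out)[labels]  # one-hot targets
        return np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ Y)

    def predict(W_out, seq):
        return int(np.argmax(reservoir_state(seq) @ W_out))

    For the denoising use described above, the same reservoir could instead be given a regression readout onto clean pixel values rather than class labels; that variant is not shown here.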

    Representation Learning: A Review and New Perspectives

    The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, auto-encoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.
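
    Since auto-encoders are one of the representation-learning families reviewed here, a minimal sketch may help fix the idea: a single-hidden-layer auto-encoder with tied weights, trained by gradient descent on the reconstruction error. All sizes and the learning rate are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def step(params, x, lr=0.01):
        """One gradient step on the squared reconstruction error for one example."""
        W, b_h, b_v = params["W"], params["b_h"], params["b_v"]
        h = sigmoid(x @ W + b_h)              # encoder: the learned representation
        r = sigmoid(h @ W.T + b_v)            # decoder with tied weights
        d_v = (r - x) * r * (1 - r)           # output delta for squared error
        d_h = (d_v @ W) * h * (1 - h)         # backpropagate through the encoder
        W -= lr * (np.outer(d_v, h) + np.outer(x, d_h))  # in-place updates
        b_v -= lr * d_v
        b_h -= lr * d_h
        return float(np.mean((r - x) ** 2))   # reconstruction error

    params = {"W": rng.standard_normal((784, 64)) * 0.01,  # assumed 28x28 inputs
              "b_h": np.zeros(64), "b_v": np.zeros(784)}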

    Foundations and Advances in Deep Learning

    Deep neural networks have recently become increasingly popular under the name of deep learning, due to their success in challenging machine learning tasks. Although the popularity is mainly due to recent successes, the history of neural networks goes as far back as 1958, when Rosenblatt presented the perceptron learning algorithm. Since then, various kinds of artificial neural networks have been proposed, including Hopfield networks, self-organizing maps, neural principal component analysis, Boltzmann machines, multi-layer perceptrons, radial-basis function networks, autoencoders, sigmoid belief networks, support vector machines, and deep belief networks. The first part of this thesis investigates shallow and deep neural networks in search of principles that explain why deep neural networks work so well across a range of applications. The thesis starts from some of the earlier ideas and models in the field of artificial neural networks and arrives at autoencoders and Boltzmann machines, the two most widely studied neural networks these days. The author thoroughly discusses how those various neural networks are related to each other and how the principles behind them form a foundation for autoencoders and Boltzmann machines. The second part is a collection of ten recent publications by the author. These publications mainly focus on learning and inference algorithms for Boltzmann machines and autoencoders; in particular, Boltzmann machines, which are known to be difficult to train, have been the main focus. Across several publications, the author and co-authors have devised a new set of learning algorithms, including the enhanced gradient, an adaptive learning rate, and parallel tempering. These algorithms are further applied to restricted Boltzmann machines with Gaussian visible units. In addition, the author proposed a two-stage pretraining algorithm that initializes the parameters of a deep Boltzmann machine to match the variational posterior distribution of a similarly structured deep autoencoder. Finally, deep neural networks are applied to image denoising and speech recognition.
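
    For orientation, the sketch below shows the plain baseline that the thesis's algorithms improve upon: a restricted Boltzmann machine trained with one step of contrastive divergence (CD-1). The enhanced gradient, adaptive learning rate, and parallel tempering proposed in the thesis are not shown, and the layer sizes and learning rate are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def cd1_update(params, v0, lr=0.05):
        """One CD-1 step for a batch of binary visible vectors (batch x n_vis)."""
        W, b_v, b_h = params["W"], params["b_v"], params["b_h"]
        # Positive phase: hidden probabilities and samples given the data.
        ph0 = sigmoid(v0 @ W + b_h)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        # Negative phase: one step of Gibbs sampling back to the visibles.
        pv1 = sigmoid(h0 @ W.T + b_v)
        ph1 = sigmoid(pv1 @ W + b_h)
        # Gradient estimate: data correlations minus model correlations.
        n = v0.shape[0]
        W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
        b_v += lr * (v0 - pv1).mean(axis=0)
        b_h += lr * (ph0 - ph1).mean(axis=0)

    params = {"W": rng.standard_normal((784, 128)) * 0.01,  # assumed sizes
              "b_v": np.zeros(784), "b_h": np.zeros(128)}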

    Digital compensation of fiber distortions in long-haul optical communication systems

    The continuous increase of traffic demand in long-haul communications has motivated network operators to look for receiver-side techniques to mitigate the nonlinear effects resulting from signal-signal and signal-noise interaction, thus pushing the current capacity boundaries. Machine learning techniques are a very active topic, with proven results in the most diverse applications. This dissertation aims to study nonlinear impairments in long-haul coherent optical links and the current state of the art in DSP techniques for impairment mitigation, as well as the integration of machine learning strategies into optical networks. Starting with a simplified fiber model impaired only by ASE noise, we studied how to integrate an ANN-based symbol estimator into the signal pipeline, validating the implementation by matching the theoretical performance. We then moved to a nonlinear proof of concept with the incorporation of NLPN in the fiber link. Finally, we evaluated the performance of the estimator under realistic simulations of single- and multi-channel links in both SSMF and NZDSF fibers. The obtained results indicate that even though it may be hard to find the best architecture, nonlinear symbol estimator networks have the potential to surpass more conventional DSP strategies.
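
    To make the idea of an ANN-based symbol estimator concrete, here is a minimal sketch in Python/NumPy under stated assumptions: a small feed-forward classifier that maps received (I, Q) samples, distorted by a toy power-dependent phase rotation (a crude stand-in for NLPN) plus Gaussian noise, back to transmitted QPSK symbols. The network shape, training details, and impairment model are illustrative, not the dissertation's actual setup.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy QPSK link with additive noise and a power-dependent phase rotation.
    constellation = np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4)))
    labels = rng.integers(0, 4, 20000)
    tx = constellation[labels]
    n1 = 0.08 * (rng.standard_normal(tx.shape) + 1j * rng.standard_normal(tx.shape))
    noisy = tx + n1
    rx = noisy * np.exp(1j * 0.5 * np.abs(noisy) ** 2)  # toy nonlinear phase noise

    X = np.stack([rx.real, rx.imag], axis=1)  # features: received I/Q
    Y = np.eye(4)[labels]                     # one-hot transmitted symbols

    # One-hidden-layer softmax classifier trained with plain gradient descent.
    W1 = rng.standard_normal((2, 32)) * 0.1; b1 = np.zeros(32)
    W2 = rng.standard_normal((32, 4)) * 0.1; b2 = np.zeros(4)

    for _ in range(200):
        H = np.tanh(X @ W1 + b1)
        logits = H @ W2 + b2
        P = np.exp(logits - logits.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)
        G = (P - Y) / len(X)                  # softmax cross-entropy gradient
        dH = (G @ W2.T) * (1 - H ** 2)
        W2 -= 0.5 * H.T @ G; b2 -= 0.5 * G.sum(axis=0)
        W1 -= 0.5 * X.T @ dH; b1 -= 0.5 * dH.sum(axis=0)

    pred = np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)
    print("symbol error rate:", float(np.mean(pred != labels)))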

    Robust Multimodal Representation Learning with Evolutionary Adversarial Attention Networks

    Multimodal representation learning is beneficial for many multimedia-oriented applications such as social image recognition and visual question answering. The different modalities of the same instance (e.g., a social image and its corresponding description) are usually correlational and complementary. Most existing approaches for multimodal representation learning are not effective at modeling the deep correlation between different modalities, and it is difficult for them to deal with the noise within social images. In this paper, we propose a deep learning-based approach named Evolutionary Adversarial Attention Networks (EAAN), which combines the attention mechanism with adversarial networks through evolutionary training, for robust multimodal representation learning. Specifically, a two-branch visual-textual attention model is proposed to correlate visual and textual content for joint representation. Adversarial networks are then employed to impose regularization upon the representation by matching its posterior distribution to given priors. Finally, the attention model and adversarial networks are integrated into an evolutionary training framework for robust multimodal representation learning. Extensive experiments have been conducted on four real-world datasets: PASCAL, MIR, CLEF, and NUS-WIDE. Substantial performance improvements on the tasks of image classification and tag recommendation demonstrate the superiority of the proposed approach.
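
    A minimal Python/NumPy sketch of the flavor of two-branch visual-textual attention described above, under assumptions: text features re-weight visual region features, and the attended visual summary is fused with the text vector into a joint representation. The dimensions and the bilinear scoring form are illustrative, and the adversarial and evolutionary components are not shown.

    import numpy as np

    rng = np.random.default_rng(0)
    d_img, d_txt, n_regions, d_joint = 512, 300, 49, 256  # assumed sizes

    # Learnable parameters (randomly initialized for the sketch).
    W_att = rng.standard_normal((d_txt, d_img)) * 0.01  # bilinear attention map
    W_v = rng.standard_normal((d_img, d_joint)) * 0.01
    W_t = rng.standard_normal((d_txt, d_joint)) * 0.01

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def joint_representation(regions, text):
        """regions: (n_regions, d_img) visual features; text: (d_txt,) embedding."""
        scores = regions @ (text @ W_att)   # relevance of each region to the text
        alpha = softmax(scores)             # attention weights over regions
        attended = alpha @ regions          # text-guided visual summary
        return np.tanh(attended @ W_v + text @ W_t)  # fused multimodal vector

    regions = rng.standard_normal((n_regions, d_img))
    text = rng.standard_normal(d_txt)
    z = joint_representation(regions, text)  # (d_joint,) joint representation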

    Machine Learning in Sensors and Imaging

    Machine learning is extending its applications in various fields, such as image processing, the Internet of Things, user interfaces, big data, manufacturing, and management. As data are required to build machine learning networks, sensors are one of the most important technologies. In addition, machine learning networks can contribute to improved sensor performance and the creation of new sensor applications. This Special Issue addresses all types of machine learning applications related to sensors and imaging. It covers computer vision-based control, activity recognition, fuzzy label classification, failure classification, motor temperature estimation, camera calibration for intelligent vehicles, error detection, color prior models, compressive sensing, wildfire risk assessment, shelf auditing, forest growing-stem-volume estimation, road management, image denoising, and touchscreens.

    Learning Object Recognition and Object Class Segmentation with Deep Neural Networks on GPU

    As cameras become ubiquitous and internet storage abundant, the need for computers to understand images is growing rapidly. This thesis is concerned with two computer vision tasks: recognizing objects and their locations, and segmenting images according to object classes. We focus on deep learning approaches, which in recent years have had a tremendous influence on machine learning in general and computer vision in particular. The thesis presents our research into deep learning models and algorithms. It is divided into three parts. The first part describes our GPU deep learning framework. Its hierarchical structure allows transparent use of the GPU, facilitates the specification of complex models, supports model inspection, and constitutes the implementation basis of the later chapters. Components of this framework were used in a real-time GPU library for random forests, which we present and evaluate. In the second part, we investigate greedy learning techniques for semi-supervised object recognition. We improve the feature learning capabilities of restricted Boltzmann machines (RBM) with lateral interactions and of auto-encoders with additional hidden layers, and offer empirical insight into the evaluation of RBM learning algorithms. We further show that the single-layer encoders used in auto-encoders cannot capture complex dependencies in their inputs, and propose a hybrid encoder that can capture complex dependencies while still representing simple ones simply. The third part of this thesis focuses on object class segmentation. Here, we incrementally introduce novel neural network models and training algorithms, successively improving the state of the art on multiple datasets. Our novel methods include supervised pre-training, histogram-of-oriented-gradients DNN inputs, depth normalization, and recurrence. All contribute to improving segmentation performance beyond what is possible with competitive baseline methods. We further demonstrate that pixelwise labeling combined with a structured loss function can be utilized to localize objects. Finally, we show how transfer learning in combination with object-centered depth colorization can be used to identify objects. We evaluate our proposed methods on the publicly available MNIST, MSRC, INRIA Graz-02, NYU-Depth, Pascal VOC, and Washington RGB-D Objects datasets.
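
    To illustrate what pixelwise labeling for object class segmentation means mechanically, here is a minimal forward-pass sketch in Python/NumPy: a single 3x3 convolution produces one score map per class, and a per-pixel softmax yields a class label at every pixel. The real models in the thesis are far deeper and add HOG inputs, depth normalization, and recurrence; the sizes here are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    n_classes, H, W = 5, 32, 32  # assumed

    def conv3x3(img, kernels, biases):
        """Naive 3x3 'same' convolution: img (H, W, C_in) -> (H, W, C_out)."""
        h, w, _ = img.shape
        pad = np.pad(img, ((1, 1), (1, 1), (0, 0)))
        out = np.zeros((h, w, kernels.shape[0]))
        for i in range(h):
            for j in range(w):
                patch = pad[i:i + 3, j:j + 3, :]
                out[i, j] = np.tensordot(kernels, patch,
                                         axes=([1, 2, 3], [0, 1, 2])) + biases
        return out

    kernels = rng.standard_normal((n_classes, 3, 3, 3)) * 0.1  # RGB in, one map per class
    biases = np.zeros(n_classes)

    image = rng.random((H, W, 3))
    scores = conv3x3(image, kernels, biases)            # (H, W, n_classes) logits
    probs = np.exp(scores - scores.max(axis=2, keepdims=True))
    probs /= probs.sum(axis=2, keepdims=True)           # per-pixel softmax
    label_map = probs.argmax(axis=2)                    # (H, W) object class per pixel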

    Supporting the life cycle of WLAN positioning

    The advent of GPS positioning at the turn of the millennium provided consumers with worldwide access to outdoor location information. For the purposes of indoor positioning, however, the GPS signal rarely penetrates buildings well enough to maintain the same positioning granularity as outdoors. Arriving around the same time, wireless local area networks (WLAN) gained widespread support both in terms of infrastructure deployments and client proliferation. Positioning based on WLAN signals has therefore been a promising approach to bridging the location context. In addition to being readily available in most environments needing support for location information, a WLAN positioning system is financially low-cost to adopt compared to dedicated infrastructure approaches, partly because it operates on an unlicensed frequency band. Furthermore, the accuracy provided by this approach is sufficient for a wide range of location-based services, such as navigation and location-aware advertisements. In spite of this attractive proposition and extensive research in both academia and industry (over 20 000 publications and the founding of several companies), WLAN positioning has yet to become the de facto choice for indoor positioning. The main reasons for this include: (i) the cost of deployment, and re-deployment, which is often significant, if not prohibitive, in terms of work hours; (ii) the complex propagation of the wireless signal, which -- through interaction with the environment -- renders it inherently stochastic; and (iii) the use of an unlicensed frequency band, which means the wireless medium faces fierce competition from other technologies, and even unintentional radiators, that can impair traffic in unforeseen ways and degrade positioning accuracy. This thesis addresses these issues by developing novel solutions for reducing the effort of deployment, including optimizing the indoor location topology for the use of WLAN positioning, as well as automatically detecting sources of cross-technology interference. These contributions pave the way for WLAN positioning to become as ubiquitous as the underlying technology.
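
    As background for how WLAN signals are commonly turned into positions, here is a minimal RSSI-fingerprinting sketch in Python/NumPy using weighted k-nearest neighbors. It illustrates the standard approach whose deployment cost the thesis targets, not the thesis's own algorithms; the survey data, access-point count, and parameters are made up.

    import numpy as np

    rng = np.random.default_rng(0)

    # Offline phase: a radio map of RSSI fingerprints (dBm, one column per AP)
    # collected at known survey points -- the costly deployment step.
    survey_xy = rng.uniform(0, 50, (200, 2))     # known survey positions (m)
    radio_map = rng.uniform(-90, -40, (200, 6))  # assumed 6 access points

    def locate(rssi, k=4):
        """Online phase: estimate a position from one RSSI measurement vector."""
        d = np.linalg.norm(radio_map - rssi, axis=1)  # fingerprint distances
        nearest = np.argsort(d)[:k]
        w = 1.0 / (d[nearest] + 1e-6)                 # closer fingerprints weigh more
        return (w[:, None] * survey_xy[nearest]).sum(axis=0) / w.sum()

    measurement = rng.uniform(-90, -40, 6)
    print("estimated position (m):", locate(measurement))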