3,788 research outputs found

    Learning from the Kernel and the Range Space

    In this article, a novel approach is introduced for learning a complex function that can be written as a system of linear equations. The learning is grounded in the observation that solving a system of linear equations by manipulating its kernel and range space reduces to an estimation based on least-squares error approximation. The learning approach is applied to learn a fully connected deep feedforward network. Numerical experiments on network learning of synthetic and benchmark data not only show the feasibility of the proposed learning approach but also provide insights into the mechanism of data representation. Comment: Camera-ready finalized on 22 April 2018; paper presented on 07 June 2018 at the 17th IEEE/ACIS International Conference on Computer and Information Science (ICIS) 2018
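
    A minimal sketch of the core operation this abstract describes: fitting a linear map by least squares, where the solution projects the targets onto the range space of the data matrix and the residual lies in its orthogonal complement. This illustrates the building block only, not the paper's full layer-wise algorithm.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 20))   # 100 samples, 20 input features
    T = rng.standard_normal((100, 5))    # 5-dimensional targets

    # Least-squares solution via the Moore-Penrose pseudoinverse; X @ W is
    # the projection of T onto the range space of X.
    W = np.linalg.pinv(X) @ T

    # The residual is orthogonal to the range of X (it lies in the kernel of X^T).
    residual = T - X @ W
    print(np.linalg.norm(X.T @ residual))  # ~0: residual orthogonal to range(X)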

    Constrained Extreme Learning Machines: A Study on Classification Cases

    Extreme learning machine (ELM) is an extremely fast learning method whose strong performance on pattern recognition tasks has been demonstrated by numerous studies and engineering applications. However, its good generalization ability relies on a large number of hidden neurons, which works against real-time response at test time. In this paper, we propose new methods, named "constrained extreme learning machines" (CELMs), that randomly select hidden neurons based on the sample distribution. In contrast to the completely random selection of hidden nodes in ELM, the CELMs randomly select hidden nodes from a constrained vector space containing basic combinations of the original sample vectors. The experimental results show that the CELMs have better generalization ability than traditional ELM, SVM and some other related methods, while retaining a learning speed similar to that of ELM. Comment: 14 pages, 6 figures, journal
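
    A hedged sketch of the constrained-selection idea: hidden weights are drawn as differences between samples from different classes (one plausible form of a "basic combination of original sample vectors"), followed by an ELM-style least-squares readout. The exact constraint set in the paper may differ.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))           # training samples
    y = (X[:, 0] > 0).astype(float)              # toy binary labels
    L = 50                                       # number of hidden neurons

    # Constrained hidden weights: differences of samples from the two classes.
    pos, neg = X[y == 1], X[y == 0]
    i = rng.integers(0, len(pos), L)
    j = rng.integers(0, len(neg), L)
    W = pos[i] - neg[j]
    b = rng.standard_normal(L)

    H = np.tanh(X @ W.T + b)                     # hidden-layer output
    beta = np.linalg.pinv(H) @ y                 # least-squares output weights
    pred = (H @ beta > 0.5).astype(float)
    print("train accuracy:", (pred == y).mean())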

    Understanding Autoencoders with Information Theoretic Concepts

    Despite their great success in practical applications, there is still a lack of theoretical and systematic methods for analyzing deep neural networks. In this paper, we present an information-theoretic methodology for understanding the learning dynamics and the design of autoencoders, a special type of deep learning architecture that resembles a communication channel. By generalizing the information plane to any cost function, and inspecting the roles and dynamics of different layers using layer-wise information quantities, we emphasize the role that mutual information plays in quantifying learning from data. We further suggest, and experimentally validate for mean square error training, three fundamental properties regarding the layer-wise flow of information and the intrinsic dimensionality of the bottleneck layer, using respectively the data processing inequality and the identification of a bifurcation point in the information plane that is controlled by the given data. Our observations have a direct impact on the optimal design of autoencoders, the design of alternative feedforward training methods, and even the problem of generalization. Comment: Paper accepted by Neural Networks. Code for estimating information quantities and drawing the information plane is available from https://drive.google.com/drive/folders/1e5sIywZfmWp4Dn0WEesb6fqQRM0DIGxZ?usp=sharing
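
    As an illustration of the layer-wise quantities involved, the toy estimator below computes the mutual information I(X; Z) between a scalar input and a scalar "activation" by histogram binning. The paper's own estimator (linked above) may differ; binning is only one of several options.

    import numpy as np

    def mutual_information(x, z, bins=10):
        # Histogram-based MI estimate for two 1-D variables, in nats.
        joint, _, _ = np.histogram2d(x, z, bins=bins)
        pxz = joint / joint.sum()
        px = pxz.sum(axis=1, keepdims=True)
        pz = pxz.sum(axis=0, keepdims=True)
        nz = pxz > 0                             # avoid log(0)
        return float((pxz[nz] * np.log(pxz[nz] / (px @ pz)[nz])).sum())

    rng = np.random.default_rng(0)
    x = rng.standard_normal(5000)
    z = np.tanh(2.0 * x) + 0.1 * rng.standard_normal(5000)  # layer "activation"
    print(mutual_information(x, z))  # high MI: z is nearly a function of x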

    A Survey on Multi-Task Learning

    Multi-Task Learning (MTL) is a learning paradigm in machine learning whose aim is to leverage useful information contained in multiple related tasks to improve the generalization performance of all the tasks. In this paper, we give a survey of MTL. First, we classify different MTL algorithms into several categories, including the feature learning approach, low-rank approach, task clustering approach, task relation learning approach, and decomposition approach, and then discuss the characteristics of each approach. To further improve the performance of learning tasks, MTL can be combined with other learning paradigms, including semi-supervised learning, active learning, unsupervised learning, reinforcement learning, multi-view learning, and graphical models. When the number of tasks is large or the data dimensionality is high, batch MTL models have difficulty handling the workload, so online, parallel, and distributed MTL models, as well as dimensionality reduction and feature hashing, are reviewed to highlight their computational and storage advantages. Many real-world applications use MTL to boost their performance, and we review representative works. Finally, we present theoretical analyses and discuss several future directions for MTL.
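
    As a concrete toy instance of one surveyed family (task relation learning), the sketch below first fits several related ridge regressions independently and then re-fits each task while shrinking its weights toward the across-task mean, so data-poor tasks borrow strength from each other. The setup and regularizer are illustrative assumptions, not a specific method from the survey.

    import numpy as np

    rng = np.random.default_rng(0)
    d, lam = 20, 10.0
    w_star = rng.standard_normal(d)                  # common "center" of all tasks
    tasks = []
    for _ in range(5):                               # five related regression tasks
        X = rng.standard_normal((15, d))             # few samples per task (n < d)
        w_t = w_star + 0.1 * rng.standard_normal(d)  # small task-specific deviation
        tasks.append((X, X @ w_t))

    def ridge(X, y, lam, center):
        # argmin_w ||Xw - y||^2 + lam * ||w - center||^2
        A = X.T @ X + lam * np.eye(X.shape[1])
        return np.linalg.solve(A, X.T @ y + lam * center)

    # Step 1: independent ridge fits (shrunk toward zero).
    ws = [ridge(X, y, lam, np.zeros(d)) for X, y in tasks]
    # Step 2: refit each task, shrinking toward the mean of all tasks' weights.
    w_bar = np.mean(ws, axis=0)
    ws_mtl = [ridge(X, y, lam, w_bar) for X, y in tasks]
    print("indep err:", np.mean([np.linalg.norm(w - w_star) for w in ws]))
    print("MTL err:  ", np.mean([np.linalg.norm(w - w_star) for w in ws_mtl]))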

    Comparison of Deep Neural Networks and Deep Hierarchical Models for Spatio-Temporal Data

    Spatio-temporal data are ubiquitous in the agricultural, ecological, and environmental sciences, and their study is important for understanding and predicting a wide variety of processes. One of the difficulties with modeling spatial processes that change in time is the complexity of the dependence structures that must describe how such a process varies, and the presence of high-dimensional complex data sets and large prediction domains. It is particularly challenging to specify parameterizations for nonlinear dynamic spatio-temporal models (DSTMs) that are simultaneously useful scientifically and efficient computationally. Statisticians have developed deep hierarchical models that can accommodate process complexity as well as the uncertainties in the predictions and inference. However, these models can be expensive and are typically application specific. On the other hand, the machine learning community has developed alternative "deep learning" approaches for nonlinear spatio-temporal modeling. These models are flexible yet are typically not implemented in a probabilistic framework. The two paradigms have much in common and suggest hybrid approaches that can benefit from elements of each framework. This overview paper presents a brief introduction to the deep hierarchical DSTM (DH-DSTM) framework and to deep models in machine learning, culminating with the deep neural DSTM (DN-DSTM). Recent approaches that combine elements from DH-DSTMs and echo state network DN-DSTMs are presented as illustrations. Comment: 26 pages, including 6 figures and references
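
    Since the abstract names echo state networks as a DN-DSTM building block, here is a minimal echo state network sketch: a fixed random recurrent reservoir (rescaled for the echo state property) with a linear readout trained by ridge regression on a toy temporal signal. It omits the spatial embedding and uncertainty quantification discussed in the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    n_res, T_len = 200, 1000
    u = np.sin(np.arange(T_len + 1) * 0.1)           # toy 1-D signal

    # Random reservoir, rescaled so its spectral radius is below 1.
    W = rng.standard_normal((n_res, n_res))
    W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
    W_in = rng.standard_normal(n_res)

    # Run the reservoir over the input sequence (weights stay fixed).
    states = np.zeros((T_len, n_res))
    x = np.zeros(n_res)
    for t in range(T_len):
        x = np.tanh(W @ x + W_in * u[t])
        states[t] = x

    # Train only the readout, by ridge regression, to predict the next value.
    washout, lam = 100, 1e-6
    S, y = states[washout:], u[washout + 1 : T_len + 1]
    w_out = np.linalg.solve(S.T @ S + lam * np.eye(n_res), S.T @ y)
    print("one-step MSE:", np.mean((S @ w_out - y) ** 2))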

    A New Learning Paradigm for Random Vector Functional-Link Network: RVFL+

    In school, a teacher plays an important role in various classroom teaching patterns. By analogy to this human learning activity, the learning using privileged information (LUPI) paradigm provides additional information generated by a teacher to 'teach' learning models during the training stage. This learning paradigm is therefore a typical teacher-student interaction mechanism. This paper is the first to present a random vector functional link network based on the LUPI paradigm, called RVFL+. Rather than simply combining two existing approaches, the newly derived RVFL+ fills the gap between classical randomized neural networks and the recent LUPI paradigm, offering an alternative way to train RVFL networks. Moreover, the proposed RVFL+ can be used in conjunction with the kernel trick for highly complicated nonlinear feature learning, a variant termed KRVFL+. Furthermore, the statistical properties of the proposed RVFL+ are investigated, and we present a sharp, high-quality generalization error bound based on Rademacher complexity. Competitive experimental results on 14 real-world datasets illustrate the effectiveness and efficiency of RVFL+ and KRVFL+, which can achieve better generalization performance than state-of-the-art methods. Comment: We have updated the previous work
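
    For context, a sketch of the plain RVFL network that RVFL+ extends: random "enhancement" features plus direct input-output links, with the output weights solved in closed form by ridge regression. The LUPI (privileged information) extension itself is not shown.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((300, 8))
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(300)

    L = 100                                          # number of enhancement nodes
    W = rng.uniform(-1, 1, (8, L))                   # random, never trained
    b = rng.uniform(-1, 1, L)
    H = np.tanh(X @ W + b)                           # random hidden features
    D = np.hstack([X, H])                            # direct links + hidden features

    lam = 1e-3                                       # ridge regularizer
    beta = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ y)
    print("train MSE:", np.mean((D @ beta - y) ** 2))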

    PointHop: An Explainable Machine Learning Method for Point Cloud Classification

    An explainable machine learning method for point cloud classification, called the PointHop method, is proposed in this work. The PointHop method consists of two stages: 1) local-to-global attribute building through iterative one-hop information exchange, and 2) classification and ensembles. In the attribute building stage, we address the problem of unordered point cloud data by using a space partitioning procedure and developing a robust descriptor that characterizes the relationship between a point and its one-hop neighbors in a PointHop unit. When multiple PointHop units are placed in cascade, the attributes of a point grow by iteratively taking its relationship with one-hop neighbor points into account. Furthermore, to control the rapid dimension growth of the attribute vector associated with a point, we use the Saab transform to reduce the attribute dimension in each PointHop unit. In the classification and ensemble stage, we feed the feature vector obtained from multiple PointHop units to a classifier, and we explore ensemble methods to further improve classification performance. Experimental results show that the PointHop method offers classification performance comparable with state-of-the-art methods while demanding much lower training complexity. Comment: 13 pages with 9 figures
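
    A rough sketch of the one-hop attribute building step: for each point, its k nearest neighbours are partitioned into the eight octants around it and their attributes are averaged per octant, growing the attribute vector. The Saab transform and the classifier/ensemble stage are omitted, and details such as k and the attribute initialization are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    pts = rng.standard_normal((128, 3))              # toy point cloud
    attrs = pts.copy()                               # initial attributes: xyz
    k = 16

    # k nearest neighbours of every point (index 0 is the point itself).
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    nn = np.argsort(d2, axis=1)[:, 1 : k + 1]

    feats = np.zeros((len(pts), 8 * attrs.shape[1]))
    for i in range(len(pts)):
        rel = pts[nn[i]] - pts[i]                    # neighbour offsets
        octant = (rel[:, 0] > 0) * 4 + (rel[:, 1] > 0) * 2 + (rel[:, 2] > 0)
        for o in range(8):                           # mean attribute per octant
            m = octant == o
            if m.any():
                feats[i, o * 3 : o * 3 + 3] = attrs[nn[i]][m].mean(0)
    print(feats.shape)                               # (128, 24): grown attributes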

    Deep Learning with the Random Neural Network and its Applications

    The random neural network (RNN) is a mathematical model for an "integrate-and-fire" spiking network that closely resembles the stochastic behaviour of neurons in mammalian brains. Since its proposal in 1989, there have been numerous investigations into the RNN's applications and learning algorithms. Deep learning (DL) has achieved great success in machine learning. Recently, the properties of the RNN for DL have been investigated in order to combine their strengths. Recent results demonstrate that the gap between RNNs and DL can be bridged, and that DL tools based on the RNN are faster and can potentially be used with less energy expenditure than existing methods. Comment: 23 pages, 19 figures

    Machine Learning for Wireless Communications in the Internet of Things: A Comprehensive Survey

    The Internet of Things (IoT) is expected to require more effective and efficient wireless communications than ever before. For this reason, techniques such as spectrum sharing, dynamic spectrum access, extraction of signal intelligence and optimized routing will soon become essential components of the IoT wireless communication paradigm. Given that the majority of the IoT will be composed of tiny, mobile, and energy-constrained devices, traditional techniques based on a priori network optimization may not be suitable, since (i) an accurate model of the environment may not be readily available in practical scenarios, and (ii) the computational requirements of traditional optimization techniques may prove prohibitive for IoT devices. To address these challenges, much research has been devoted to the use of machine learning for problems in the IoT wireless communications domain. This work provides a comprehensive survey of the state of the art in the application of machine learning techniques to key problems in IoT wireless communications, with an emphasis on its ad hoc networking aspect. First, we present extensive background notions of machine learning techniques. Then, adopting a bottom-up approach, we examine existing work on machine learning for the IoT at the physical, data-link and network layers of the protocol stack. Thereafter, we discuss directions taken by the community towards hardware implementation to ensure the feasibility of these techniques. Additionally, before concluding, we provide a brief discussion of the application of machine learning in the IoT beyond wireless communication. Finally, each of these discussions is accompanied by a detailed analysis of the related open problems and challenges. Comment: Ad Hoc Networks Journal

    SSFN -- Self Size-estimating Feed-forward Network with Low Complexity, Limited Need for Human Intervention, and Consistent Behaviour across Trials

    We design a self size-estimating feed-forward network (SSFN) using a joint optimization approach that estimates the number of layers and nodes and learns the weight matrices. The learning algorithm has low computational complexity, typically completing within a few minutes on a laptop. In addition, the algorithm has a limited need for human intervention to tune parameters. SSFN grows from a small-size network to a large-size network, guaranteeing a monotonically non-increasing cost as nodes and layers are added. The learning approach uses a judicious combination of the 'lossless flow property' of some activation functions, convex optimization, and instances of random matrices. Consistent performance -- low variation across Monte-Carlo trials -- is found for both inference performance (classification accuracy) and the estimated network size.
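
    A loose sketch of the monotone-growth principle described above: random-feature layers are added one at a time, the readout is re-solved in closed form after each addition, and growth is kept only if the training cost does not increase. This mirrors the non-increasing-cost idea only; SSFN's actual construction (lossless flow property, convex optimization of the weight matrices, node-wise growth) is not reproduced.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 10))
    y = np.sign(X[:, 0] * X[:, 1])                   # toy binary target

    def fit_readout(Z, y, lam=1e-3):
        # Closed-form ridge readout and its training cost.
        w = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y)
        return w, np.mean((Z @ w - y) ** 2)

    Z = X
    w, cost = fit_readout(Z, y)
    for layer in range(10):                          # try growing up to 10 layers
        R = rng.uniform(-1, 1, (Z.shape[1], 50))     # random layer weights
        Z_new = np.hstack([Z, np.tanh(Z @ R)])       # candidate grown network
        w_new, cost_new = fit_readout(Z_new, y)
        if cost_new > cost:                          # reject growth that hurts
            break
        Z, w, cost = Z_new, w_new, cost_new
        print(f"layer {layer + 1}: cost {cost:.4f}")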