18 research outputs found

    Tree Echo State Networks

    Get PDF
    In this paper we present the Tree Echo State Network (TreeESN) model, generalizing the paradigm of Reservoir Computing to tree-structured data. TreeESNs exploit an untrained generalized recursive reservoir, exhibiting extreme efficiency for learning in structured domains. In addition, we highlight throughout the paper other characteristics of the approach. First, we discuss the Markovian characterization of reservoir dynamics, extended to the case of tree domains, that is implied by the contractive setting of the TreeESN state transition function. Second, we study two types of state mapping functions to map the tree-structured state of a TreeESN into a fixed-size feature representation for classification or regression tasks. The critical role of the relation between the choice of the state mapping function and the Markovian characterization of the task is analyzed and experimentally investigated on both artificial and real-world tasks. Finally, experimental results on benchmark and real-world tasks show that the TreeESN approach, in spite of its efficiency, achieves results comparable to those of state-of-the-art, though more complex, neural and kernel-based models for tree-structured data.
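    As a concrete illustration of the recursive reservoir described above, the following minimal sketch (ours, not the authors' code) encodes a tree bottom-up with fixed random weights rescaled for contractivity, and exposes the two state mappings mentioned in the abstract: the root state and the mean over node states. The specific rescaling rule, the tanh nonlinearity, and all names are illustrative assumptions.

    # Illustrative TreeESN-style encoder sketch (not the authors' code).
    # Assumptions: trees of bounded arity with real-valued node labels;
    # reservoir weights are random, untrained, and rescaled for contractivity.
    import numpy as np

    class TreeESNSketch:
        def __init__(self, label_dim, reservoir_dim, max_arity, sigma=0.9, seed=0):
            rng = np.random.default_rng(seed)
            self.W_in = rng.uniform(-0.1, 0.1, (reservoir_dim, label_dim))
            W_hat = rng.uniform(-1.0, 1.0, (reservoir_dim, reservoir_dim))
            # Rescale so that max_arity * ||W_hat||_2 < 1 (contractive dynamics).
            self.W_hat = W_hat * (sigma / (max_arity * np.linalg.norm(W_hat, 2)))

        def encode(self, label, child_states):
            # x(n) = tanh(W_in u(n) + sum_k W_hat x(ch_k(n)))
            s = self.W_in @ label
            for c in child_states:
                s = s + self.W_hat @ c
            return np.tanh(s)

        def root_state_mapping(self, root_state):
            return root_state                      # root state mapping

        def mean_state_mapping(self, all_states):
            return np.mean(all_states, axis=0)     # mean over node states

    Only a linear readout applied to the chosen fixed-size mapping would then be trained (e.g. by ridge regression), which is where the efficiency of the approach comes from.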

    On Filter Size in Graph Convolutional Networks

    Full text link
    Recently, many researchers have been focusing on the definition of neural networks for graphs. The basic component for many of these approaches remains the graph convolution idea proposed almost a decade ago. In this paper, we extend this basic component, following an intuition derived from the well-known convolutional filters over multi-dimensional tensors. In particular, we derive a simple, efficient and effective way to introduce a hyper-parameter on graph convolutions that influences the filter size, i.e., its receptive field over the considered graph. We show with experimental results on real-world graph datasets that the proposed graph convolutional filter improves the predictive performance of Deep Graph Convolutional Networks.
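    One simple way to realize such a filter-size hyper-parameter, sketched below purely for illustration (it is not necessarily the exact formulation of the paper), is to let a layer aggregate information from up to k hops by combining powers of the normalized adjacency matrix, so that k directly controls the receptive field of the convolution.

    # Minimal sketch of a graph convolution with an explicit filter-size
    # hyper-parameter k (receptive field of k hops). Illustrative reading only.
    import numpy as np

    def normalized_adjacency(A):
        A_hat = A + np.eye(A.shape[0])                 # add self-loops
        d = A_hat.sum(axis=1)
        D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
        return D_inv_sqrt @ A_hat @ D_inv_sqrt

    def k_hop_graph_conv(A, X, weights, k):
        """One layer: sum_{j=0..k} A_norm^j X W_j, followed by ReLU.
        weights is a list of k+1 matrices, one per hop distance."""
        A_norm = normalized_adjacency(A)
        P = np.eye(A.shape[0])                         # A_norm^0
        out = np.zeros((X.shape[0], weights[0].shape[1]))
        for j in range(k + 1):
            out += P @ X @ weights[j]
            P = P @ A_norm
        return np.maximum(out, 0.0)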

    Compositional generative mapping for tree-structured data - Part II: Topographic projection model

    Get PDF
    We introduce GTM-SD (Generative Topographic Mapping for Structured Data), which is the first compositional generative model for the topographic mapping of tree-structured data. GTM-SD exploits a scalable bottom-up hidden tree Markov model, introduced in Part I of this paper, to achieve a recursive topographic mapping of hierarchical information. The proposed model allows efficient exploitation of contextual information from shared substructures by a recursive upward propagation on the tree structure which distributes substructure information across the topographic map. Compared to its noncompositional generative counterpart, GTM-SD is shown to allow the topographic mapping of the full sample tree, which includes a projection onto the lattice of all the distinct subtrees rooted in each of its nodes. Experimental results show that the continuous projection space generated by the smooth topographic mapping of GTM-SD yields a finer-grained discrimination of the sample structures than the state-of-the-art recursive neural network approach.
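    Leaving the bottom-up hidden tree Markov model of Part I aside, the final projection step can be pictured as in standard GTM: each structure (or subtree rooted at a node) is placed on the 2D lattice at the responsibility-weighted mean of the lattice coordinates. The sketch below shows only this step, with illustrative names, and assumes the responsibilities have already been computed by the underlying model.

    # Sketch of the topographic projection step only (illustrative names).
    import numpy as np

    def lattice_coordinates(side):
        xs, ys = np.meshgrid(np.linspace(0, 1, side), np.linspace(0, 1, side))
        return np.column_stack([xs.ravel(), ys.ravel()])   # (side*side, 2)

    def posterior_mean_projection(responsibilities, lattice):
        # responsibilities: vector over lattice points, summing to 1,
        # produced by the underlying generative model for a given structure.
        return responsibilities @ lattice                    # point in [0,1]^2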

    Kernel Methods for Tree Structured Data

    Get PDF
    Machine learning comprises a series of techniques for the automatic extraction of meaningful information from large collections of noisy data. In many real-world applications, data is naturally represented in structured form. Since traditional methods in machine learning deal with vectorial information, they require an a priori form of preprocessing. Among all the learning techniques for dealing with structured data, kernel methods are recognized to have a strong theoretical background and to be effective approaches. They do not require an explicit vectorial representation of the data in terms of features, but rely on a measure of similarity between any pair of objects of a domain, the kernel function. Designing fast and good kernel functions is a challenging problem. In the case of tree-structured data two issues become relevant: kernels for trees should not be sparse and should be fast to compute. The sparsity problem arises when, given a dataset and a kernel function, most structures of the dataset are completely dissimilar to one another. In those cases the classifier has too little information for making correct predictions on unseen data; in fact, it tends to produce a discriminating function behaving like the nearest-neighbour rule. Sparsity is likely to arise for some standard tree kernel functions, such as the subtree and subset tree kernels, when they are applied to datasets with node labels belonging to a large domain. A second drawback of using tree kernels is the time complexity required in both the learning and classification phases. Such complexity can sometimes prevent the application of kernels in scenarios involving large amounts of data. This thesis proposes three contributions for resolving the above issues of kernels for trees. The first contribution aims at creating kernel functions which adapt to the statistical properties of the dataset, thus reducing their sparsity with respect to traditional tree kernel functions. Specifically, we propose to encode the input trees by an algorithm able to project the data onto a lower-dimensional space with the property that similar structures are mapped similarly. By building kernel functions on the lower-dimensional representation, we are able to perform inexact matchings between different inputs in the original space. The second contribution is the proposal of a novel kernel function based on the convolution kernel framework. A convolution kernel measures the similarity of two objects in terms of the similarities of their subparts. Most convolution kernels are based on counting the number of shared substructures, partially discarding information about their position in the original structure. The kernel function we propose is, instead, especially focused on this aspect. The third contribution is devoted to reducing the computational burden related to the calculation of a kernel function between a tree and a forest of trees, which is a typical operation in the classification phase and, for some algorithms, also in the learning phase. We propose a general methodology applicable to convolution kernels. Moreover, we show an instantiation of our technique when kernels such as the subtree and subset tree kernels are employed; in those cases, Directed Acyclic Graphs can be used to compactly represent shared substructures across different trees, thus reducing the computational burden and storage requirements.
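    The sparsity issue discussed above is easy to visualize with a toy subtree-style convolution kernel that counts the proper subtrees two trees share: with node labels drawn from a large domain, exact matches become rare and the Gram matrix degenerates towards a nearly diagonal one. The sketch below is illustrative only and is not one of the kernels proposed in the thesis.

    # Toy subtree-counting kernel: K(T1, T2) = number of proper subtrees
    # (a node's label plus the entire subtree below it) shared by T1 and T2.
    from collections import Counter

    def collect_signatures(tree, bag):
        """tree = (label, [children]); adds one canonical string per node to
        bag and returns the signature of the subtree rooted at this node."""
        label, children = tree
        child_sigs = [collect_signatures(c, bag) for c in children]
        sig = "(" + str(label) + "".join(child_sigs) + ")"
        bag[sig] += 1
        return sig

    def subtree_kernel(t1, t2):
        b1, b2 = Counter(), Counter()
        collect_signatures(t1, b1)
        collect_signatures(t2, b2)
        return sum(b1[s] * b2[s] for s in b1)   # shared proper subtrees

    # Example: similar shapes but one differing label already kill most matches,
    # which is exactly the sparsity problem for large label domains.
    ta = ("a", [("b", []), ("c", [])])
    tb = ("a", [("b", []), ("d", [])])
    print(subtree_kernel(ta, tb))   # only the shared leaf ("b") matches -> 1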

    Semisupervised Kernel Matrix Learning by Kernel Propagation

    Full text link

    Efficient Learning Machines

    Get PDF
    Computer science

    Learning nonsparse kernels by self-organizing maps for structured data

    Get PDF
    The development of neural network (NN) models able to encode structured input, and the more recent definition of kernels for structures, make it possible to directly apply machine learning approaches to generic structured data. However, the effectiveness of a kernel can depend on its sparsity with respect to a specific data set; in fact, the accuracy of a kernel method typically decreases as the kernel sparsity increases. The sparsity problem is particularly common in structured domains involving discrete variables which may take on many different values. In this paper, we explore this issue on two well-known kernels for trees, and propose to face it by resorting to self-organizing maps (SOMs) for structures. Specifically, we show that a suitable combination of the two approaches, obtained by defining a new class of kernels based on the activation map of a SOM for structures, can be effective in avoiding the sparsity problem and results in a system that can be significantly more accurate for categorization tasks on structured data. The effectiveness of the proposed approach is demonstrated experimentally on two relatively large corpora of XML-formatted data and a data set of user sessions extracted from website logs.
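    The idea of building a kernel on the activation map of a SOM for structures can be sketched as follows, under the assumption that each structure has already been encoded by the SOM so that its distance to every map prototype is available; the soft activation profile and the Gaussian width used here are illustrative choices, not necessarily those of the paper.

    # Minimal sketch of a SOM-activation kernel (illustrative).
    import numpy as np

    def activation_map(distances, width=1.0):
        # Soft activation of every SOM unit; close prototypes respond strongly.
        act = np.exp(-(distances ** 2) / (2.0 * width ** 2))
        return act / np.linalg.norm(act)

    def som_activation_kernel(dist_x, dist_y, width=1.0):
        # Two different structures can still have overlapping activations,
        # which is what keeps the resulting Gram matrix non-sparse.
        return float(activation_map(dist_x, width) @ activation_map(dist_y, width))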

    Linear Models and Deep Learning: Learning in Sequential Domains

    Get PDF
    With the diffusion of cheap sensors, sensor-equipped devices (e.g., drones), and sensor networks (such as the Internet of Things), as well as the development of inexpensive human-machine interaction interfaces, the ability to quickly and effectively process sequential data is becoming more and more important. Many tasks may benefit from advances in this field, ranging from the monitoring and classification of human behavior to the prediction of future events. Most of these tasks require pattern recognition and machine learning capabilities. Many approaches have been proposed in the past for learning in sequential domains, especially extensions in the field of Deep Learning. Deep Learning is based on highly nonlinear systems, which very often reach quite good classification/prediction performance, but at the expense of a substantial computational burden. Indeed, when facing learning in a sequential or, more generally, structured domain, it is common practice to readily resort to nonlinear systems. However, the task does not always really require a nonlinear system, so the risk is to run into difficult and computationally expensive training procedures only to obtain a solution that improves by an epsilon (if at all) on the performance that can be reached by a simple linear dynamical system, which involves simpler training procedures and a much lower computational effort. The aim of this thesis is to discuss the role that linear dynamical systems may have in learning in sequential domains. On the one hand, we point out that a linear dynamical system (LDS) is able, in many cases, to already provide good performance at a relatively low computational cost. On the other hand, when a linear dynamical system is not enough to provide a reasonable solution, we show that it can be used as a building block to construct more complex and powerful models, or how to resort to it to design quite effective pre-training techniques for nonlinear dynamical systems, such as Echo State Networks (ESNs) and simple Recurrent Neural Networks (RNNs). Specifically, in this thesis we consider the task of predicting the next event in a sequence of events. The datasets used to test the various models involve polyphonic music and contain quite long sequences. We start by introducing a simple state-space LDS, for which three different training approaches are considered. We then introduce some brand-new models that are inspired by the LDS and aim to increase the prediction/classification capabilities of the simple linear models. We then move to study the most common nonlinear models; from this point of view, we consider RNN models, which are significantly more computationally demanding. We experimentally show that, at least for the addressed prediction task and the considered datasets, the introduction of pre-training approaches involving linear systems leads to quite large improvements in prediction performance. Specifically, we introduce pre-training via a linear autoencoder, and an alternative based on Hidden Markov Models (HMMs). Experimental results suggest that linear models may play an important role in learning in sequential domains, both when used directly and indirectly (as a basis for pre-training approaches): when used directly, linear models may by themselves return state-of-the-art performance, while requiring a much lower computational effort than their nonlinear counterparts; moreover, even when linear models do not perform well, it is always possible to successfully exploit them within pre-training approaches for nonlinear systems.
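    As a rough illustration of the kind of linear state-space model discussed above, applied to next-event prediction on a toy piano-roll-like sequence: x_t = A x_{t-1} + B u_t and y_t = C x_t, with the readout C fitted by ridge regression over the collected states. Fixing A and B at random values, as done below, is only one of several possible training schemes and is an assumption of this sketch, not the thesis' exact procedure.

    # Bare-bones linear state-space model for next-event prediction (sketch).
    import numpy as np

    def run_lds(A, B, seq):
        x = np.zeros(A.shape[0])
        states = []
        for u in seq:                        # seq: (T, input_dim) array
            x = A @ x + B @ u                # x_t = A x_{t-1} + B u_t
            states.append(x)
        return np.array(states)

    def fit_readout(states, targets, reg=1e-2):
        # Ridge regression for the readout: C = Y^T X (X^T X + reg I)^{-1}
        XtX = states.T @ states + reg * np.eye(states.shape[1])
        return np.linalg.solve(XtX, states.T @ targets).T

    rng = np.random.default_rng(0)
    input_dim, state_dim, T = 88, 50, 200
    seq = (rng.random((T, input_dim)) < 0.05).astype(float)    # toy "piano roll"
    A = 0.9 * np.linalg.qr(rng.standard_normal((state_dim, state_dim)))[0]
    B = rng.uniform(-0.1, 0.1, (state_dim, input_dim))
    states = run_lds(A, B, seq[:-1])
    C = fit_readout(states, seq[1:])         # predict the next event
    pred = states @ C.T                      # y_t = C x_t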