16 research outputs found

    Embedding and learning with signatures

    Sequential and temporal data arise in many fields of research, such as quantitative finance, medicine, and computer vision. The present article is concerned with a novel approach to sequential learning, called the signature method and rooted in rough path theory. Its basic principle is to represent multidimensional paths by a graded feature set of their iterated integrals, called the signature. This approach relies critically on an embedding principle, which consists in representing discretely sampled data as paths, i.e., functions from [0,1] to R^d. After a survey of machine learning methodologies for signatures, we investigate the influence of embeddings on prediction accuracy with an in-depth study of three recent and challenging datasets. We show that a specific embedding, called lead-lag, is systematically better, regardless of the dataset or algorithm used. Moreover, we show through an empirical study that computing signatures over the whole path domain does not lead to a loss of local information. We conclude that, with a good embedding, the signature combined with a simple algorithm achieves results competitive with state-of-the-art, domain-specific approaches.
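As a rough illustration of the objects named in this abstract (not the authors' code), the sketch below computes the first two signature levels of a piecewise-linear path with NumPy and builds a lead-lag embedding of a scalar series. The function names `signature_depth2` and `lead_lag` are hypothetical, and the lead-lag convention used here (the lead coordinate moves first, then the lag catches up) is only one of several in use.

```python
import numpy as np


def signature_depth2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path.

    `path` has shape (n_points, d). Level 1 is the total increment;
    level 2 collects the iterated integrals of dX^i dX^j over 0 < s < t.
    """
    path = np.asarray(path, dtype=float)
    increments = np.diff(path, axis=0)            # one row per linear segment
    level1 = increments.sum(axis=0)
    # Displacement from the starting point at the beginning of each segment.
    start_offsets = path[:-1] - path[0]
    # Cross terms between earlier displacement and the current segment,
    # plus the within-segment contribution (half the outer product).
    level2 = start_offsets.T @ increments + 0.5 * increments.T @ increments
    return level1, level2


def lead_lag(values):
    """Lead-lag embedding of a 1-D series into a 2-D path (assumed convention)."""
    values = np.asarray(values, dtype=float)
    repeated = np.repeat(values, 2)
    lead = repeated[1:]    # lead coordinate updates first
    lag = repeated[:-1]    # lag coordinate follows one step later
    return np.column_stack([lead, lag])


series = [0.0, 1.0, 3.0]
level1, level2 = signature_depth2(lead_lag(series))
# Under this convention, the antisymmetric part of level 2 equals the sum of
# squared increments of the series, which is one reason lead-lag is informative.
quadratic_variation = level2[0, 1] - level2[1, 0]
```

Higher signature levels follow the same pattern but grow as d^k in size, so in practice a dedicated library would normally be used instead of hand-written code like this.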

    The insertion method to invert the signature of a path

    The signature is a representation of a path as an infinite sequence of its iterated integrals. Under certain assumptions, the signature characterizes the path, up to translation and reparameterization. Therefore, a crucial question of interest is the development of efficient algorithms to invert the signature, i.e., to reconstruct the path from the information of its (truncated) signature. In this article, we study the insertion procedure, originally introduced by Chang and Lyons (2019), from both a theoretical and a practical point of view. After describing our version of the method, we give its rate of convergence for piecewise linear paths, accompanied by an implementation in PyTorch. The algorithm is parallelized, meaning that it is very efficient at inverting a batch of signatures simultaneously. Its performance is illustrated with both real-world and simulated examples.

    Apprentissage de données temporelles par des méthodes de signatures

    Modern applications of artificial intelligence involve high-dimensional multivariate temporal data that pose many challenges. Through a geometric approach to data streams, the notion of signature, a representation of a process as an infinite vector of its iterated integrals, is a promising tool. Its properties, developed in the context of rough path theory, make it a good candidate to serve as features that are then fed into learning algorithms. Although the definition of the signature dates back to the work of Chen (1960), its use in machine learning is recent, and many theoretical and methodological questions remain to be explored. We are therefore interested in using the signature to develop generic and efficient algorithms for high-dimensional temporal data, with theoretical guarantees. This goal is pursued mainly in two directions: on the one hand, developing new algorithms that take the signature of the data as input, and, on the other hand, using the signature as a theoretical tool to study existing deep learning algorithms, via the recent notion of neural ordinary differential equation, which links deep learning and differential equations.

    New Directions in the Applications of Rough Path Theory

    This article provides a concise overview of some of the recent advances in the application of rough path theory to machine learning. Controlled differential equations (CDEs) are discussed as the key mathematical model to describe the interaction of a stream with a physical control system. A collection of iterated integrals known as the signature naturally arises in the description of the response produced by such interactions. The signature comes equipped with a variety of powerful properties rendering it an ideal feature map for streamed data. We summarise recent advances in the symbiosis between deep learning and CDEs, studying the link with RNNs and culminating with the Neural CDE model. We conclude with a discussion on signature kernel methods.
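For orientation, the objects mentioned in this abstract are commonly written as follows. The notation is an assumption made for illustration and is not taken from the article: $X$ is the driving stream, $Y$ the response of the controlled system, $f$ a vector field, and $f_\theta$ a learned vector field in the Neural CDE setting.

```latex
\begin{align*}
  dY_t &= f(Y_t)\,dX_t, \qquad Y_0 = y_0, \\
  S(X)_{[0,T]} &= \Bigl(1,\ \int_{0<t_1<T} dX_{t_1},\
      \int_{0<t_1<t_2<T} dX_{t_1}\otimes dX_{t_2},\ \dots\Bigr), \\
  z_t &= z_0 + \int_0^t f_\theta(z_s)\,dX_s .
\end{align*}
```

The first line is a controlled differential equation, the second is the signature of $X$ as the sequence of its iterated integrals, and the third is the integral form usually associated with the Neural CDE model.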