
    SVR, General Noise Functions and Deep Learning. General Noise Deep Models

    Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Date of defense: 20-01-2023. Machine learning (ML) is a branch of artificial intelligence for building systems that learn to solve a task automatically from data, in the sense that they do not need to be explicitly programmed with the rules or the method for doing so. ML covers different types of problems; one of them, regression, involves predicting a numerical outcome and is the focus of this thesis. Among the ML models used for regression, Support Vector Machines (SVMs) are one of the main algorithms of choice, usually called Support Vector Regression (SVR) when applied to regression tasks. These models generally use the ϵ-insensitive loss function, which implies assuming a particular distribution for the noise present in the data, but general noise cost functions have recently been proposed for SVR. These cost functions should be more effective when applied to regression problems whose underlying noise distribution follows the one assumed by that particular cost function. However, using these general functions, with the disparity in mathematical properties such as differentiability that they entail, means that the standard optimization method used in SVR, Sequential Minimal Optimization (SMO), is no longer an option. Moreover, arguably the main drawback of SVR models is that they can suffer from scalability problems when working with large datasets, a common situation in the era of big data.
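Because SMO does not carry over to arbitrary noise losses, a subgradient-based online learner is a natural replacement. The sketch below shows a simplified NORMA-style update for kernel regression with the ϵ-insensitive loss; it assumes scalar inputs, a Gaussian kernel, a fixed ϵ, and hypothetical helper names (`norma_fit`, `norma_predict`), and it omits details such as truncating old kernel terms. It is an illustration of the idea, not the thesis's implementation.

```python
import math

def rbf(a, b, gamma):
    """Gaussian kernel on scalar inputs (assumed kernel choice)."""
    return math.exp(-gamma * (a - b) ** 2)

def eps_insensitive_grad(f, y, eps=0.1):
    """Subgradient of max(0, |y - f| - eps) with respect to the prediction f."""
    r = y - f
    if r > eps:
        return -1.0
    if r < -eps:
        return 1.0
    return 0.0

def norma_fit(xs, ys, loss_grad, eta=0.05, lam=0.01, gamma=2.0, epochs=1):
    """Hypothetical simplified NORMA-style online kernel regression."""
    centers, alphas = [], []
    for _ in range(epochs):
        for x_t, y_t in zip(xs, ys):
            # Current prediction f_t(x_t) from the kernel expansion.
            f_t = sum(a * rbf(c, x_t, gamma) for c, a in zip(centers, alphas))
            # Regularization step: shrink every existing expansion coefficient.
            alphas = [(1.0 - eta * lam) * a for a in alphas]
            # Loss step: add a kernel term only when the subgradient is nonzero.
            g = loss_grad(f_t, y_t)
            if g != 0.0:
                centers.append(x_t)
                alphas.append(-eta * g)
    return centers, alphas, gamma

def norma_predict(model, x):
    centers, alphas, gamma = model
    return sum(a * rbf(c, x, gamma) for c, a in zip(centers, alphas))
```

Only `loss_grad` encodes the noise model, so a general noise loss is swapped in by passing its subgradient; for example, `lambda f, y: f - y` corresponds to the squared loss (Gaussian noise).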
On the other hand, Deep Learning (DL) models can handle large datasets more easily, which is one of the fundamental reasons for their recent popularity. Finally, although SVR models have been studied thoroughly, the construction of error intervals for them seems to have received less attention and remains an open problem. This is a significant drawback, since in many applications that involve solving a regression problem, not only is an accurate prediction useful, but a confidence interval associated with that prediction can also be extremely valuable. Taking all these factors into account, this thesis has four main objectives. First, to propose a framework for training general noise SVR models using the Naive Online R Minimization Algorithm (NORMA) as the optimization method. Second, to provide a method for building general noise DL models that combine the highly nonlinear feature processing of DL models with the predictive potential of general noise loss functions, of which the ϵ-insensitive loss used in SVR is only one particular example. Third, to describe a direct approach for building error intervals for SVR or other regression models, based on the hypothesis that the residuals follow a particular distribution function. Finally, to unify the three previous objectives in a single modeling framework that allows building general noise deep models for regression problems, with the possibility of obtaining associated confidence intervals or error intervals.
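The third objective, building error intervals from an assumed residual distribution, can be sketched in a few lines. The abstract does not say which distribution the thesis uses, so this example assumes zero-mean Laplace residuals: the maximum-likelihood scale is the mean absolute residual, and since P(|r| ≤ t) = 1 − exp(−t/b), the half-width for coverage p is t = −b·ln(1 − p). The function names are hypothetical, and the residuals should ideally come from held-out data.

```python
import math

def laplace_scale(residuals):
    """ML estimate of the scale b of a zero-mean Laplace fit (assumed noise model)."""
    return sum(abs(r) for r in residuals) / len(residuals)

def laplace_interval(prediction, residuals, coverage=0.9):
    """Symmetric error interval around a point prediction under the Laplace assumption."""
    b = laplace_scale(residuals)
    # For r ~ Laplace(0, b): P(|r| <= t) = 1 - exp(-t / b), so t = -b * ln(1 - coverage).
    half_width = -b * math.log(1.0 - coverage)
    return prediction - half_width, prediction + half_width
```

With residuals of magnitude 1 and 90% coverage, the half-width is ln(10) ≈ 2.30, noticeably wider than a Gaussian interval with the same mean absolute residual would give, reflecting the heavier Laplace tails.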

    Taxonomy of datasets in graph learning: a data-driven approach to improve GNN benchmarking

    The core research of this thesis, mostly comprising chapter four, has been accepted to the Learning on Graphs (LoG) 2022 conference for a spotlight presentation as a standalone paper, under the title "Taxonomy of Benchmarks in Graph Representation Learning", and is to be published in the Proceedings of Machine Learning Research (PMLR) series. As a main author of the paper, my specific contributions cover problem formulation, design and implementation of our taxonomy framework and experimental pipeline, collation of our results, and, of course, the writing of the article.
Deep learning on graphs has attained unprecedented levels of success in recent years thanks to Graph Neural Networks (GNNs), specialized neural network architectures that have unequivocally surpassed prior graph learning approaches. GNNs extend the success of neural networks to graph-structured data by accounting for their intrinsic geometry. While extensive research has been done on developing GNNs with superior performance according to a collection of graph representation learning benchmarks, current benchmarking procedures are insufficient to provide fair and effective evaluations of GNN models. Perhaps the most prevalent and at the same time least understood problem with respect to graph benchmarking is "domain coverage": despite the growing number of available graph datasets, most of them do not provide additional insights and on the contrary reinforce potentially harmful biases in GNN model development.
This problem stems from a lack of understanding with respect to what aspects of a given model are probed by graph datasets. For example, to what extent do they test the ability of a model to leverage graph structure vs. node features? Here, we develop a principled approach to taxonomize benchmarking datasets according to a "sensitivity profile" that is based on how much GNN performance changes due to a collection of graph perturbations. Our data-driven analysis provides a deeper understanding of which benchmarking data characteristics are leveraged by GNNs. Consequently, our taxonomy can aid in the selection and development of adequate graph benchmarks, and in better-informed evaluation of future GNN methods. Finally, our approach and implementation in the GTaxoGym package (https://github.com/G-Taxonomy-Workgroup/GTaxoGym) are extendable to multiple graph prediction task types and future datasets.
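A sensitivity profile can be pictured as one number per perturbation: how much a model's score moves when the graph is altered. The sketch below is a hypothetical illustration, not the GTaxoGym API. It assumes an `evaluate(edge_index, x)` callable standing in for "train and test a GNN", two example perturbations (drop all edges; replace node features with constants), and relative score change as the sensitivity measure; the paper may use a different measure.

```python
import numpy as np

def remove_all_edges(edge_index, x):
    """Hypothetical perturbation: discard graph structure, keep node features."""
    return np.empty((2, 0), dtype=edge_index.dtype), x

def constant_features(edge_index, x):
    """Hypothetical perturbation: keep graph structure, make node features uninformative."""
    return edge_index, np.ones_like(x)

def sensitivity_profile(evaluate, edge_index, x, perturbations):
    """Relative change in score for each perturbation (assumed sensitivity measure)."""
    baseline = evaluate(edge_index, x)
    profile = {}
    for name, perturb in perturbations.items():
        score = evaluate(*perturb(edge_index, x))
        profile[name] = (score - baseline) / baseline
    return profile
```

A dataset whose profile barely moves under `remove_all_edges` but drops sharply under `constant_features` is probing node features far more than graph structure; stacking such profiles across datasets gives the vectors a taxonomy can cluster.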

    Spatiotemporal convolutional network for time-series prediction and causal inference

    Making robust predictions is not easy for nonlinear systems. In this work, a neural network computing framework, i.e., a spatiotemporal convolutional network (STCN), was developed to efficiently and accurately render a multistep-ahead prediction of a time series by employing a spatial-temporal information (STI) transformation. The STCN combines the advantages of both the temporal convolutional network (TCN) and the STI equation, which maps the high-dimensional/spatial data to the future temporal values of a target variable, thus naturally providing the prediction of the target variable. From the observed variables, the STCN also infers the causal factors of the target variable in the sense of Granger causality, which are in turn selected as effective spatial information to improve the prediction robustness. The STCN was successfully applied to both benchmark systems and real-world datasets, all of which show superior and robust performance in multistep-ahead prediction, even when the data were perturbed by noise. From both theoretical and computational viewpoints, the STCN has great potential in practical applications in artificial intelligence (AI) or machine learning as a model-free method based only on the observed data, and it also opens a new way to explore observed high-dimensional data in a dynamical manner for machine learning. Comment: 23 pages, 6 figures
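Granger causality, which the STCN uses to select causal factors, asks whether the past of x improves a prediction of y beyond what y's own past provides. The sketch below is the classic linear least-squares version, not the STCN's procedure: it compares the residual sum of squares (RSS) of an autoregression on y's lags with one that also includes x's lags. The helper names and lag count are assumptions; a ratio well above 1 suggests x Granger-causes y.

```python
import numpy as np

def lagged(series, lags):
    """Columns series[t-1], ..., series[t-lags], aligned with targets series[t] for t >= lags."""
    n = len(series)
    return np.column_stack([series[lags - k:n - k] for k in range(1, lags + 1)])

def granger_rss_ratio(x, y, lags=2):
    """RSS(y on its own lags) / RSS(y on its own lags plus lags of x)."""
    target = y[lags:]
    ones = np.ones((len(target), 1))
    restricted = np.hstack([ones, lagged(y, lags)])
    full = np.hstack([restricted, lagged(x, lags)])
    rss = []
    for design in (restricted, full):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        rss.append(float(resid @ resid))
    return rss[0] / rss[1]
```

In practice the ratio would be turned into an F-statistic and a p-value; the raw ratio keeps the sketch short.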

    Deep learning applied to computational mechanics: A comprehensive review, state of the art, and the classics

    Three recent breakthroughs due to AI in arts and science serve as motivation: an award-winning digital image, protein folding, and fast matrix multiplication. Many recent developments in artificial neural networks, particularly deep learning (DL), that are applied and relevant to computational mechanics (solids, fluids, finite-element technology) are reviewed in detail. Both hybrid and pure machine learning (ML) methods are discussed. Hybrid methods combine traditional PDE discretizations with ML methods either (1) to help model complex nonlinear constitutive relations, (2) to nonlinearly reduce the model order for efficient simulation (turbulence), or (3) to accelerate the simulation by predicting certain components in the traditional integration methods. Here, methods (1) and (2) relied on the Long Short-Term Memory (LSTM) architecture, while method (3) relied on convolutional neural networks. Pure ML methods to solve (nonlinear) PDEs are represented by Physics-Informed Neural Network (PINN) methods, which could be combined with an attention mechanism to address discontinuous solutions. Both LSTM and attention architectures, together with modern and generalized classic optimizers that include stochasticity for DL networks, are extensively reviewed. Kernel machines, including Gaussian processes, are covered in sufficient depth for more advanced works such as shallow networks with infinite width. Rather than addressing only experts, the review assumes readers are familiar with computational mechanics but not with DL, whose concepts and applications are built up from the basics, aiming to bring first-time learners quickly to the forefront of research. The history and limitations of AI are recounted and discussed, with particular attention to pointing out misstatements or misconceptions of the classics, even in well-known references. Positioning and pointing control of a large-deformable beam is given as an example. Comment: 275 pages, 158 figures.
Appeared online on 2023.03.01 in CMES-Computer Modeling in Engineering & Science
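Since the review treats kernel machines and Gaussian processes, a minimal GP regression sketch may help first-time learners see the mechanics: a squared-exponential kernel, a Cholesky factorization of the noisy kernel matrix, and the standard posterior mean and variance formulas. The one-dimensional inputs, unit signal variance, and default hyperparameters are assumptions made for brevity.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b (unit signal variance)."""
    sq = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * sq / length_scale**2)

def gp_posterior(x_train, y_train, x_test, noise=1e-2, length_scale=1.0):
    """Posterior mean and variance of a zero-mean GP; `noise` is the observation noise variance."""
    K = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, length_scale)
    K_ss = rbf_kernel(x_test, x_test, length_scale)
    L = np.linalg.cholesky(K)                      # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v**2, axis=0)     # prior variance minus explained part
    return mean, var
```

The variance shrinks near training inputs and returns to the prior value far from them, which is the uncertainty estimate that distinguishes GPs from plain kernel ridge regression.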

    Short Term Electricity Price Forecasting With Multistage Optimization Technique Of LSSVM-GA

    Price prediction has now become an important task in the operation of electrical power systems. In short-term forecasting, the electricity price can be predicted hour-ahead or day-ahead. An hour-ahead prediction offers market members the pre-dispatch prices for the next hour. It is useful for an effective bidding strategy in which the quantity of bids can be revised or changed prior to the dispatch hour. However, only a few studies have been conducted in the field of hour-ahead forecasting. This is because most power markets apply a two-settlement market structure (day-ahead and real time) or a standard market design rather than a single-settlement system (real time). Therefore, a multistage optimization for a hybrid Least Square Support Vector Machine (LSSVM) and Genetic Algorithm (GA) model is developed in this study to provide an accurate price forecast with optimized parameters and input features. So far, no literature has been found on multistage feature and parameter selection using LSSVM-GA for hour-ahead price prediction. All the models are examined on the Ontario power market, which is reported to be among the most volatile markets worldwide. A large number of features are selected through three stages of optimization to avoid missing any important features. The developed LSSVM-GA shows higher forecast accuracy with lower complexity than the existing models.
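The LSSVM at the core of this model replaces the SVR quadratic program with a single linear system, which is part of why it is cheap enough to wrap inside a GA search. A minimal sketch of standard LSSVM regression follows, assuming an RBF kernel; it solves [[0, 1ᵀ], [1, K + I/γ]]·[b; α] = [0; y]. The GA stage, which in this study tunes parameters such as γ and the kernel width and selects input features, is not shown, and the function names are hypothetical.

```python
import numpy as np

def rbf(A, B, sigma):
    """RBF kernel matrix between row-wise samples A (n x d) and B (m x d)."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * sigma**2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve the LSSVM dual linear system for bias b and coefficients alpha."""
    n = len(X)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                                # constraint: sum(alpha) = 0
    A[1:, 0] = 1.0                                # bias column
    A[1:, 1:] = rbf(X, X, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]

def lssvm_predict(X_train, b, alpha, X_new, sigma=1.0):
    """f(x) = sum_i alpha_i k(x, x_i) + b."""
    return rbf(X_new, X_train, sigma) @ alpha + b
```

A GA (or any search) would repeatedly call `lssvm_fit` on a training window with candidate (`gamma`, `sigma`, feature subset) values and score `lssvm_predict` on a validation window.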

    Deep Learning for Inverse Problems: Performance Characterizations, Learning Algorithms, and Applications

    Deep learning models have witnessed immense empirical success over the last decade. However, in spite of their widespread adoption, a profound understanding of the generalization behavior of these over-parameterized architectures is still missing. In this thesis, we provide one such understanding via a data-dependent characterization of the generalization capability of data representations based on deep neural networks. In particular, by building on the algorithmic robustness framework, we offer a generalization error bound that encapsulates key ingredients associated with the learning problem, such as the complexity of the data space, the cardinality of the training set, and the Lipschitz properties of a deep neural network. We then specialize our analysis to a specific class of model-based regression problems, namely inverse problems. These problems often come with well-defined forward operators that map variables of interest to the observations. It is therefore natural to ask whether such knowledge of the forward operator can be exploited in the deep learning approaches increasingly used to solve inverse problems. We offer a generalization error bound that, apart from the other factors, depends on the Jacobian of the composition of the forward operator with the neural network. Motivated by our analysis, we then propose a "plug-and-play" regularizer that leverages knowledge of the forward map to improve the generalization of the network. We also provide a method to tightly upper-bound the norms of the Jacobians of the relevant operators that is much more computationally efficient than existing ones. We demonstrate the efficacy of our model-aware regularized deep learning algorithms against other state-of-the-art approaches on inverse problems involving various sub-sampling operators, such as those used in the classical compressed sensing setup, and on inverse problems of interest in biomedical imaging.
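To make "the Jacobian of the composition of the forward operator with the neural network" concrete, the sketch below assumes the composition x ↦ f(Ax), where A is a linear forward operator and f a reconstruction network; the abstract does not fix the order, so this is an assumption. By the chain rule the Jacobian is J_f(Ax)·A, and its spectral norm is bounded by the product ‖J_f‖·‖A‖. The toy two-layer tanh network and all names are hypothetical, and the product bound is the elementary one, not the tighter bound proposed in the thesis.

```python
import numpy as np

def network(y, W1, W2):
    """Hypothetical two-layer reconstruction network f: measurements -> signal."""
    return W2 @ np.tanh(W1 @ y)

def network_jacobian(y, W1, W2):
    """Analytic Jacobian of f at y: W2 diag(1 - tanh^2(W1 y)) W1."""
    s = 1.0 - np.tanh(W1 @ y) ** 2
    return W2 @ (s[:, None] * W1)

def composite_jacobian(A, W1, W2, x):
    """Jacobian of x -> f(A x), its spectral norm, and the product bound ||J_f|| ||A||."""
    J_f = network_jacobian(A @ x, W1, W2)
    J = J_f @ A                                   # chain rule through the linear operator A
    norm_J = np.linalg.norm(J, 2)                 # largest singular value
    bound = np.linalg.norm(J_f, 2) * np.linalg.norm(A, 2)
    return J, norm_J, bound
```

A regularizer in this spirit would penalize `norm_J` (or a cheaper surrogate) during training, so that the network is less sensitive along directions that the forward operator actually excites.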

    A Machine Learning and Data-Driven Prediction and Inversion of Reservoir Brittleness from Geophysical Logs and Seismic Signals: A Case Study in Southwest Pennsylvania, Central Appalachian Basin

    In unconventional reservoir sweet-spot identification, brittleness is an important parameter used to measure how easily low-permeability reservoirs can be produced. In shaly reservoirs, production is achieved through hydraulic fracturing, which depends on how brittle the rock is, since fracturing both opens natural fractures and creates new ones. A measure of brittleness, the brittleness index, is obtained from the elastic properties of the rock. In practice, predicting brittleness with this method is difficult because elastic logs are of limited availability. To address this issue, machine learning techniques are adopted to predict brittleness at well locations from readily available geophysical logs, and spatially using 3D seismic data. The geophysical logs available as input are gamma ray, neutron, sonic, photoelectric factor, and density logs, while the seismic data are high-quality post-stack time-migrated data. Support Vector Regression, Gradient Boosting, and Artificial Neural Networks are used to predict brittleness from the geophysical logs, and Texture Model Regression is used to invert brittleness from the seismic data. Gradient Boosting outperformed the other algorithms in predicting brittleness. The results of this research further demonstrate the application of machine learning and how these tools can be leveraged to create data-driven solutions to geophysical problems. The seismic inversion of brittleness also shows promising results that will be investigated further in the future.
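The abstract says the brittleness index is obtained from elastic properties but does not give the formula. One widely used Rickman-style definition averages a normalized Young's modulus (stiffer means more brittle) and a normalized Poisson's ratio (lower means more brittle); the sketch below assumes that definition, with Young's modulus in Mpsi and normalization bounds of 1 to 8 Mpsi and 0.15 to 0.40 that are illustrative defaults, not values from this study. In the study's workflow, an index like this would be the training target that the ML models learn to predict from the conventional logs.

```python
def brittleness_index(E, nu, E_min=1.0, E_max=8.0, nu_min=0.15, nu_max=0.40):
    """Assumed Rickman-style brittleness index in [0, 1] (0 = ductile, 1 = brittle).

    E is Young's modulus in Mpsi and nu is Poisson's ratio; the bounds are
    illustrative assumptions. Works element-wise on NumPy arrays as well.
    """
    E_n = (E - E_min) / (E_max - E_min)           # higher stiffness -> more brittle
    nu_n = (nu - nu_max) / (nu_min - nu_max)      # lower Poisson's ratio -> more brittle
    return 0.5 * (E_n + nu_n)
```

Values outside the chosen bounds fall outside [0, 1]; clipping or recalibrating the bounds to the local formation is a design choice left open here.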

    Comparison of different models for forecasting of Czech electricity market

    There is a demand for decision support tools that can model electricity markets and forecast the hourly electricity price. Many different approaches, such as artificial neural networks or support vector regression, are used in the literature. This thesis compares several different estimators under a single setting, using available data from the Czech electricity market. The resulting comparison of over 5000 different estimators led to a selection of several best-performing models. The role of historical weather data (temperature, dew point, and humidity) is also assessed within the comparison, and it was found that while including weather data might lead to overfitting, it is beneficial under the right circumstances. The best-performing approach was Lasso regression estimated using a modified LARS algorithm. Institute of Economic Studies, Faculty of Social Sciences
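Lasso regression, the best performer here, minimizes (1/2n)‖y − Xw‖² + α‖w‖₁, whose L1 penalty drives many coefficients exactly to zero. This is useful when thousands of candidate lagged-price and weather features compete. The thesis estimated it with a modified LARS; the sketch below swaps in cyclic coordinate descent with soft-thresholding instead, which reaches the same optimum for a given α. It assumes centered data with no intercept and uses a hypothetical function name.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * |.|: shrink z toward zero by t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, alpha, n_iter=200):
    """Lasso via cyclic coordinate descent (not the LARS variant used in the thesis)."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X**2).sum(axis=0) / n
    residual = y.astype(float).copy()             # residual = y - X @ w, with w = 0
    for _ in range(n_iter):
        for j in range(p):
            residual += X[:, j] * w[j]            # remove feature j's current contribution
            rho = X[:, j] @ residual / n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
            residual -= X[:, j] * w[j]            # add back the updated contribution
    return w
```

LARS has the advantage of tracing the whole regularization path at once, which suits selecting α by validation; coordinate descent is simpler to write and is efficient when only a few α values are needed.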

    Layer-wise Adaptive Step-Sizes for Stochastic First-Order Methods for Deep Learning

    We propose a new per-layer adaptive step-size procedure for stochastic first-order optimization methods for minimizing empirical loss functions in deep learning, eliminating the need for the user to tune the learning rate (LR). The proposed approach exploits the layer-wise stochastic curvature information contained in the diagonal blocks of the Hessian in deep neural networks (DNNs) to compute adaptive step-sizes (i.e., LRs) for each layer. The method has memory requirements that are comparable to those of first-order methods, while its per-iteration time complexity is only increased by an amount that is roughly equivalent to an additional gradient computation. Numerical experiments show that SGD with momentum and AdamW combined with the proposed per-layer step-sizes are able to choose effective LR schedules and outperform fine-tuned LR versions of these methods as well as popular first-order and second-order algorithms for training DNNs on Autoencoder, Convolutional Neural Network (CNN) and Graph Convolutional Network (GCN) models. Finally, it is proved that an idealized version of SGD with the layer-wise step sizes converges linearly when using full-batch gradients
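The idea of using the diagonal Hessian blocks to set per-layer step sizes can be sketched under one assumption: the step for layer l minimizes the local quadratic model along −g_l, giving α_l = g_lᵀg_l / g_lᵀH_ll g_l. The abstract does not state the exact formula, so treat this as an illustration. Here H_ll g_l comes from a finite difference of gradients with only layer l perturbed, so this naive sketch spends one extra gradient per layer, unlike the roughly one extra gradient claimed for the proposed method. Names such as `layerwise_step_sizes` and `max_lr` are hypothetical.

```python
import numpy as np

def layerwise_step_sizes(grad_fn, params, h=1e-5, max_lr=10.0):
    """Per-layer step sizes g^T g / g^T H_ll g (assumed rule) from finite-difference curvature.

    params: list of arrays, one per layer; grad_fn(params) returns a list of gradients.
    """
    grads = grad_fn(params)
    lrs = []
    for l, g in enumerate(grads):
        shifted = [p.copy() for p in params]
        shifted[l] = shifted[l] + h * g           # perturb only layer l along its gradient
        Hg = (grad_fn(shifted)[l] - g) / h        # approx. diagonal block H_ll times g_l
        curvature = float(np.vdot(g, Hg))
        gg = float(np.vdot(g, g))
        # Fall back to max_lr when the curvature is non-positive (quadratic model unbounded).
        lrs.append(min(gg / curvature, max_lr) if curvature > 0 else max_lr)
    return grads, lrs
```

Each layer is then updated as θ_l ← θ_l − α_l g_l; on a separable quadratic this lands exactly at the minimizer in one step, and the paper combines such step sizes with SGD with momentum and AdamW rather than with a plain gradient step.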