320 research outputs found

    Reparametrization in deep learning

    Deep learning is a connectionist approach to machine learning that has successfully harnessed the massive production of digital data and the recent growth in computational resources. The design of efficient deep learning algorithms rests on three principal themes: expressivity, trainability, and generalizability. This thesis explores these themes from the point of view of reparametrization. In particular, Chapter 3 confronts a popular conjecture in deep learning that attempts to explain why large neural networks learn, among many plausible hypotheses, one that generalizes: that the flat minima reached through learning generalize better. We demonstrate the serious limitations of this conjecture by reparametrization on several simple and popular models, and question the interpretations previously placed on experimental observations. Chapter 5 explores the framework of nonlinear independent component analysis, which enables closed-form density evaluation through a change of variables. More precisely, this work proposes Real NVP, an architecture built from expressive and easily invertible computational layers that can be trained by standard gradient descent algorithms. We showcase its successes and shortcomings in modelling high-dimensional data, and explain the techniques developed in that design.
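
The idea of an easily invertible layer with closed-form density evaluation can be illustrated with an affine coupling layer, the building block described in the published Real NVP paper. The following is a minimal sketch under stated assumptions, not the thesis code: `log_scale` and `shift` are hypothetical toy functions standing in for learned networks, the input length is assumed even, and the base density is assumed to be a standard normal.

```python
import math

def log_scale(x1):
    # Hypothetical toy stand-in for the learned scale network s(x1).
    return [math.tanh(sum(x1)) for _ in x1]

def shift(x1):
    # Hypothetical toy stand-in for the learned translation network t(x1).
    return [0.5 * sum(x1) for _ in x1]

def coupling_forward(x):
    """Map x -> y and return log|det(dy/dx)|. Assumes len(x) is even."""
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    s, t = log_scale(x1), shift(x1)
    # First half passes through unchanged; second half is scaled and shifted.
    y2 = [b * math.exp(si) + ti for b, si, ti in zip(x2, s, t)]
    # The Jacobian is triangular, so its log-determinant is the sum of s.
    return x1 + y2, sum(s)

def coupling_inverse(y):
    """Invert the layer without inverting log_scale or shift themselves."""
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    s, t = log_scale(y1), shift(y1)
    x2 = [(b - ti) * math.exp(-si) for b, si, ti in zip(y2, s, t)]
    return y1 + x2

def log_density(x):
    """Change of variables: log p(x) = log N(y; 0, I) + log|det(dy/dx)|."""
    y, log_det = coupling_forward(x)
    log_base = sum(-0.5 * v * v - 0.5 * math.log(2.0 * math.pi) for v in y)
    return log_base + log_det
```

Because the first half of the input is copied unchanged, the inverse only needs to re-evaluate the scale and shift functions, so they can be arbitrarily complex without affecting invertibility.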

    Comparison of data mining techniques to predict and map the Atterberg limits in central plateau of Iran

    The Atterberg limits characterize the mechanical behavior of soil and are therefore important for soil management. The aim of this research was to investigate the spatial variability of the Atterberg limits using the three most common digital soil-mapping techniques, a pool of easy-to-obtain environmental variables, and 85 soil samples from central Iran. The results showed that the highest values of the liquid limit (LL) and plastic limit (PL) were obtained in the central, eastern and southeastern parts of the study area, where the soil textural classes were loam and clay loam. The lowest values of LL and PL occurred in the northwestern parts of the study area, adjacent to the mountain regions, where the samples had a high sand content (>80%). The plasticity index (PI) in the study area ranged from 0.01 to 4%. According to leave-one-out cross-validation, the combination of the artificial bee colony (ABC) algorithm and an artificial neural network (ANN) was the best model for predicting the Atterberg limits in the study area, compared with the support vector machine and regression tree models. For instance, ABC-ANN predicted PI with an RMSE, R2 and ME of 0.23, 0.91 and -0.03, respectively. Our findings indicate that the proposed method can explain most of the variation in the Atterberg limits in the study area, and it can therefore be recommended as an indirect approach to assessing soil mechanical properties in arid regions, where soil survey and sampling are difficult to undertake.
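
The reported evaluation (RMSE, R2 and ME under cross-validation) can be sketched as follows, assuming the cross-validation scheme is leave-one-out. The definitions are assumptions, since the abstract does not give formulas: R2 is taken as the coefficient of determination and ME as the mean of predicted minus observed. The helper `fit_line` is a hypothetical placeholder model (a least-squares line), not the ABC-ANN model of the study.

```python
import math

def rmse(obs, pred):
    # Root mean squared error between observations and predictions.
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mean_error(obs, pred):
    # Bias of the predictions; the sign convention (pred - obs) is an assumption.
    return sum(p - o for o, p in zip(obs, pred)) / len(obs)

def r_squared(obs, pred):
    # Coefficient of determination, 1 - SS_res / SS_tot (assumed definition).
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def fit_line(xs, ys):
    # Hypothetical placeholder model: ordinary least-squares line.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def leave_one_out(xs, ys, fit):
    # Refit on all samples but one, then predict the held-out sample.
    preds = []
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(model(xs[i]))
    return preds
```

With 85 samples, as in the study, this loop would refit the model 85 times; any model exposing the same `fit` interface could replace `fit_line`.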

    Penerapan Algoritma Optimasi Chaos pada Jaringan Ridge Polynomial untuk Prediksi Jumlah Pengangguran

    Abstract: The ridge polynomial neural network (RPNN), originally proposed by Shin and Ghosh, is built from pi-sigma neurons (PSN) of increasing order. The RPNN retains the fast learning and powerful mapping of a single-layer higher-order neural network (HONN) while avoiding the explosion in the number of weights as the number of inputs grows. The chaos optimization algorithm exploits the logistic equation, which is sensitive to initial conditions, so that the chaotic motion can visit every state within a given range in a regular, ergodic manner while maintaining solution diversity. The chaos optimization algorithm is applied to the RPNN and used to predict the number of unemployed persons in West Kalimantan. The network is trained as a ridge polynomial neural network, while the initial values of the network weights and biases are found using the chaos optimization algorithm. The structure used consists of 6 input-layer neurons and 1 output-layer neuron. The data were obtained from the Central Statistics Agency (Badan Pusat Statistik). The results of this research indicate that the proposed algorithm can be used for prediction. Keywords: unemployment prediction, artificial neural networks, chaos optimization algorithm, ridge polynomial neural network
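
The use of the logistic equation to search for initial weights can be sketched as below. This is an illustrative sketch only: the map parameter `r = 4.0`, the starting values, the weight range and the `loss` callable are all assumptions, and the abstract does not specify how the authors configured their search.

```python
def logistic_map(x0, n, r=4.0):
    """Return n successive iterates of x <- r * x * (1 - x)."""
    xs, x = [], x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        xs.append(x)
    return xs

def chaos_search(loss, dim, lo, hi, iters, seed=0.3141):
    """Chaotic search for initial weights in [lo, hi] that minimise `loss`.

    One chaotic variable per weight, started from slightly different
    values (an assumption) so that the trajectories decorrelate.
    """
    state = [seed + 0.01 * k for k in range(dim)]
    best_w, best_loss = None, float("inf")
    for _ in range(iters):
        # Advance every chaotic variable one step of the logistic map.
        state = [4.0 * s * (1.0 - s) for s in state]
        # Rescale from [0, 1] to the admissible weight range.
        w = [lo + (hi - lo) * s for s in state]
        value = loss(w)
        if value < best_loss:
            best_w, best_loss = w, value
    return best_w, best_loss
```

The best candidate found this way would serve only as the starting point; the network itself would then be trained by its usual learning rule.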

    A contribution to exchange rate forecasting based on machine learning techniques

    The purpose of this thesis is to examine the contribution of machine learning techniques to exchange rate forecasting. These contributions are facilitated and enhanced by the use of fundamental economic variables, technical indicators, and business and consumer survey variables as inputs to the selected forecasting models. The research is organized as a compendium of four articles. The aim of each article is to advance our knowledge of the effects and mechanisms by which fundamental economic variables, technical indicators, business and consumer survey variables, and the selection of a model's free parameters can improve exchange rate predictions. Using a non-linear forecasting technique, one paper examines the effect of fundamental economic variables and model parameter selection on exchange rate forecasts, whereas the other three concentrate on the effect of technical indicators, model parameter selection, and business and consumer survey variables. The first paper examines fundamental economic variables and the forecasting model's parameters for the exchange rates of two countries, in order to understand the advantages or disadvantages these variables may bring in terms of forecasting performance and accuracy; its last experiment uses the previous period's exchange rate and economic indicators as model inputs. The second paper analyses how the combination of moving averages, business and consumer survey variables, and model parameter selection improves exchange rate predictions for two countries. Compared to the first paper, it adds moving averages and business and consumer survey variables as inputs to the forecasting model and drops the fundamental economic variables. One of its goals is to determine the possible effect of business and consumer survey variables on exchange rates. The third paper has the same objectives as the second, but extends the analysis to the exchange rates of seven countries. The fourth paper takes a similar approach to the second and third, but uses a single technical indicator. Overall, this thesis focuses on improving exchange rate predictions through the use of support vector machines, which a combination of input variables and the selection of model parameters helps to achieve.

    Support Vector Regression for Non-Stationary Time Series

    The difficulty of building forecasting models for non-stationary and volatile data has necessitated the development and application of new, sophisticated techniques that can handle such data. Many real-world phenomena generate data that are “difficult to analyze”. One of these is the stock market, where the data series generated are often hard to forecast because of their peculiar characteristics. In particular, the stock market has been described as a complex environment, and financial time series forecasting is often regarded as the most challenging application of time series forecasting. In this study, Support Vector Regression (SVR), a novel approach to forecasting non-stationary time series, was adopted and the feasibility of applying it to five financial time series was examined. Prior to implementing the SVR algorithm, three different transformations, namely Relative Difference in Percentages (RDP), Z-score and natural logarithm, were applied to the data series, and the best prediction results obtained are presented along with the associated transformation technique. Our study indicated that the Z-score transformation is the best scaling method for financial time series, outperforming the other two transformations on the basis of five different performance measures. To determine the optimum values of the SVR parameters, a cross-validation method was implemented. For this purpose, C was varied from 5 to 100 and ε from 0.001 to 0.1. The cross-validation method, though computationally expensive, is better than other proposed techniques for determining the values of these parameters. Another highlight of this study is the comparison of the SVR results with those obtained using 5-day Simple Moving Averages (SMA). The SMA was selected as a comparative method because it has been identified as the most popular quantitative forecasting method used by US corporations. Discussions with financial analysts also suggest that the SMA is one of the most widely used methods in the financial industry. The popularity of the SMA can be explained by the fact that it is easy and cheap to use and produces forecasts that can be easily interpreted by econometricians and other interested practitioners.
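
The three transformations and the 5-day SMA baseline named in the abstract can be sketched as follows. The exact formulas used in the study are not given, so these are common textbook forms and should be read as assumptions: the RDP lag of 5, the population standard deviation in the Z-score, and the convention that the SMA forecast for a day is the mean of the preceding `window` values.

```python
import math

def z_score(series):
    # Standardise to zero mean and unit (population) standard deviation.
    # Assumes the series is not constant, otherwise std would be zero.
    mean = sum(series) / len(series)
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / len(series))
    return [(v - mean) / std for v in series]

def rdp(series, lag=5):
    # Relative difference in percent against the value `lag` steps earlier.
    return [100.0 * (series[i] - series[i - lag]) / series[i - lag]
            for i in range(lag, len(series))]

def log_transform(series):
    # Natural logarithm; assumes strictly positive values such as prices.
    return [math.log(v) for v in series]

def sma_forecast(series, window=5):
    # Forecast for step i is the mean of the `window` values before it;
    # the last entry is the forecast for the first unobserved step.
    return [sum(series[i - window:i]) / window
            for i in range(window, len(series) + 1)]
```

For example, `sma_forecast([10, 11, 12, 13, 14, 15])` yields one forecast for the sixth value (the mean of the first five) and one for the next, unobserved value.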

    Blessing of Nonconvexity in Deep Linear Models: Depth Flattens the Optimization Landscape Around the True Solution

    This work characterizes the effect of depth on the optimization landscape of linear regression, showing that, despite their nonconvexity, deeper models have a more desirable optimization landscape. We consider a robust and over-parameterized setting, where a subset of measurements are grossly corrupted with noise and the true linear model is captured via an $N$-layer linear neural network. On the negative side, we show that this problem does not have a benign landscape: given any $N \geq 1$, with constant probability, there exists a solution corresponding to the ground truth that is neither a local nor a global minimum. However, on the positive side, we prove that, for any $N$-layer model with $N \geq 2$, a simple sub-gradient method becomes oblivious to such "problematic" solutions; instead, it converges to a balanced solution that is not only close to the ground truth but also enjoys a flat local landscape, thereby eschewing the need for "early stopping". Lastly, we empirically verify that the desirable optimization landscape of deeper models extends to other robust learning tasks, including deep matrix recovery and deep ReLU networks with $\ell_1$-loss.
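
To make the setting concrete, the sketch below runs a sub-gradient method on an $\ell_1$ loss for a toy scalar analogue of a deep linear model, where the prediction is the product of the layer weights times the input. This is a drastic simplification for illustration, not the paper's setting or algorithm: the scalar weights, the small equal initialisation, the geometric step decay and the best-iterate tracking are all assumptions.

```python
def product(ws):
    p = 1.0
    for w in ws:
        p *= w
    return p

def sign(v):
    # sign(0) = 0 is a valid choice of sub-gradient for |.| at zero.
    return (v > 0) - (v < 0)

def l1_loss(ws, xs, ys):
    # Mean absolute residual of the model y ~ (w_1 * ... * w_N) * x.
    theta = product(ws)
    return sum(abs(theta * x - y) for x, y in zip(xs, ys)) / len(xs)

def subgradient(ws, xs, ys):
    # Chain rule: d loss / d w_k = (d loss / d theta) * product of other weights.
    theta = product(ws)
    g_theta = sum(sign(theta * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    return [g_theta * product(ws[:k] + ws[k + 1:]) for k in range(len(ws))]

def subgradient_method(xs, ys, depth=2, init=0.1, step=0.1, decay=0.99, iters=500):
    # Small equal initialisation keeps the layers balanced at the start.
    ws = [init] * depth
    best_ws, best_loss = list(ws), l1_loss(ws, xs, ys)
    for _ in range(iters):
        g = subgradient(ws, xs, ys)
        ws = [w - step * gk for w, gk in zip(ws, g)]
        step *= decay  # geometrically decaying step size (assumed schedule)
        loss = l1_loss(ws, xs, ys)
        if loss < best_loss:
            best_ws, best_loss = list(ws), loss
    return best_ws, best_loss
```

A small experiment would generate measurements from a known slope, grossly corrupt a few of them, and compare `product(best_ws)` with the true slope for different values of `depth`.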

    What Matters in Model Training to Transfer Adversarial Examples

    Despite state-of-the-art performance on natural data, Deep Neural Networks (DNNs) are highly vulnerable to adversarial examples, i.e., imperceptible, carefully crafted perturbations of inputs applied at test time. Adversarial examples can transfer: an adversarial example against one model is likely to be adversarial against another independently trained model. This dissertation investigates the characteristics of the surrogate weight space that lead to the transferability of adversarial examples. Our research covers three complementary aspects of weight space exploration: multimodal exploration to obtain multiple models from different vicinities, local exploration to obtain multiple models in the same vicinity, and point selection to obtain a single transferable representation. First, from a probabilistic perspective, we argue that transferability is fundamentally related to uncertainty. The unknown weights of the target DNN can be treated as random variables. Under a specified threat model, a deep ensemble can produce a surrogate by sampling from the distribution of the target model. Unfortunately, deep ensembles are computationally expensive. We propose an efficient alternative by approximately sampling surrogate models from the posterior distribution using cSGLD, a state-of-the-art Bayesian deep learning technique. Our extensive experiments show that our approach significantly improves and complements four attacks, three transferability techniques, and five more training methods on ImageNet, CIFAR-10, and MNIST (by up to 83.2 percentage points), while reducing training computations from 11.6 to 2.4 exaflops compared to deep ensembles on ImageNet. Second, we propose transferability from Large Geometric Vicinity (LGV), a new technique based on the local exploration of the weight space. LGV starts from a pretrained model and collects multiple weights in a few additional training epochs with a constant and high learning rate. LGV exploits two geometric properties that we relate to transferability. First, we show that LGV explores a flatter region of the weight space and generates flatter adversarial examples in the input space. We present the surrogate-target misalignment hypothesis to explain why flatness could increase transferability. Second, we show that the LGV weights span a dense weight subspace whose geometry is intrinsically connected to transferability. Through extensive experiments, we show that LGV alone outperforms all (combinations of) four established transferability techniques by 1.8 to 59.9 percentage points. Third, we investigate how to train a transferable representation, that is, a single model for transferability. First, we refute a common hypothesis from previous research to explain why early stopping improves transferability. We then establish links between transferability and the exploration dynamics of the weight space, in which early stopping has an inherent effect. More precisely, we observe that transferability peaks when the learning rate decays, which is also the time at which the sharpness of the loss significantly drops. This leads us to propose RFN, a new approach to transferability that minimises the sharpness of the loss during training. We show that by searching for large flat neighbourhoods, RFN always improves over early stopping (by up to 47 points of success rate) and is competitive with (if not better than) strong state-of-the-art baselines. Overall, our three complementary techniques provide an extensive and practical method to obtain highly transferable adversarial examples from the multimodal and local exploration of flatter vicinities in the weight space. Our probabilistic and geometric approaches demonstrate that the way the surrogate model is trained has been overlooked, although both the training noise and the flatness of the loss landscape are important elements of transfer-based attacks.

    Dynamical Systems

    Complex systems are pervasive in many areas of science integrated in our daily lives. Examples include financial markets, highway transportation networks, telecommunication networks, world and country economies, social networks, immunological systems, living organisms, computational systems and electrical and mechanical structures. Complex systems are often composed of a large number of interconnected and interacting entities, exhibiting much richer global scale dynamics than the properties and behavior of individual entities. Complex systems are studied in many areas of natural sciences, social sciences, engineering and mathematical sciences. This special issue therefore intends to contribute towards the dissemination of the multifaceted concepts in accepted use by the scientific community. We hope readers enjoy this pertinent selection of papers which represents relevant examples of the state of the art in present day research. [...