
    No More Pesky Learning Rates

    The performance of stochastic gradient descent (SGD) depends critically on how learning rates are tuned and decreased over time. We propose a method to automatically adjust multiple learning rates so as to minimize the expected error at any one time. The method relies on local gradient variations across samples. In our approach, learning rates can increase as well as decrease, making it suitable for non-stationary problems. Using a number of convex and non-convex learning tasks, we show that the resulting algorithm matches the performance of SGD or other adaptive approaches with their best settings obtained through systematic search, and effectively removes the need for learning rate tuning.
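    As a rough illustration of the idea, the sketch below adjusts a per-parameter learning rate from running estimates of the gradient mean and second moment, so the rate grows when per-sample gradients agree and shrinks when they are noisy. The moving-average memory `tau`, the base scale `lr0`, and the toy quadratic example are illustrative assumptions, not the paper's exact update.

```python
import numpy as np

def adaptive_sgd_step(theta, g, state, tau=10.0, lr0=1.0, eps=1e-12):
    """One SGD step whose per-parameter learning rate is driven by local
    gradient statistics (illustrative sketch, not the paper's exact rule)."""
    # Exponential moving averages of the per-sample gradient and its square.
    state["g_bar"] = (1 - 1 / tau) * state["g_bar"] + (1 / tau) * g
    state["v_bar"] = (1 - 1 / tau) * state["v_bar"] + (1 / tau) * g * g
    # g_bar^2 / v_bar is near 1 when gradients agree across samples (take
    # larger steps) and near 0 when they are noisy (take smaller steps),
    # so the learning rate can increase as well as decrease over time.
    lr = lr0 * state["g_bar"] ** 2 / (state["v_bar"] + eps)
    return theta - lr * g, state

# Toy usage: minimize 0.5*||theta||^2 from noisy gradients.
rng = np.random.default_rng(0)
theta = np.array([5.0, -3.0])
state = {"g_bar": np.zeros_like(theta), "v_bar": np.ones_like(theta)}
for _ in range(200):
    g = theta + rng.normal(scale=0.5, size=theta.shape)  # noisy gradient sample
    theta, state = adaptive_sgd_step(theta, g, state)
print(theta)  # approaches [0, 0] without any hand-tuned schedule
```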

    Learning Learning Algorithms

    Machine learning models rely on data to learn any given task, and depending on the diversity of the task's elements and the design objectives, large amounts of data may be required for good performance, which in turn can exponentially increase learning time and computational cost. Although most machine learning models today are trained on GPUs (graphics processing units) to speed up the process, many, depending on the dataset, still require a huge amount of training time to reach good performance. This study looks into learning-to-learn algorithms, popularly known as metalearning: methods that aim to improve not only learning speed but also model performance, while requiring fewer data and spanning multiple tasks. The concept involves training a model that constantly learns to learn novel tasks quickly from previously learned tasks.
    The review of related work focuses on optimization-based methods, and most precisely on MAML (Model Agnostic MetaLearning), first because it is one of the most popular state-of-the-art metalearning methods, and second because this thesis focuses on creating a MAML-based method, called MAML-DBL, that uses an adaptive learning rate technique with dynamic bounds, enabling quick convergence at the beginning of training and good generalization towards the end. The proposed MAML variant aims to prevent vanishing learning rates during training and slowdown at the end where dense features are prevalent, although further hyperparameter tuning might be necessary for some models, or where sparse features are prevalent, to improve performance. MAML-DBL and MAML were tested on the datasets most commonly used for metalearning models; based on the experimental results, the proposed method showed competitive performance on some of the models and even outperformed the baseline in some of the tests. The results obtained with MAML-DBL (on one of the datasets) and MAML show that metalearning methods are highly recommendable solutions whenever good performance, less data, and a multi-task or versatile model are required or desired.
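    To make the setup concrete, the sketch below shows a first-order MAML-style meta-update together with a dynamic-bound clip on the step size, in the spirit described above. The first-order approximation, the bound schedule, and the parameters `gamma` and `final_lr` are assumptions for illustration, not the thesis's exact MAML-DBL algorithm.

```python
import numpy as np

def maml_outer_step(theta, tasks, loss_grad, inner_lr=0.01, outer_lr=0.001,
                    inner_steps=1):
    """One meta-update: adapt a copy of theta to each task on its support set,
    then move theta toward parameters that adapt well, as measured on the
    query set. Uses a first-order approximation (no second derivatives)."""
    meta_grad = np.zeros_like(theta)
    for task in tasks:
        phi = theta.copy()
        for _ in range(inner_steps):                     # inner-loop adaptation
            phi -= inner_lr * loss_grad(phi, task["support"])
        meta_grad += loss_grad(phi, task["query"])       # post-adaptation gradient
    return theta - outer_lr * meta_grad / len(tasks)

def dynamic_bound_lr(step, adaptive_lr, final_lr=0.01, gamma=1e-3):
    """Clip an adaptive step size between bounds that tighten toward final_lr
    as training progresses (step >= 1), so rates neither vanish early nor
    remain too coarse late. The schedule is illustrative."""
    lower = final_lr * (1.0 - 1.0 / (gamma * step + 1.0))
    upper = final_lr * (1.0 + 1.0 / (gamma * step))
    return float(np.clip(adaptive_lr, lower, upper))
```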

    Metaheuristic design of feedforward neural networks: a review of two decades of research

    Over the past two decades, feedforward neural network (FNN) optimization has been a key interest among researchers and practitioners of multiple disciplines. FNN optimization is often viewed from various perspectives: the optimization of weights, network architecture, activation nodes, learning parameters, learning environment, etc. Researchers adopted such different viewpoints mainly to improve the FNN's generalization ability. Gradient-descent algorithms such as backpropagation have been widely applied to optimize FNNs, and their success is evident from the FNN's application to numerous real-world problems. However, due to the limitations of gradient-based optimization methods, metaheuristic algorithms, including evolutionary algorithms, swarm intelligence, etc., are still being widely explored by researchers aiming to obtain a generalized FNN for a given problem. This article attempts to summarize a broad spectrum of FNN optimization methodologies, including conventional and metaheuristic approaches. It also tries to connect the various research directions that have emerged from FNN optimization practice, such as evolving neural networks (NNs), cooperative coevolution NNs, complex-valued NNs, deep learning, extreme learning machines, quantum NNs, etc. Additionally, it provides interesting research challenges for future work to cope with the present information-processing era.
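    To make the contrast with backpropagation concrete, here is a minimal sketch (assumed for illustration, not taken from the article) of metaheuristic FNN training: the network's weights are laid out as one flat vector and searched directly by a simple evolution strategy. The network sizes, population size, and mutation scale are illustrative assumptions; real studies use far more elaborate metaheuristics (GA, PSO, DE, etc.).

```python
import numpy as np

def fnn_forward(w, x, sizes):
    """Tiny feedforward net with tanh layers; weights given as one flat
    vector so a metaheuristic can search over it directly."""
    i, a = 0, x
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = w[i:i + n_in * n_out].reshape(n_in, n_out); i += n_in * n_out
        b = w[i:i + n_out]; i += n_out
        a = np.tanh(a @ W + b)
    return a

def evolve_weights(loss, dim, pop=30, gens=300, sigma=0.1, seed=0):
    """A minimal (1 + pop) evolution strategy: mutate the current best weight
    vector and keep the fittest child. No gradients are used."""
    rng = np.random.default_rng(seed)
    best = rng.normal(scale=0.5, size=dim)
    best_loss = loss(best)
    for _ in range(gens):
        children = best + rng.normal(scale=sigma, size=(pop, dim))
        losses = np.array([loss(c) for c in children])
        if losses.min() < best_loss:
            best, best_loss = children[losses.argmin()], losses.min()
    return best, best_loss

# Usage: fit XOR without backpropagation.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.0], [1.0], [1.0], [0.0]])
sizes = [2, 8, 1]
dim = sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))
loss = lambda w: float(np.mean((fnn_forward(w, X, sizes) - y) ** 2))
w_best, l_best = evolve_weights(loss, dim)
print(l_best)  # mean-squared error of the evolved network
```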