31 research outputs found

    Adaptive Latent Factor Analysis via Generalized Momentum-Incorporated Particle Swarm Optimization

    Full text link
    The stochastic gradient descent (SGD) algorithm is an effective learning strategy for building a latent factor analysis (LFA) model on a high-dimensional and incomplete (HDI) matrix. A particle swarm optimization (PSO) algorithm is commonly adopted to make the hyper-parameters of an SGD-based LFA model, i.e., the learning rate and regularization coefficient, self-adaptive. However, a standard PSO algorithm may suffer from accuracy loss caused by premature convergence. To address this issue, this paper incorporates more historical information into each particle's evolutionary process, following the principle of a generalized-momentum (GM) method, to avoid premature convergence, thereby achieving a novel GM-incorporated PSO (GM-PSO). With it, a GM-PSO-based LFA (GMPL) model is further developed to implement efficient self-adaptation of hyper-parameters. Experimental results on three HDI matrices demonstrate that the GMPL model achieves higher prediction accuracy for missing-data estimation in industrial applications.
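The hyper-parameter self-adaptation described above can be illustrated with a minimal sketch: each particle encodes a (learning rate, regularization) pair, and the standard PSO velocity update is extended with an extra term weighting an older historical velocity, in the spirit of the generalized-momentum idea. The fitness function below is a hypothetical stand-in; the actual GMPL model would evaluate a trained LFA model's validation error on an HDI matrix, and the gain `gm` is an assumed illustrative value, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(hp):
    # Hypothetical stand-in for the validation error of an SGD-based LFA
    # model trained with learning rate hp[0] and regularization hp[1].
    eta, lam = hp
    return (eta - 0.01) ** 2 + (lam - 0.05) ** 2

n_particles, dims, iters = 10, 2, 60
lo, hi = 0.0, 0.1                        # hyper-parameter search range
pos = rng.uniform(lo, hi, (n_particles, dims))
vel = np.zeros((n_particles, dims))
vel_old = np.zeros_like(vel)             # velocity from two steps back
pbest = pos.copy()
pbest_f = np.array([fitness(p) for p in pos])
gbest = pbest[pbest_f.argmin()].copy()

w, c1, c2, gm = 0.7, 1.5, 1.5, 0.2       # gm: generalized-momentum weight

for _ in range(iters):
    r1 = rng.random((n_particles, dims))
    r2 = rng.random((n_particles, dims))
    # Standard PSO velocity update plus an extra historical-velocity term,
    # which injects more history to help avoid premature convergence.
    vel_new = (w * vel + c1 * r1 * (pbest - pos)
               + c2 * r2 * (gbest - pos) + gm * vel_old)
    vel_old, vel = vel, vel_new
    pos = np.clip(pos + vel, lo, hi)
    f = np.array([fitness(p) for p in pos])
    better = f < pbest_f
    pbest[better], pbest_f[better] = pos[better], f[better]
    gbest = pbest[pbest_f.argmin()].copy()
```

Swapping the stand-in `fitness` for a function that trains the LFA model with the candidate hyper-parameters and returns its validation RMSE recovers the intended self-adaptation loop.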

    An Incomplete Tensor Tucker decomposition based Traffic Speed Prediction Method

    Full text link
    In intelligent transportation systems, missing data are common and inevitable, yet complete and valid traffic speed data are of great importance to such systems. A latent factorization-of-tensors (LFT) model is one of the most attractive approaches to recovering missing traffic data owing to its good scalability. An LFT model is usually optimized via a stochastic gradient descent (SGD) solver; however, SGD-based LFT suffers from slow convergence. To deal with this issue, this work integrates the unique advantages of the proportional-integral-derivative (PID) controller into a Tucker-decomposition-based LFT model. It adopts two ideas: a) adopting Tucker decomposition to build an LFT model for better recovery accuracy; b) feeding the instance error, adjusted according to PID control theory, into the SGD solver to effectively improve the convergence rate. Experimental studies on two major-city traffic road speed datasets show that the proposed model achieves significant efficiency gains and highly competitive prediction accuracy.
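The PID-adjusted instance error can be sketched as follows. This is a deliberately simplified 2-D matrix-factorization stand-in for the tensor/Tucker setting of the paper: each observed entry's raw error is replaced in the SGD update by a proportional + integral + derivative combination of its error history. The data, gains `Kp`, `Ki`, `Kd`, and all sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical small example: recover a rank-2 matrix from partially
# observed entries (a 2-D simplification of the Tucker-based LFT model).
m, n, r = 20, 15, 2
M = rng.normal(size=(m, r)) @ rng.normal(size=(n, r)).T
mask = rng.random((m, n)) < 0.5                 # ~50% of entries observed
obs = [(i, j) for i in range(m) for j in range(n) if mask[i, j]]

U = 0.1 * rng.normal(size=(m, r))
V = 0.1 * rng.normal(size=(n, r))
lr, lam = 0.02, 0.01                            # step size, regularization
Kp, Ki, Kd = 1.0, 0.01, 0.1                     # illustrative PID gains
integral, prev = {}, {}                         # per-entry error history

for epoch in range(60):
    for k in rng.permutation(len(obs)):
        i, j = obs[k]
        e = M[i, j] - U[i] @ V[j]               # raw instance error
        integral[(i, j)] = integral.get((i, j), 0.0) + e
        deriv = e - prev.get((i, j), e)
        prev[(i, j)] = e
        # PID-adjusted error replaces e in the SGD update.
        e_pid = Kp * e + Ki * integral[(i, j)] + Kd * deriv
        gu = e_pid * V[j] - lam * U[i]
        gv = e_pid * U[i] - lam * V[j]
        U[i] += lr * gu
        V[j] += lr * gv

rmse = np.sqrt(np.mean([(M[i, j] - U[i] @ V[j]) ** 2 for (i, j) in obs]))
```

The integral term accumulates past errors and the derivative term reacts to their change, which is the mechanism the paper exploits to accelerate the plain SGD solver.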

    A Taylor polynomial expansion line search for large-scale optimization

    Get PDF
    In trying to cope with the Big Data deluge, the landscape of distributed computing has changed. Large commodity hardware clusters, typically operating in some form of MapReduce framework, are becoming prevalent for organizations that require both tremendous storage capacity and fault tolerance. However, the high cost of communication can dominate the computation time in large-scale optimization routines in these frameworks. This thesis considers the problem of how to efficiently conduct univariate line searches in commodity clusters in the context of gradient-based batch optimization algorithms, like the staple limited-memory BFGS (LBFGS) method. In it, a new line search technique is proposed for cases where the underlying objective function is analytic, as in logistic regression and low rank matrix factorization. The technique approximates the objective function by a truncated Taylor polynomial along a fixed search direction. The coefficients of this polynomial may be computed efficiently in parallel with far less communication than needed to transmit the high-dimensional gradient vector, after which the polynomial may be minimized with high accuracy in a neighbourhood of the expansion point without distributed operations. This Polynomial Expansion Line Search (PELS) may be invoked iteratively until the expansion point and minimum are sufficiently accurate, and can provide substantial savings in time and communication costs when multiple iterations in the line search procedure are required. Three applications of the PELS technique are presented herein for important classes of analytic functions: (i) logistic regression (LR), (ii) low-rank matrix factorization (MF) models, and (iii) the feedforward multilayer perceptron (MLP). In addition, for LR and MF, implementations of PELS in the Apache Spark framework for fault-tolerant cluster computing are provided. 
These implementations conferred significant convergence enhancements to their respective algorithms, and will be of interest to Spark and Hadoop practitioners. For instance, the Spark PELS technique reduced the number of iterations and the time required by LBFGS to reach terminal training accuracies for LR models by factors of 1.8--2. Substantial acceleration was also observed for the Nonlinear Conjugate Gradient algorithm for MLP models, which is an interesting case for future study in optimization for neural networks. The PELS technique is applicable to a broad class of models for Big Data processing and large-scale optimization, and can be a useful component of batch optimization routines.
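The core idea can be illustrated in the matrix-factorization case, where the objective restricted to a line is not merely approximated but is an exact degree-4 polynomial in the step size, so the "Taylor expansion" along the search direction has a closed form. The sketch below (single-machine, with assumed sizes; the thesis computes the same coefficients with distributed reductions in Spark) builds the polynomial coefficients for phi(t) = f(U + t*Du, V + t*Dv) and minimizes the polynomial by finding the real roots of its derivative.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical dense low-rank factorization problem.
m, n, r = 30, 20, 3
M = rng.normal(size=(m, r)) @ rng.normal(size=(r, n))
U, V = rng.normal(size=(m, r)), rng.normal(size=(r, n))

def loss(U, V):
    R = M - U @ V
    return 0.5 * np.sum(R * R)

# Steepest-descent direction (negative gradient).
R = M - U @ V
Du, Dv = R @ V.T, U.T @ R

# M - (U + t*Du)(V + t*Dv) = R - t*A - t^2*B, so
# phi(t) = 0.5 * ||R - t*A - t^2*B||_F^2 is a quartic in t.
A, B = Du @ V + U @ Dv, Du @ Dv
c0 = 0.5 * np.sum(R * R)
c1 = -np.sum(R * A)
c2 = 0.5 * np.sum(A * A) - np.sum(R * B)
c3 = np.sum(A * B)
c4 = 0.5 * np.sum(B * B)

# Minimize phi(t) exactly: real roots of phi'(t), pick the best one.
roots = np.roots([4 * c4, 3 * c3, 2 * c2, c1])
real = roots[np.abs(roots.imag) < 1e-9].real
t_star = min(real, key=lambda t: np.polyval([c4, c3, c2, c1, c0], t))

new_loss = loss(U + t_star * Du, V + t_star * Dv)
```

In a cluster setting, the few scalar coefficients c0..c4 are what gets aggregated across workers, which is far cheaper than repeatedly communicating the full gradient for trial step sizes.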

    Random Projection in Deep Neural Networks

    Get PDF
    This work investigates the ways in which deep learning methods can benefit from random projection (RP), a classic linear dimensionality reduction method. We focus on two areas where, as we have found, employing RP techniques can improve deep models: training neural networks on high-dimensional data and initialization of network parameters. Training deep neural networks (DNNs) on sparse, high-dimensional data with no exploitable structure implies a network architecture whose input layer has a huge number of weights, which often makes training infeasible. We show that this problem can be solved by prepending the network with an input layer whose weights are initialized with an RP matrix. We propose several modifications to the network architecture and training regime that make it possible to efficiently train DNNs with a learnable RP layer on data with as many as tens of millions of input features and training examples. In comparison to state-of-the-art methods, neural networks with an RP layer achieve competitive performance or improve the results on several extremely high-dimensional real-world datasets. The second area where the application of RP techniques can be beneficial for training deep models is weight initialization. Setting the initial weights in DNNs to elements of various RP matrices enabled us to train deep residual networks to higher levels of performance.
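A minimal sketch of the RP-layer idea: initialize the first layer's weight matrix with a scaled Gaussian random projection, so a huge sparse input is mapped to a small dense representation whose geometry is approximately preserved (in the Johnson-Lindenstrauss sense) before any training happens. The dimensions below are assumed for illustration, and a real model would then fine-tune `W_rp` along with the rest of the network.

```python
import numpy as np

rng = np.random.default_rng(3)
d_in, d_proj = 10_000, 64    # hypothetical: huge sparse input -> small RP layer

# Gaussian RP matrix, scaled so projected vector norms are preserved
# in expectation; used as the initial weights of the input layer.
W_rp = rng.normal(0.0, 1.0 / np.sqrt(d_proj), size=(d_in, d_proj))

# A sparse input with only a few non-zero features, as in the thesis setting.
x = np.zeros(d_in)
idx = rng.choice(d_in, size=20, replace=False)
x[idx] = rng.normal(size=20)

z = x @ W_rp                 # first-layer pre-activation, dimension d_proj
```

Because `x` is sparse, the product `x @ W_rp` touches only the non-zero rows of `W_rp`, which is what keeps the huge input layer tractable.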

    Matrix factorization models for cross-domain recommendation : Addressing the cold start in collaborative filtering

    Full text link
    Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Date of defense: 13-01-201

    Deep Learning in Demand Side Management: A Comprehensive Framework for Smart Homes

    Full text link
    The advent of deep learning has elevated machine intelligence to an unprecedented level. Fundamental concepts, algorithms, and implementations of differentiable programming, including gradient-based methods such as gradient descent and backpropagation, have powered many deep learning algorithms to accomplish millions of tasks in computer vision, signal processing, natural language comprehension, and recommender systems. Demand-side management (DSM) serves as a crucial tactic on the customer side of the meter, regulating electricity consumption without hampering the occupant comfort of homeowners. As more residents participate in energy management programs, DSM will further contribute to grid stability protection, economical operation, and carbon emission reduction. However, DSM cannot be implemented effectively without the penetration of smart home technologies that integrate intelligent algorithms into hardware. Resident behaviors being analyzed and comprehended by deep learning algorithms operating on sensor-collected human activity data is one typical example of such technology integration. This thesis applies deep learning to DSM and provides a comprehensive framework for smart home management. Firstly, a detailed literature review is conducted on DSM, smart homes, and deep learning. Secondly, the four papers published during the candidate's Ph.D. career are utilized in lieu of thesis chapters: "A Demand-Side Load Event Detection Algorithm Based on Wide-Deep Neural Networks and Randomized Sparse Backpropagation," "A Novel High-Performance Deep Learning Framework for Load Recognition: Deep-Shallow Model Based on Fast Backpropagation," "An Object Surveillance Algorithm Based on Batch-Normalized CNN and Data Augmentation in Smart Home," and "Integrated optimization algorithm: A metaheuristic approach for complicated optimization." Thirdly, a discussion section is offered to synthesize the ideas and key results of the four published papers. 
Conclusions and directions for future research are provided in the final section of this thesis.

    Uncertainty in Artificial Intelligence: Proceedings of the Thirty-Fourth Conference

    Get PDF