Adaptive Latent Factor Analysis via Generalized Momentum-Incorporated Particle Swarm Optimization
The stochastic gradient descent (SGD) algorithm is an effective learning strategy
for building a latent factor analysis (LFA) model on a high-dimensional and
incomplete (HDI) matrix. A particle swarm optimization (PSO) algorithm is
commonly adopted to make an SGD-based LFA model's hyper-parameters, i.e.,
the learning rate and regularization coefficient, self-adaptive. However, a
standard PSO algorithm may suffer from accuracy loss caused by premature
convergence. To address this issue, this paper incorporates more historical
information into each particle's evolutionary process, following the principle
of a generalized-momentum (GM) method, to avoid premature convergence,
thereby achieving a novel GM-incorporated PSO (GM-PSO). On this basis, a
GM-PSO-based LFA (GMPL) model is further developed to implement efficient
self-adaptation of hyper-parameters. Experimental results on three HDI
matrices demonstrate that the GMPL model achieves higher prediction accuracy
for missing-data estimation in industrial applications.
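The generalized-momentum idea, letting each particle's velocity depend on more than one past velocity, can be sketched in Python. The two-term momentum weights, the bounds, and the toy objective below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def gm_pso(objective, dim, n_particles=20, iters=100, w=(0.6, 0.2),
           c1=1.5, c2=1.5, bounds=(-5.0, 5.0), seed=0):
    """PSO variant whose velocity update blends the last two
    velocities (a generalized-momentum sketch; the paper's exact
    GM weighting may differ)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))
    v = np.zeros((n_particles, dim))
    v_prev = np.zeros_like(v)            # one extra step of history
    pbest = x.copy()
    pbest_f = np.array([objective(p) for p in x])
    g = pbest[pbest_f.argmin()].copy()   # global best position
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # generalized momentum: weighted sum of the last two velocities
        v_new = (w[0] * v + w[1] * v_prev
                 + c1 * r1 * (pbest - x) + c2 * r2 * (g - x))
        v_prev, v = v, v_new
        x = np.clip(x + v, lo, hi)
        f = np.array([objective(p) for p in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        g = pbest[pbest_f.argmin()].copy()
    return g, pbest_f.min()

# Hypothetical usage: in the GMPL setting the particles would encode
# the SGD learning rate and regularization coefficient, and the
# objective would be a validation error; here a toy quadratic stands in.
best, val = gm_pso(lambda p: float(np.sum((p - 0.3) ** 2)), dim=2)
```

The extra history term damps abrupt swings toward the current global best, which is how the GM principle is meant to delay premature convergence.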
An Incomplete Tensor Tucker decomposition based Traffic Speed Prediction Method
In intelligent transportation systems, missing data are common and inevitable,
yet complete and valid traffic speed data are of great importance to such
systems. A latent factorization-of-tensors (LFT) model is one of the most
attractive approaches to recovering missing traffic data owing to its good
scalability. An LFT model is usually optimized via a stochastic gradient
descent (SGD) solver; however, SGD-based LFT suffers from slow convergence.
To address this issue, this work integrates the unique advantages of the
proportional-integral-derivative (PID) controller into a Tucker-decomposition-based
LFT model. It adopts two-fold ideas: a) adopting Tucker decomposition to build
an LFT model for better recovery accuracy; b) feeding the instance error,
adjusted according to PID control theory, into the SGD solver to effectively
improve the convergence rate. Experimental studies on two major-city traffic
road speed datasets show that the proposed model achieves significant
efficiency gains and highly competitive prediction accuracy.
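The PID-adjusted instance error can be illustrated on a plain two-factor model. The paper builds on a Tucker decomposition of a tensor; the two-factor simplification, the gains, and the rank below are assumptions made for a minimal Python sketch:

```python
import numpy as np

def pid_sgd_mf(R, mask, rank=2, lr=0.01, kp=1.0, ki=0.05, kd=0.2,
               epochs=50, seed=0):
    """SGD factorization in which each observed entry's instance
    error is passed through a discrete PID adjustment before the
    gradient step (a sketch of the idea; gains and the matrix,
    rather than tensor, setting are assumptions)."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    U = 0.1 * rng.standard_normal((m, rank))
    V = 0.1 * rng.standard_normal((n, rank))
    integral = np.zeros_like(R)   # running sum of errors (I term)
    prev_err = np.zeros_like(R)   # previous error (D term)
    for _ in range(epochs):
        for i, j in zip(*np.nonzero(mask)):
            e = R[i, j] - U[i] @ V[j]
            integral[i, j] += e
            # PID-adjusted instance error drives the SGD update
            e_adj = (kp * e + ki * integral[i, j]
                     + kd * (e - prev_err[i, j]))
            prev_err[i, j] = e
            U[i], V[j] = (U[i] + lr * e_adj * V[j],
                          V[j] + lr * e_adj * U[i])
    return U, V
```

The integral term accumulates persistent residuals and the derivative term reacts to their change, which is what lets the adjusted error accelerate the otherwise slow SGD convergence.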
A Taylor polynomial expansion line search for large-scale optimization
In trying to cope with the Big Data deluge, the landscape of distributed computing has changed. Large commodity hardware clusters, typically operating in some form of MapReduce framework, are becoming prevalent for organizations that require both tremendous storage capacity and fault tolerance. However, the high cost of communication can dominate the computation time in large-scale optimization routines in these frameworks. This thesis considers the problem of how to efficiently conduct univariate line searches in commodity clusters in the context of gradient-based batch optimization algorithms, like the staple limited-memory BFGS (LBFGS) method. In it, a new line search technique is proposed for cases where the underlying objective function is analytic, as in logistic regression and low rank matrix factorization. The technique approximates the objective function by a truncated Taylor polynomial along a fixed search direction. The coefficients of this polynomial may be computed efficiently in parallel with far less communication than needed to transmit the high-dimensional gradient vector, after which the polynomial may be minimized with high accuracy in a neighbourhood of the expansion point without distributed operations. This Polynomial Expansion Line Search (PELS) may be invoked iteratively until the expansion point and minimum are sufficiently accurate, and can provide substantial savings in time and communication costs when multiple iterations in the line search procedure are required.
Three applications of the PELS technique are presented herein for important classes of analytic functions: (i) logistic regression (LR), (ii) low-rank matrix factorization (MF) models, and (iii) the feedforward multilayer perceptron (MLP). In addition, for LR and MF, implementations of PELS in the Apache Spark framework for fault-tolerant cluster computing are provided. These implementations conferred significant convergence enhancements to their respective algorithms, and will be of interest to Spark and Hadoop practitioners. For instance, the Spark PELS technique reduced the number of iterations and time required by LBFGS to reach terminal training accuracies for LR models by factors of 1.8--2. Substantial acceleration was also observed for the Nonlinear Conjugate Gradient algorithm for MLP models, which is an interesting case for future study in optimization for neural networks. The PELS technique is applicable to a broad class of models for Big Data processing and large-scale optimization, and can be a useful component of batch optimization routines.
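For logistic regression the idea above can be made concrete: along a fixed direction, the loss restricted to the step size is a univariate analytic function whose Taylor coefficients are scalar sums over per-example margins. A minimal sketch, assuming a fixed degree-4 expansion and a grid minimizer (the thesis's distributed coefficient computation and stopping rules are not reproduced):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pels_logistic(X, y, w, d):
    """Degree-4 Polynomial Expansion Line Search sketch for the
    logistic loss f(w) = sum_i log(1 + exp(-y_i * x_i . w)).
    Builds the Taylor polynomial of phi(a) = f(w + a*d) at a = 0
    from per-example margins, then minimizes it on a grid. In a
    cluster, the five coefficients are scalar sums over examples,
    far cheaper to communicate than a high-dimensional gradient."""
    z = y * (X @ w)          # margins at the expansion point
    v = y * (X @ d)          # directional margins
    s = sigmoid(-z)
    # k-th derivatives of l(z) = log(1 + exp(-z)), k = 0..4
    derivs = [np.logaddexp(0.0, -z),
              -s,
              s * (1 - s),
              s * (1 - s) * (2 * s - 1),
              s * (1 - s) * (1 - 6 * s + 6 * s**2)]
    fact = [1.0, 1.0, 2.0, 6.0, 24.0]
    coeffs = [np.sum(derivs[k] * v**k) / fact[k] for k in range(5)]
    # minimize the truncated polynomial near the expansion point
    grid = np.linspace(0.0, 2.0, 2001)
    p = sum(c * grid**k for k, c in enumerate(coeffs))
    return grid[np.argmin(p)], coeffs
```

The returned step could then seed a further expansion, mirroring the iterative invocation of PELS until the expansion point and minimum are sufficiently accurate.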
Random Projection in Deep Neural Networks
This work investigates the ways in which deep learning methods can benefit
from random projection (RP), a classic linear dimensionality reduction method.
We focus on two areas where, as we have found, employing RP techniques can
improve deep models: training neural networks on high-dimensional data and
initialization of network parameters. Training deep neural networks (DNNs) on
sparse, high-dimensional data with no exploitable structure implies a network
architecture with an input layer that has a huge number of weights, which often
makes training infeasible. We show that this problem can be solved by
prepending the network with an input layer whose weights are initialized with
an RP matrix. We propose several modifications to the network architecture and
training regime that make it possible to efficiently train DNNs with a learnable
RP layer on data with as many as tens of millions of input features and
training examples. In comparison to state-of-the-art methods, neural
networks with an RP layer achieve competitive performance or improve the results
on several extremely high-dimensional real-world datasets. The second area
where the application of RP techniques can be beneficial for training deep
models is weight initialization. Setting the initial weights in DNNs to
elements of various RP matrices enabled us to train deep residual networks to
higher levels of performance.
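One classic RP construction that could back such an input layer is a sparse Achlioptas matrix, whose entries are cheap to generate and mostly zero, so distances are approximately preserved at low cost. A sketch (the thesis's exact RP construction and layer wiring are not specified here):

```python
import numpy as np

def achlioptas_rp(d_in, d_out, s=3, seed=0):
    """Sparse random projection matrix (Achlioptas construction):
    each entry is +/- sqrt(s/d_out) with probability 1/(2s) each,
    and 0 otherwise, so about two thirds of the entries vanish for
    s = 3 while E[||x @ R||^2] = ||x||^2. Such a matrix could serve
    as the initialization of a learnable RP input layer prepended
    to a DNN for sparse, very high-dimensional data."""
    rng = np.random.default_rng(seed)
    u = rng.random((d_in, d_out))
    scale = np.sqrt(s / d_out)
    return np.where(u < 1 / (2 * s), scale,
                    np.where(u < 1 / s, -scale, 0.0))
```

Because the projection roughly preserves input geometry (in the Johnson-Lindenstrauss sense), the prepended layer starts from a structure-preserving map rather than arbitrary noise, and can then be fine-tuned with the rest of the network.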
Matrix factorization models for cross-domain recommendation: Addressing the cold start in collaborative filtering
Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Facultad de la Escuela Politécnica Superior, Departamento de Ingeniería Informática. Date of defense: 13-01-201
Deep Learning in Demand Side Management: A Comprehensive Framework for Smart Homes
The advent of deep learning has elevated machine intelligence to an unprecedented level. Fundamental concepts, algorithms, and implementations of differentiable programming, including gradient-based methods such as gradient descent and backpropagation, have powered many deep learning algorithms to accomplish millions of tasks in computer vision, signal processing, natural language comprehension, and recommender systems. Demand-side management (DSM) is a crucial tactic on the customer side of the meter that regulates electricity consumption without hampering the occupant comfort of homeowners. As more residents participate in the energy management program, DSM will further contribute to grid stability protection, economical operation, and carbon emission reduction. However, DSM cannot be implemented effectively without the penetration of smart home technologies that integrate intelligent algorithms into hardware. Resident behavior analyzed and interpreted by deep learning algorithms from sensor-collected human activity data is one typical example of such technology integration. This thesis applies deep learning to DSM and provides a comprehensive framework for smart home management. Firstly, a detailed literature review is conducted on DSM, smart homes, and deep learning. Secondly, the four papers published during the candidate's Ph.D. career are presented in lieu of conventional thesis chapters: "A Demand-Side Load Event Detection Algorithm Based on Wide-Deep Neural Networks and Randomized Sparse Backpropagation," "A Novel High-Performance Deep Learning Framework for Load Recognition: Deep-Shallow Model Based on Fast Backpropagation," "An Object Surveillance Algorithm Based on Batch-Normalized CNN and Data Augmentation in Smart Home," and "Integrated optimization algorithm: A metaheuristic approach for complicated optimization." Thirdly, a discussion section is offered to synthesize the ideas and key results of the four published papers.
Conclusions and directions for future research are provided in the final section of this thesis.