Consistency of random-walk based network embedding algorithms
Random-walk based network embedding algorithms like node2vec and DeepWalk are
widely used to obtain Euclidean representations of the nodes in a network
prior to performing downstream network inference tasks. Nevertheless, despite their
impressive empirical performance, there is a lack of theoretical results
explaining their behavior. In this paper we study the node2vec and DeepWalk
algorithms through the lens of matrix factorization. We analyze these
algorithms in the setting of community detection for stochastic blockmodel
graphs; in particular, we establish large-sample error bounds and prove
consistent community recovery of node2vec/DeepWalk embeddings followed by
k-means clustering. Our theoretical results indicate a subtle interplay between
the sparsity of the observed networks, the window sizes of the random walks,
and the convergence rates of the node2vec/DeepWalk embedding toward the
embedding of the true but unknown edge probabilities matrix. More specifically,
as the network becomes sparser, our results suggest using larger window sizes,
or equivalently, taking longer random walks, in order to attain a better
convergence rate for the resulting embeddings. The paper includes numerical
experiments corroborating these observations.
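The matrix-factorization view of the abstract above can be sketched in a few lines: form the window-averaged random-walk matrix, take an entrywise truncated logarithm (a DeepWalk-style PMI surrogate), embed via SVD, and cluster. Everything below is illustrative, not the paper's exact construction: the two-block SBM, the block probabilities, and the crude sign-based split standing in for k-means.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-block stochastic blockmodel: two communities of 50 nodes each.
n = 100
labels = np.repeat([0, 1], n // 2)
B = np.array([[0.50, 0.05],
              [0.05, 0.50]])                      # block probability matrix
A = (rng.random((n, n)) < B[labels][:, labels]).astype(float)
A = np.triu(A, 1); A = A + A.T                    # symmetric, no self-loops

def deepwalk_embedding(A, window=5, dim=2):
    """DeepWalk through the matrix-factorization lens: SVD of a truncated
    log of the window-averaged random-walk co-occurrence matrix."""
    d = A.sum(axis=1)
    Pw = A / d[:, None]                           # random-walk transition matrix
    avg, Pr = np.zeros_like(A), np.eye(len(A))
    for _ in range(window):                       # average of Pw^1, ..., Pw^window
        Pr = Pr @ Pw
        avg += Pr
    avg /= window
    M = np.log(np.maximum(avg * d.sum() / d[None, :], 1.0))  # truncated log-PMI
    U, S, _ = np.linalg.svd(M)
    return U[:, :dim] * np.sqrt(S[:dim])

X = deepwalk_embedding(A)
# Crude 2-means stand-in: split by sign along the leading centered direction.
Xc = X - X.mean(axis=0)
pred = (Xc @ np.linalg.svd(Xc, full_matrices=False)[2][0] > 0).astype(int)
acc = max(np.mean(pred == labels), np.mean(pred != labels))
print(f"community recovery accuracy: {acc:.2f}")
```

With a dense, well-separated SBM like this one the recovered communities are essentially exact; the paper's point is how this degrades, and how larger windows help, as the network sparsifies.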
Scalable large margin pairwise learning algorithms
2019 Summer. Includes bibliographical references.
Classification is a major task in machine learning and data mining applications. Many of these applications involve building a classification model from a large volume of imbalanced data. In such an imbalanced learning scenario, the area under the ROC curve (AUC) has proven to be a reliable performance measure for evaluating a classifier. It is therefore desirable to develop scalable learning algorithms that maximize the AUC metric directly. Kernelized AUC maximization machines have established a superior generalization ability compared to linear AUC machines, but their computational cost hinders scalability. To address this problem, we propose a large-scale nonlinear AUC maximization algorithm that learns a batch linear classifier on an approximate feature space computed via the k-means Nyström method. The proposed algorithm is shown empirically to achieve AUC classification performance comparable to, or even better than, that of kernel AUC machines, while training several orders of magnitude faster.
However, the computational complexity of the linear batch model compromises its scalability when training on sizable datasets. Hence, we develop second-order online AUC maximization algorithms based on a confidence-weighted model. The proposed algorithms exploit second-order information to improve the convergence rate and implement a fixed-size buffer to address the multivariate nature of the AUC objective function. We also extend our online linear algorithms to use an approximate feature map constructed from random Fourier features in an online setting. The results show that our proposed algorithms outperform, or are at least comparable to, competing online AUC maximization methods. Despite their scalability, we observe that online first- and second-order AUC maximization methods are prone to suboptimal convergence.
This can be attributed to the limitation of the hypothesis space. A potential improvement can be attained by learning stochastic online variants. However, vanilla stochastic methods also suffer from slow convergence because of the high variance introduced by the stochastic process. We address this problem by developing a fast-convergence stochastic AUC maximization algorithm, accelerated using a unique combination of scheduled regularization updates and scheduled averaging. The experimental results show that the proposed algorithm outperforms state-of-the-art online and stochastic AUC maximization methods in terms of AUC classification accuracy. Moreover, we develop a proximal variant of our accelerated stochastic AUC maximization algorithm. The proposed method applies the proximal operator to the hinge loss function and therefore evaluates the gradient of the loss function at the approximated weight vector. Experiments on several benchmark datasets show that our proximal algorithm converges to the optimal solution faster than previous AUC maximization algorithms.
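The pairwise large-margin objective underlying all of these methods can be illustrated with a generic stochastic pairwise hinge-loss AUC maximizer (not the thesis's exact algorithms; the synthetic data, step size, and regularization constant are illustrative): sample a positive/negative pair and take a subgradient step whenever the score difference violates the margin.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic imbalanced data: roughly 10% positives, shifted by a constant.
n, d = 2000, 5
y = (rng.random(n) < 0.1).astype(int)
X = rng.normal(size=(n, d)) + 1.5 * y[:, None]

def pairwise_hinge_auc_sgd(X, y, epochs=20, lr=0.05, lam=1e-3):
    """Generic stochastic pairwise large-margin AUC maximization: push the
    score of a random positive above a random negative by a unit margin."""
    pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
    w = np.zeros(X.shape[1])
    for _ in range(epochs * len(pos)):
        i, j = rng.choice(pos), rng.choice(neg)
        diff = X[i] - X[j]
        w *= 1.0 - lr * lam                       # L2 shrinkage (large margin)
        if w @ diff < 1.0:                        # pairwise hinge is active
            w += lr * diff
    return w

def auc(scores, y):
    """AUC = fraction of positive/negative pairs ranked correctly."""
    pos, neg = scores[y == 1], scores[y == 0]
    return float(np.mean(pos[:, None] > neg[None, :]))

w = pairwise_hinge_auc_sgd(X, y)
print(f"training AUC: {auc(X @ w, y):.3f}")
```

The multivariate nature of the objective is visible here: each update couples two examples, which is what the buffered online variants in the thesis are designed to handle at scale.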
Delay-Aware Hierarchical Federated Learning
Federated learning has gained popularity as a means of training models
distributed across the wireless edge. The paper introduces delay-aware
hierarchical federated learning (DFL) to improve the efficiency of distributed
machine learning (ML) model training by accounting for communication delays
between edge and cloud. Different from traditional federated learning, DFL
leverages multiple stochastic gradient descent iterations on local datasets
within each global aggregation period and intermittently aggregates model
parameters through edge servers in local subnetworks. During global
synchronization, the cloud server consolidates local models with the outdated
global model using a local-global combiner, thus preserving crucial elements of
both, enhancing learning efficiency in the presence of delay. A set of
conditions is obtained under which DFL achieves a sublinear convergence rate of O(1/k) for
strongly convex and smooth loss functions. Based on these findings, an adaptive
control algorithm is developed for DFL, implementing policies to mitigate
energy consumption and communication latency while aiming for sublinear
convergence. Numerical evaluations show DFL's superior performance in terms of
faster global model convergence, reduced resource consumption, and robustness
against communication delays compared to existing FL algorithms. In summary,
the proposed method offers improved efficiency and results when dealing with
both convex and non-convex loss functions.
Comment: A condensed version of this paper was presented at IEEE Globecom 202
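The training loop described above can be sketched on a toy least-squares problem (the topology, step sizes, and combiner weight gamma are illustrative, not the paper's): clients run local SGD, edge servers average within their subnetwork, and the cloud's local-global combiner mixes the fresh edge average with the outdated copy of the global model it actually has.

```python
import numpy as np

rng = np.random.default_rng(2)

# 2 edge subnetworks x 3 clients, each holding noisy least-squares data on a
# shared parameter (sizes and noise level are illustrative).
d = 4
true_w = np.arange(float(d))
data = []
for _ in range(2):
    edge = []
    for _ in range(3):
        Xc = rng.normal(size=(50, d))
        edge.append((Xc, Xc @ true_w + 0.1 * rng.normal(size=50)))
    data.append(edge)

def dfl_round(w_global, w_stale, data, local_steps=5, lr=0.05, gamma=0.5):
    """One global aggregation period: local SGD at each client, averaging at
    each edge server, then the cloud combines the fresh edge average with
    the outdated global model via the local-global combiner."""
    edge_models = []
    for edge in data:
        client_models = []
        for Xc, yc in edge:
            w = w_global.copy()
            for _ in range(local_steps):
                i = rng.integers(len(yc))
                w -= lr * (Xc[i] @ w - yc[i]) * Xc[i]   # one-sample SGD step
            client_models.append(w)
        edge_models.append(np.mean(client_models, axis=0))  # edge aggregation
    fresh = np.mean(edge_models, axis=0)
    return gamma * fresh + (1.0 - gamma) * w_stale          # local-global combiner

w = w_stale = np.zeros(d)
for _ in range(100):
    w, w_stale = dfl_round(w, w_stale, data), w  # cloud's copy lags one period
err = float(np.linalg.norm(w - true_w))
print(f"distance to true parameter: {err:.3f}")
```

Even with the cloud always one period behind, the combiner keeps the iterates contracting toward the shared optimum, which is the intuition behind the paper's delay-robustness claims.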
Theoretical analysis of a Stochastic Approximation approach for computing Quasi-Stationary distributions
This paper studies a method, proposed in the physics literature [8, 7, 10],
for estimating the quasi-stationary distribution. In
contrast to existing methods in eigenvector estimation, the method eliminates
the need for explicit transition matrix manipulation to extract the principal
eigenvector. Our paper analyzes the algorithm by casting it as a stochastic
approximation algorithm (Robbins-Monro) [23, 16]. In doing so, we prove its
convergence and obtain its rate of convergence. Based on this insight, we also
give an example where the rate of convergence is very slow. This problem can
be alleviated by an improved version of the algorithm, which we also present
in this paper. Numerical experiments are described that demonstrate the
effectiveness of this improved method.
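The estimator can be sketched as a Robbins-Monro recursion on a small absorbing chain (the transition matrix below is illustrative): simulate the chain, restart from the current occupation-measure estimate whenever absorption occurs, and average with 1/n step sizes. No transition matrix manipulation is needed to run it; the principal left eigenvector of the transient block is computed here only as a reference answer.

```python
import numpy as np

rng = np.random.default_rng(3)

# Absorbing chain on {0 (absorbing), 1, 2, 3}; the quasi-stationary
# distribution is the normalized principal left eigenvector of the
# transient block.
P = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.3, 0.2, 0.5, 0.0],
              [0.0, 0.4, 0.2, 0.4],
              [0.0, 0.0, 0.6, 0.4]])

def qsd_stochastic_approximation(P, steps=100_000):
    """Robbins-Monro recursion for the QSD: on absorption, restart from a
    state drawn from the current estimate; update with 1/n step sizes."""
    m = P.shape[0]
    mu = np.full(m, 1.0 / (m - 1)); mu[0] = 0.0   # uniform on transient states
    x = 1
    for n in range(1, steps + 1):
        x = rng.choice(m, p=P[x])
        if x == 0:                                # absorbed: resample from mu
            x = rng.choice(m, p=mu)
        e = np.zeros(m); e[x] = 1.0
        mu += (e - mu) / n                        # stochastic approximation step
    return mu

mu = qsd_stochastic_approximation(P)

# Reference answer: principal left eigenvector of the transient block.
vals, vecs = np.linalg.eig(P[1:, 1:].T)
v = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
v /= v.sum()
print(np.round(mu[1:], 3), np.round(v, 3))
```

The 1/n step sizes are exactly what makes the paper's slow-convergence example possible; its improved variant modifies the recursion to escape that regime.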
Stochastic optimization methods for the simultaneous control of parameter-dependent systems
We address the application of stochastic optimization methods for the
simultaneous control of parameter-dependent systems. In particular, we focus on
the classical Stochastic Gradient Descent (SGD) approach of Robbins and Monro,
and on the recently developed Continuous Stochastic Gradient (CSG) algorithm.
We consider the problem of computing simultaneous controls through the
minimization of a cost functional defined as the superposition of individual
costs for each realization of the system. We compare the performance of these
stochastic approaches, in terms of their computational complexity, with those
of the more classical Gradient Descent (GD) and Conjugate Gradient (CG)
algorithms, and we discuss the advantages and disadvantages of each
methodology. In agreement with well-established results in the machine learning
context, we show how the SGD and CSG algorithms can significantly reduce the
computational burden when treating control problems that depend on a large
number of parameters. This is corroborated by numerical experiments.
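The computational saving is easy to see on a toy simultaneous-control problem (all matrices, the perturbation size, and the step-size schedule below are illustrative): the cost is an average over K parameter realizations, so full GD touches all K systems per iteration while Robbins-Monro SGD samples a single realization per step.

```python
import numpy as np

rng = np.random.default_rng(4)

# One control u for K parameter-dependent linear systems x_k = A_k @ u,
# steered toward a common target; the cost functional is the superposition
# (here, average) of the individual quadratic costs.
K, n = 200, 10
A = np.eye(n) + 0.1 * rng.normal(size=(K, n, n))  # K realizations of the system
target = np.ones(n)

def cost(u):
    """Average quadratic cost over all K parameter realizations."""
    return 0.5 * float(np.mean(np.sum((A @ u - target) ** 2, axis=1)))

def sgd(u0, lr=0.05, iters=2000):
    """Robbins-Monro SGD: each iteration samples ONE realization A_k, so a
    step costs 1/K of a full gradient-descent step over all systems."""
    u = u0.copy()
    for t in range(iters):
        k = rng.integers(K)
        u -= lr / np.sqrt(t + 1.0) * (A[k].T @ (A[k] @ u - target))
    return u

u = sgd(np.zeros(n))
print(f"cost at 0: {cost(np.zeros(n)):.3f}, cost after SGD: {cost(u):.3f}")
```

A full GD iteration here would cost 200 matrix-vector products against SGD's one, which is the complexity gap the paper quantifies for SGD and CSG versus GD and CG.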