D$^2$: Decentralized Training over Decentralized Data

Author: Lian Xiangru
Liu Ji
Tang Hanlin
Yan Ming
Zhang Ce
Publication date: 01/01/2018

When training a machine learning model with multiple workers, each of which collects data from its own data sources, it is most useful if the data collected by different workers can be unique and different. Ironically, recent analysis of decentralized parallel stochastic gradient descent (D-PSGD) relies on the assumption that the data hosted on different workers are not too different. In this paper, we ask the question: Can we design a decentralized parallel stochastic gradient descent algorithm that is less sensitive to the data variance across workers? We present D$^2$, a novel decentralized parallel stochastic gradient descent algorithm designed for large data variance among workers (imprecisely, "decentralized" data). The core of D$^2$ is a variance reduction extension of the standard D-PSGD algorithm, which improves the convergence rate from $O\left(\frac{\sigma}{\sqrt{nT}} + \frac{(n\zeta^2)^{1/3}}{T^{2/3}}\right)$ to $O\left(\frac{\sigma}{\sqrt{nT}}\right)$, where $\zeta^{2}$ denotes the variance among data on different workers. As a result, D$^2$ is robust to data variance among workers. We empirically evaluated D$^2$ on image classification tasks where each worker has access to only the data of a limited set of labels, and find that D$^2$ significantly outperforms D-PSGD.
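To get a feel for the two convergence rates quoted in the abstract, the following minimal sketch evaluates both bounds with all hidden constants set to 1. The function names and the example values are illustrative assumptions, not taken from the paper, and the results show only how the terms scale, not actual training error.

```python
import math


def dpsgd_rate(sigma, zeta_sq, n, T):
    """D-PSGD bound from the abstract, with constants dropped (assumption)."""
    stochastic_term = sigma / math.sqrt(n * T)
    # Extra term caused by data variance zeta^2 across the n workers.
    heterogeneity_term = (n * zeta_sq) ** (1 / 3) / T ** (2 / 3)
    return stochastic_term + heterogeneity_term


def d2_rate(sigma, n, T):
    """D^2 bound from the abstract, with constants dropped (assumption)."""
    return sigma / math.sqrt(n * T)


if __name__ == "__main__":
    # Hypothetical setting: 16 workers, 10,000 iterations, unit gradient noise.
    n, T, sigma = 16, 10_000, 1.0
    for zeta_sq in (0.0, 1.0, 100.0):
        print(zeta_sq, dpsgd_rate(sigma, zeta_sq, n, T), d2_rate(sigma, n, T))
```

When `zeta_sq` is zero the two bounds coincide. As the data variance grows, only the D-PSGD bound grows with it, which is the sensitivity the abstract says D$^2$ removes.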

arXiv.org e-Print Archive

Repository for Publications and Research Data

Kac-Schwarz Operators of Type $B$, Quantum Spectral Curves, and Spin Hurwitz Numbers

Author: Ji Ce
Wang Zhiyuan
Yang Chenglang
Publication date: 06/04/2023

Given a tau-function $\tau(t)$ of the BKP hierarchy satisfying $\tau(0)=1$, we discuss the relation between its BKP-affine coordinates on the isotropic Sato Grassmannian and its BKP-wave function. Using this result, we formulate a type of Kac-Schwarz operators for $\tau(t)$ in terms of BKP-affine coordinates. As an example, we compute the affine coordinates of the BKP tau-function for spin single Hurwitz numbers with completed cycles, and find a pair of Kac-Schwarz operators $(P,Q)$ satisfying $[P,Q]=1$. By doing this, we obtain the quantum spectral curve for spin single Hurwitz numbers.

arXiv.org e-Print Archive

Distributed Learning over Unreliable Networks

Author: Alistarh Dan
Kassing Simon
Liu Ji
Renggli Cedric
Singla Ankit
Tang Hanlin
Yu Chen
Zhang Ce
Publication date: 01/01/2019

Most of today's distributed machine learning systems assume reliable networks: whenever two machines exchange information (e.g., gradients or models), the network should guarantee the delivery of the message. At the same time, recent work exhibits the impressive tolerance of machine learning algorithms to errors or noise arising from relaxed communication or synchronization. In this paper, we connect these two trends and consider the following question: Can we design machine learning systems that are tolerant to network unreliability during training? With this motivation, we focus on a theoretical problem of independent interest: given a standard distributed parameter server architecture, if every communication between the worker and the server has a non-zero probability $p$ of being dropped, does there exist an algorithm that still converges, and at what speed? The technical contribution of this paper is a novel theoretical analysis proving that distributed learning over an unreliable network can achieve a convergence rate comparable to centralized or distributed learning over reliable networks. Further, we prove that the influence of the packet drop rate diminishes as the number of parameter servers grows. We map this theoretical result onto a real-world scenario, training deep neural networks over an unreliable network layer, and conduct network simulation to validate the system improvement gained by allowing the networks to be unreliable.
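The setting in the abstract, where each worker-to-server message is lost with probability $p$, can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions and not the paper's algorithm: there is a single server, the broadcast of the model back to the workers is assumed reliable, each worker `i` holds a hypothetical scalar objective `0.5 * (x - targets[i]) ** 2`, and the server simply averages whichever gradients arrive.

```python
import random


def train_with_drops(targets, drop_prob, steps, lr, seed=0):
    """Toy parameter-server SGD where each gradient message may be dropped.

    Worker i holds the objective 0.5 * (x - targets[i])**2, so its gradient
    at x is (x - targets[i]). The average objective is minimized at the
    mean of `targets`.
    """
    rng = random.Random(seed)
    x = 0.0
    for _ in range(steps):
        # Each worker's gradient reaches the server with prob. 1 - drop_prob.
        received = [x - c for c in targets if rng.random() >= drop_prob]
        if not received:
            continue  # every message was lost this round; skip the update
        x -= lr * sum(received) / len(received)
    return x


if __name__ == "__main__":
    targets = [0.0, 1.0, 2.0, 3.0, 4.0]  # hypothetical per-worker data
    for p in (0.0, 0.3, 0.6):
        print(p, train_with_drops(targets, p, steps=5000, lr=0.01))
```

In this toy problem the iterate still settles near the mean of `targets` when messages are dropped, with more fluctuation as `drop_prob` rises. It does not reproduce the paper's analysis or its multi-server result.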

arXiv.org e-Print Archive

IST Austria: PubRep (Institute of Science and Technology)

Multi-view transition HMMs based view-invariant human action recognition method

Author: Ji Xiaofei
Ju Zhaojie
Wang Ce
Wang Changhui
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/05/2015

Portsmouth University Research Portal (Pure)

PMLR Press

Author: Alistarh Dan-Adrian
Kara Kaan
Li Jerry
Liu Ji
Zhang Ce
Zhang Hantian
Publication venue: PMLR
Publication date: 01/01/2017

Recently there has been significant interest in training machine-learning models at low precision: by reducing precision, one can reduce computation and communication by one order of magnitude. We examine training at reduced precision, both from a theoretical and practical perspective, and ask: is it possible to train models at end-to-end low precision with provable guarantees? Can this lead to consistent order-of-magnitude speedups? We mainly focus on linear models, for which the answer is yes. We develop a simple framework called ZipML based on one simple but novel strategy called double sampling. Our ZipML framework is able to execute training at low precision with no bias, guaranteeing convergence, whereas naive quantization would introduce significant bias. We validate our framework across a range of applications, and show that it enables an FPGA prototype that is up to 6.5× faster than an implementation using full 32-bit precision. We further develop a variance-optimal stochastic quantization strategy and show that it can make a significant difference in a variety of settings. When applied to linear models together with double sampling, we save up to another 1.7× in data movement compared with uniform quantization. When training deep networks with quantized models, we achieve higher accuracy than the state-of-the-art XNOR-Net.
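The bias that the abstract attributes to naive quantization, and the way double sampling avoids it, can be illustrated for a single least-squares sample. This is a minimal sketch, not the ZipML implementation. It assumes the per-sample loss `0.5 * (a.x - b)**2`, unbiased stochastic rounding of the data vector `a` to a uniform grid, and hypothetical function names. Using the same quantized sample twice makes the gradient's expectation pick up the quantization variance; using two independent quantizations keeps it unbiased.

```python
import math
import random


def stochastic_round(values, step, rng):
    """Round each value to a multiple of `step`, up or down at random,
    with probabilities chosen so that the expected result equals the value."""
    out = []
    for v in values:
        low = math.floor(v / step)
        frac = v / step - low
        out.append((low + (1 if rng.random() < frac else 0)) * step)
    return out


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def true_gradient(a, b, x):
    """Full-precision gradient of 0.5 * (a.x - b)**2 with respect to x."""
    r = dot(a, x) - b
    return [ai * r for ai in a]


def naive_gradient(a, b, x, step, rng):
    """Reuses ONE quantized sample in both factors: biased."""
    q = stochastic_round(a, step, rng)
    r = dot(q, x) - b
    return [qi * r for qi in q]


def double_sampling_gradient(a, b, x, step, rng):
    """Uses TWO independent quantized samples: unbiased."""
    q1 = stochastic_round(a, step, rng)
    q2 = stochastic_round(a, step, rng)
    r = dot(q2, x) - b
    return [qi * r for qi in q1]


def mean_gradient(estimator, a, b, x, step, trials, seed=0):
    """Monte Carlo average of a quantized gradient estimator."""
    rng = random.Random(seed)
    total = [0.0] * len(a)
    for _ in range(trials):
        g = estimator(a, b, x, step, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / trials for t in total]


if __name__ == "__main__":
    a, b, x = [0.3, 0.7], 0.5, [1.0, -2.0]  # hypothetical sample and model
    print("true  ", true_gradient(a, b, x))
    print("naive ", mean_gradient(naive_gradient, a, b, x, 1.0, 100_000))
    print("double", mean_gradient(double_sampling_gradient, a, b, x, 1.0, 100_000))
```

Averaged over many trials, the double-sampling estimate tracks the full-precision gradient while the naive estimate stays offset from it. The price is storing or transmitting two quantized copies of each sample.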

IST Austria: PubRep (Institute of Science and Technology)