
    D$^2$: Decentralized Training over Decentralized Data

    When training a machine learning model using multiple workers, each of which collects data from its own data sources, it would be most useful when the data collected by different workers are {\em unique} and {\em different}. Ironically, recent analysis of decentralized parallel stochastic gradient descent (D-PSGD) relies on the assumption that the data hosted on different workers are {\em not too different}. In this paper, we ask the question: {\em Can we design a decentralized parallel stochastic gradient descent algorithm that is less sensitive to the data variance across workers?} We present D$^2$, a novel decentralized parallel stochastic gradient descent algorithm designed for large data variance among workers (imprecisely, "decentralized" data). The core of D$^2$ is a variance reduction extension of the standard D-PSGD algorithm, which improves the convergence rate from $O\left(\frac{\sigma}{\sqrt{nT}} + \frac{(n\zeta^2)^{1/3}}{T^{2/3}}\right)$ to $O\left(\frac{\sigma}{\sqrt{nT}}\right)$, where $\zeta^{2}$ denotes the variance among the data on different workers. As a result, D$^2$ is robust to data variance among workers. We empirically evaluate D$^2$ on image classification tasks where each worker has access to only the data of a limited set of labels, and find that D$^2$ significantly outperforms D-PSGD.
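
    For intuition, here is a small numerical sketch (ours, not the authors' code) contrasting a plain D-PSGD step with a D$^2$-style step on a decentralized quadratic problem with highly heterogeneous local data. The ring topology, step size, and quadratic losses are illustrative assumptions, and the exact recursion should be taken from the paper.

```python
import numpy as np

# Toy comparison of D-PSGD with a D^2-style variance-corrected update.
# Assumptions (ours, not the paper's setup): 8 workers on a ring, local
# quadratic losses f_i(x) = 0.5 * ||x - b_i||^2 with very different
# targets b_i, so the data variance zeta^2 across workers is large.
rng = np.random.default_rng(0)
n, d, steps, gamma = 8, 10, 300, 0.1

b = rng.normal(scale=5.0, size=(n, d))            # heterogeneous local "data"
W = np.zeros((n, n))                              # symmetric ring gossip matrix
for i in range(n):
    W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1 / 3

def grad(x):                                      # stochastic local gradients
    return (x - b) + rng.normal(scale=0.1, size=x.shape)

x_opt = b.mean(axis=0)                            # minimizer of (1/n) sum_i f_i

def run(use_d2):
    x = np.zeros((n, d))
    x_prev, g_prev = x.copy(), np.zeros((n, d))
    for t in range(steps):
        g = grad(x)
        if use_d2 and t > 0:
            # D^2-style step (our reading of the recursion): gossip over
            # 2*x_t - x_{t-1} - gamma * (g_t - g_{t-1})
            x_new = W @ (2 * x - x_prev - gamma * (g - g_prev))
        else:
            # plain D-PSGD step: gossip over x_t, then a local gradient step
            x_new = W @ x - gamma * g
        x_prev, g_prev, x = x, g, x_new
    # average per-worker distance to the global optimum
    return np.linalg.norm(x - x_opt) / np.sqrt(n)

print("D-PSGD      :", run(use_d2=False))
print("D^2 (sketch):", run(use_d2=True))
```

    In this toy setting the gossip-plus-gradient D-PSGD update leaves each worker with a bias that grows with the heterogeneity of the $b_i$, while the D$^2$-style correction largely removes it, which is the qualitative behaviour the abstract describes.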

    Distributed Learning over Unreliable Networks

    Most of today's distributed machine learning systems assume {\em reliable networks}: whenever two machines exchange information (e.g., gradients or models), the network should guarantee the delivery of the message. At the same time, recent work exhibits the impressive tolerance of machine learning algorithms to errors or noise arising from relaxed communication or synchronization. In this paper, we connect these two trends and consider the following question: {\em Can we design machine learning systems that are tolerant to network unreliability during training?} With this motivation, we focus on a theoretical problem of independent interest: given a standard distributed parameter-server architecture, if every communication between a worker and the server has a non-zero probability $p$ of being dropped, does there exist an algorithm that still converges, and at what speed? The technical contribution of this paper is a novel theoretical analysis proving that distributed learning over unreliable networks can achieve a convergence rate comparable to that of centralized or distributed learning over reliable networks. Further, we prove that the influence of the packet drop rate diminishes as the number of parameter servers grows. We map this theoretical result onto a real-world scenario, training deep neural networks over an unreliable network layer, and conduct network simulations to validate the system improvement obtained by allowing the networks to be unreliable.
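
    To make the setting concrete, the following toy simulation (a simplification of ours, not the paper's multi-server system) runs parameter-server SGD in which each worker-to-server gradient message is dropped independently with probability $p$, and the server falls back on the last gradient it received from that worker. The fallback rule, problem, and constants are illustrative assumptions.

```python
import numpy as np

# Toy parameter-server training with unreliable links (illustrative only).
# Assumption: when a worker's gradient message is dropped (probability p),
# the server reuses the last gradient it received from that worker; the
# paper analyzes more general drop patterns and multiple parameter servers.
rng = np.random.default_rng(1)
n_workers, d, steps, lr, p_drop = 16, 20, 300, 0.05, 0.2

targets = rng.normal(size=(n_workers, d))     # heterogeneous local quadratics
w = np.zeros(d)                               # model held by the server
last_grad = np.zeros((n_workers, d))          # cache of stale gradients

for t in range(steps):
    grads = (w - targets) + rng.normal(scale=0.1, size=(n_workers, d))
    delivered = rng.random(n_workers) >= p_drop     # which messages arrive
    last_grad[delivered] = grads[delivered]         # refresh delivered entries
    w -= lr * last_grad.mean(axis=0)                # aggregate fresh + stale

print("distance to optimum:", np.linalg.norm(w - targets.mean(axis=0)))
```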

    $Z_c(3900)$ as a $D\bar{D}^*$ molecule from the pole counting rule

    A comprehensive study of the nature of the $Z_c(3900)$ resonant structure is carried out in this work. By constructing the pertinent effective Lagrangians and considering the important final-state-interaction effects, we first give a unified description of all the relevant experimental data available, including the $J/\psi\pi$ and $\pi\pi$ invariant mass distributions from the $e^+e^-\to J/\psi\pi\pi$ process, the $h_c\pi$ distribution from $e^+e^-\to h_c\pi\pi$, and also the $D\bar D^{*}$ spectrum in the $e^+e^-\to D\bar D^{*}\pi$ process. After fitting the unknown parameters to the above data, we search for poles in the complex energy plane and find only one pole in the nearby energy region across the different Riemann sheets. We therefore conclude that $Z_c(3900)$ is of $D\bar D^*$ molecular nature, according to the pole counting rule method [Nucl. Phys. A543, 632 (1992); Phys. Rev. D 35, 1633 (1987)]. We emphasize that the conclusion based upon the pole counting method is not trivial, since both the $D\bar D^{*}$ contact interactions and the explicit $Z_c$ exchanges are introduced in our analyses, and they lead to the same conclusion.
    Comment: 21 pages, 9 figures. To match the published version in PRD. Additional discussion on the spectral density function is included.
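
    As a rough illustration of the pole counting idea (a single-channel caricature with made-up parameters, not the paper's coupled-channel amplitude), the sketch below scans the inverse of a Flatt\'e-like amplitude near the $D\bar D^{*}$ threshold on the two Riemann sheets, which differ only in the sign of the channel momentum. A lone near-threshold pole appearing on the second sheet only is the molecular-type pattern the rule refers to, whereas a predominantly elementary state would be accompanied by nearby poles on additional sheets.

```python
import numpy as np

# Toy pole counting on two Riemann sheets (single-channel caricature).
# Inverse amplitude: 1/T(E) = E - E_r + i*g*k(E); sheet II flips the sign
# of the channel momentum k relative to sheet I (Im k >= 0 on sheet I).
m_D, m_Dst = 1.8696, 2.0103                 # approximate masses in GeV
E_th = m_D + m_Dst                          # D Dbar* threshold
mu = m_D * m_Dst / (m_D + m_Dst)            # reduced mass
E_r, g = 3.90, 0.10                         # toy parameters, not fit results

def k(E, sheet):
    kk = np.sqrt(2 * mu * (np.asarray(E) - E_th + 0j))   # nonrel. momentum
    kk = np.where(kk.imag < 0, -kk, kk)                  # Im k >= 0: sheet I
    return kk if sheet == 1 else -kk                     # sheet II: flip sign

def inv_amp(E, sheet):
    return E - E_r + 1j * g * k(E, sheet)

# Scan a small box below the real axis around the Z_c(3900) region on both
# sheets; a (near-)zero of |1/T| signals a pole on that sheet.
re = np.linspace(E_th - 0.05, E_th + 0.08, 400)
im = np.linspace(-0.06, -0.001, 200)
E_grid = re[None, :] + 1j * im[:, None]
for sheet in (1, 2):
    vals = np.abs(inv_amp(E_grid, sheet))
    i, j = np.unravel_index(np.argmin(vals), vals.shape)
    print(f"sheet {sheet}: min |1/T| = {vals[i, j]:.3e} at "
          f"E = {re[j]:.4f} {im[i]:+.4f}i GeV")
```

    In this toy the second sheet shows a clear near-threshold zero of $1/T$ (a pole of the amplitude), while the first sheet does not, mimicking the single-nearby-pole situation that the pole counting rule associates with a molecular state.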

    Cross-Modal Concept Learning and Inference for Vision-Language Models

    Large-scale pre-trained Vision-Language Models (VLMs), such as CLIP, establish the correlation between texts and images, achieving remarkable success on various downstream tasks with fine-tuning. In existing fine-tuning methods, the class-specific text description is matched against the whole image. We recognize that this whole-image matching is not effective, since images from the same class often contain a set of different semantic objects, and an object further consists of a set of semantic parts or concepts. Individual semantic parts or concepts may appear in image samples from different classes. To address this issue, in this paper, we develop a new method called cross-modal concept learning and inference (CCLI). Using the powerful text-image correlation capability of CLIP, our method automatically learns a large set of distinctive visual concepts from images using a set of semantic text concepts. Based on these visual concepts, we construct a discriminative representation of images and learn a concept inference network to perform downstream image classification tasks, such as few-shot learning and domain generalization. Extensive experimental results demonstrate that our CCLI method improves upon the current state-of-the-art methods by large margins, for example, by up to 8.0% on few-shot learning and by up to 1.3% on domain generalization.
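
    As a rough sketch of the concept-based representation the abstract describes (our simplification, not the released CCLI code), the snippet below scores an image against a small bank of textual concepts with CLIP and uses the resulting similarity vector as a concept-level feature. The concept list and image path are made up, and the paper's concept selection and concept inference network are omitted.

```python
import torch
import clip                      # OpenAI CLIP package
from PIL import Image

# Hypothetical concept-similarity representation in the spirit of CCLI:
# score an image against a bank of textual concepts and use the vector of
# similarities as a concept-level feature for downstream classification.
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

concepts = ["furry texture", "pointed ears", "metal surface",
            "round wheels", "green leaves", "feathers"]      # assumed list
text = clip.tokenize([f"a photo with {c}" for c in concepts]).to(device)

with torch.no_grad():
    t = model.encode_text(text)
    t = t / t.norm(dim=-1, keepdim=True)

    image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
    v = model.encode_image(image)
    v = v / v.norm(dim=-1, keepdim=True)

    concept_repr = (v @ t.T).squeeze(0)      # cosine similarity per concept

print(dict(zip(concepts, concept_repr.tolist())))
# Downstream, a small classifier over concept_repr could play the role of
# the paper's concept inference network.
```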

    Self-Correctable and Adaptable Inference for Generalizable Human Pose Estimation

    A central challenge in human pose estimation, as well as in many other machine learning and prediction tasks, is the generalization problem. The learned network does not have the capability to characterize the prediction error, generate feedback information from the test sample, and correct the prediction error on the fly for each individual test sample, which results in degraded generalization performance. In this work, we introduce a self-correctable and adaptable inference (SCAI) method to address the generalization challenge of network prediction, and we use human pose estimation as an example to demonstrate its effectiveness and performance. We learn a correction network to correct the prediction result, conditioned on a fitness feedback error. This feedback error is generated by a learned fitness feedback network, which maps the prediction result back to the original input domain and compares it against the original input. Interestingly, we find that this self-referential feedback error is highly correlated with the actual prediction error. This strong correlation suggests that we can use this error as feedback to guide the correction process. It can also be used as a loss function to quickly adapt and optimize the correction network during the inference process. Our extensive experimental results on human pose estimation demonstrate that the proposed SCAI method significantly improves the generalization capability and performance of human pose estimation.
    Comment: Accepted by CVPR 202
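
    The mock-up below illustrates the feedback-and-correction loop described above under our own simplified assumptions: a frozen predictor, a fitness feedback network that maps the prediction back toward the input domain, and a correction network adapted at test time by minimizing the self-referential feedback error. Plain MLPs and random tensors stand in for the paper's pose networks and data.

```python
import torch
import torch.nn as nn

# Simplified mock-up of a self-correctable/adaptable inference loop in the
# spirit of SCAI (toy modules of ours, not the paper's pose networks).
# predictor: input -> prediction            (frozen at test time)
# feedback:  prediction -> reconstructed input ("fitness" check)
# corrector: (prediction, feedback error) -> refined prediction
d_in, d_out = 64, 16
predictor = nn.Sequential(nn.Linear(d_in, 128), nn.ReLU(), nn.Linear(128, d_out))
feedback = nn.Sequential(nn.Linear(d_out, 128), nn.ReLU(), nn.Linear(128, d_in))
corrector = nn.Sequential(nn.Linear(d_out + d_in, 128), nn.ReLU(),
                          nn.Linear(128, d_out))

def infer_with_adaptation(x, adapt_steps=5, lr=1e-3):
    """Test-time inference: adapt only the corrector on the feedback error."""
    opt = torch.optim.Adam(corrector.parameters(), lr=lr)
    with torch.no_grad():
        y0 = predictor(x)                          # initial prediction
    for _ in range(adapt_steps):
        err = feedback(y0) - x                     # self-referential error
        y = y0 + corrector(torch.cat([y0, err], dim=-1))
        loss = (feedback(y) - x).pow(2).mean()     # no ground truth needed
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        err = feedback(y0) - x
        return y0 + corrector(torch.cat([y0, err], dim=-1))

x_test = torch.randn(8, d_in)                      # stand-in test batch
print(infer_with_adaptation(x_test).shape)         # -> torch.Size([8, 16])
```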

    BDC-Adapter: Brownian Distance Covariance for Better Vision-Language Reasoning

    Large-scale pre-trained Vision-Language Models (VLMs), such as CLIP and ALIGN, have introduced a new paradigm for learning transferable visual representations. Recently, there has been a surge of interest among researchers in developing lightweight fine-tuning techniques to adapt these models to downstream visual tasks. We recognize that current state-of-the-art fine-tuning methods, such as Tip-Adapter, simply consider the covariance between the query image feature and features of support few-shot training samples, which only captures linear relations and potentially instigates a deceptive perception of independence. To address this issue, in this work, we innovatively introduce Brownian Distance Covariance (BDC) to the field of vision-language reasoning. The BDC metric can model all possible relations, providing a robust metric for measuring feature dependence. Based on this, we present a novel method called BDC-Adapter, which integrates BDC prototype similarity reasoning and multi-modal reasoning network prediction to perform classification tasks. Our extensive experimental results show that the proposed BDC-Adapter can freely handle non-linear relations and fully characterize independence, outperforming the current state-of-the-art methods by large margins.
    Comment: Accepted by BMVC 202
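
    For reference, here is a minimal sketch of the classical Brownian distance covariance statistic that BDC-style similarity builds on: form the pairwise Euclidean distance matrix of each sample, double-center it, and average the elementwise product of the two centered matrices. How BDC-Adapter arranges CLIP features into such samples and turns the statistic into prototype similarities follows the paper; the data below is random and purely illustrative.

```python
import numpy as np

# Empirical (squared) Brownian distance covariance between paired samples
# X and Y.  The population version is zero iff X and Y are independent,
# so it also picks up non-linear dependence that covariance misses.
def centered_dist(Z):
    """Pairwise Euclidean distance matrix, double-centered."""
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    return D - D.mean(0, keepdims=True) - D.mean(1, keepdims=True) + D.mean()

def bdc(X, Y):
    """Sample distance covariance squared."""
    A, B = centered_dist(X), centered_dist(Y)
    return (A * B).mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))
Y_dep = X**2 + 0.1 * rng.normal(size=X.shape)    # non-linear dependence,
                                                 # near-zero ordinary covariance
Y_ind = rng.normal(size=(256, 8))                # independent of X

print("dCov^2 (non-linearly dependent):", bdc(X, Y_dep))
print("dCov^2 (independent):           ", bdc(X, Y_ind))
```

    The quadratic relation above has an ordinary covariance with X close to zero, yet its distance covariance stays clearly away from zero, which is the kind of non-linear dependence the abstract says covariance-based adapters miss.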