D$^2$: Decentralized Training over Decentralized Data
When training a machine learning model using multiple workers, each of which
collects data from its own data sources, it is most useful when the data
collected by different workers are {\em unique} and {\em different}.
Ironically, recent analyses of decentralized parallel stochastic gradient
descent (D-PSGD) rely on the assumption that the data hosted on different
workers are {\em not too different}. In this paper, we ask the question: {\em
Can we design a decentralized parallel stochastic gradient descent algorithm
that is less sensitive to the data variance across workers?} We present
D$^2$, a novel decentralized parallel stochastic gradient descent algorithm
designed for large data variance among workers (imprecisely, "decentralized"
data). The core of D$^2$ is a variance reduction extension of the standard
D-PSGD algorithm, which improves the dependence of the convergence rate on the
variance among data on different workers. As a result, D$^2$ is robust to data
variance among workers. We empirically evaluate D$^2$ on image classification
tasks where each worker has access to only the data of a limited set of
labels, and find that D$^2$ significantly outperforms D-PSGD.
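For context, the standard D-PSGD step referenced above has each worker gossip-average its parameters with its neighbors and then take a local gradient step. A minimal one-dimensional sketch (toy quadratic objectives, a hypothetical 4-worker ring, and exact gradients standing in for stochastic ones):

```python
import numpy as np

def d_psgd(targets, W, lr=0.1, steps=300):
    """Toy 1-D D-PSGD: worker i minimizes f_i(x) = 0.5 * (x - targets[i])**2.
    Each round, every worker averages parameters with its neighbors via the
    doubly stochastic mixing matrix W, then takes a local gradient step."""
    x = np.zeros_like(targets, dtype=float)
    for _ in range(steps):
        x = W @ x - lr * (x - targets)  # gossip average, then gradient step
    return x

# Hypothetical 4-worker ring: self-weight 1/2, each neighbor 1/4.
W = np.array([[.50, .25, .00, .25],
              [.25, .50, .25, .00],
              [.00, .25, .50, .25],
              [.25, .00, .25, .50]])
targets = np.array([0.0, 2.0, 4.0, 6.0])  # very different local data
x = d_psgd(targets, W)
# All workers end up near the global optimum (the mean target, 3.0), but a
# constant step size leaves a small residual disagreement when the local
# data differ -- the data-variance sensitivity the abstract discusses.
```

The residual disagreement grows with the spread of the local optima, which is exactly the variance term the variance-reduction extension targets.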
Distributed Learning over Unreliable Networks
Most of today's distributed machine learning systems assume {\em reliable
networks}: whenever two machines exchange information (e.g., gradients or
models), the network should guarantee the delivery of the message. At the same
time, recent work exhibits the impressive tolerance of machine learning
algorithms to errors or noise arising from relaxed communication or
synchronization. In this paper, we connect these two trends, and consider the
following question: {\em Can we design machine learning systems that are
tolerant to network unreliability during training?} With this motivation, we
focus on a theoretical problem of independent interest---given a standard
distributed parameter server architecture, if every communication between the
worker and the server has a non-zero probability of being dropped, does
there exist an algorithm that still converges, and at what speed? The technical
contribution of this paper is a novel theoretical analysis proving that
distributed learning over unreliable networks can achieve a convergence rate
comparable to that of centralized or distributed learning over reliable
networks. Further, we prove that the influence of the packet drop rate
diminishes as the number of parameter servers grows. We map this theoretical
result onto a real-world scenario, training deep neural networks over an
unreliable network layer, and conduct network simulations to validate the
improvement obtained by allowing the network to be unreliable.
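The question above can be made concrete with a toy simulation (an illustrative sketch, not the paper's algorithm or analysis): a parameter server averages whichever worker gradients actually arrive, and each worker-to-server message is dropped independently with probability p.

```python
import numpy as np

def train_with_drops(targets, drop_p, lr=0.05, steps=2000, seed=0):
    """Toy parameter-server SGD where each worker's gradient message is
    dropped with probability drop_p. The server averages only the
    gradients it receives; a round with no arrivals is skipped."""
    rng = np.random.default_rng(seed)
    x = 0.0
    for _ in range(steps):
        grads = x - targets                  # worker i's gradient of 0.5*(x - t_i)**2
        received = rng.random(len(targets)) >= drop_p
        if received.any():
            x -= lr * grads[received].mean() # average over surviving messages
    return x

targets = np.array([0.0, 2.0, 4.0, 6.0])
x_reliable = train_with_drops(targets, drop_p=0.0)
x_lossy = train_with_drops(targets, drop_p=0.3)
# Both runs land near the optimum of the average objective (3.0):
# dropped messages add noise but, in this toy, do not prevent convergence.
```

In this toy the surviving subset is an unbiased sample of the workers, so drops only inflate gradient noise; the paper's contribution is proving a statement of this flavor rigorously for the distributed setting.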
as a molecule from the pole counting rule
A comprehensive study on the nature of the resonant structure is
carried out in this work. By constructing the pertinent effective Lagrangians
and considering the important final-state-interaction effects, we first give a
unified description of all the relevant experimental data available, including
the and invariant mass distributions from the process, the distribution from and
also the spectrum in the process.
After fitting the unknown parameters to the previous data, we search for the
pole in the complex energy plane and find only one pole in the nearby energy
region in different Riemann sheets. Therefore we conclude that is of
molecular nature, according to the pole counting rule
method~[Nucl.~Phys.~A543, 632 (1992); Phys.~Rev.~D 35, 1633 (1987)]. We
emphasize that the conclusion based upon the pole counting method is not
trivial, since both the contact interactions and the explicit
exchanges are introduced in our analyses and they lead to the same conclusion.
Comment: 21 pages, 9 figures. To match the published version in PRD.
Additional discussion on the spectral density function is included.
Cross-Modal Concept Learning and Inference for Vision-Language Models
Large-scale pre-trained Vision-Language Models (VLMs), such as CLIP,
establish the correlation between texts and images, achieving remarkable
success on various downstream tasks with fine-tuning. In existing fine-tuning
methods, the class-specific text description is matched against the whole
image. We recognize that this whole image matching is not effective since
images from the same class often contain a set of different semantic objects,
and an object further consists of a set of semantic parts or concepts.
Individual semantic parts or concepts may appear in image samples from
different classes. To address this issue, in this paper, we develop a new
method called cross-modal concept learning and inference (CCLI). Using the
powerful text-image correlation capability of CLIP, our method automatically
learns a large set of distinctive visual concepts from images using a set of
semantic text concepts. Based on these visual concepts, we construct a
discriminative representation of images and learn a concept inference network
to perform downstream image classification tasks, such as few-shot learning and
domain generalization. Extensive experimental results demonstrate that our
CCLI method improves performance over the current state-of-the-art methods by
large margins, for example, by up to 8.0% on few-shot learning and up to 1.3%
on domain generalization.
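The idea of describing an image by its similarity to a bank of visual concepts can be sketched with plain cosine similarities (an illustrative toy with made-up features, not the authors' CLIP-based implementation):

```python
import numpy as np

def l2norm(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def concept_activations(image_feats, concept_feats):
    """Represent each image by its cosine similarity to every visual
    concept -- a toy stand-in for a concept-based representation."""
    return l2norm(image_feats) @ l2norm(concept_feats).T

# Made-up 4-D features: three hypothetical concept embeddings and two images.
concepts = np.eye(3, 4) + 0.1
cat_img = np.array([1.0, 0.9, 0.0, 0.1])   # strong in concepts 0 and 1
dog_img = np.array([0.0, 0.1, 1.0, 0.2])   # strong in concept 2

acts = concept_activations(np.stack([cat_img, dog_img]), concepts)
# Each row is a concept-activation vector; a downstream classifier (the
# concept inference network in CCLI) would operate on these rows instead
# of on whole-image features.
```

The point of the representation is that two images of the same class can share concept activations even when their whole-image features differ.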
Self-Correctable and Adaptable Inference for Generalizable Human Pose Estimation
A central challenge in human pose estimation, as well as in many other
machine learning and prediction tasks, is the generalization problem. The
learned network does not have the capability to characterize the prediction
error, generate feedback information from the test sample, and correct the
prediction error on the fly for each individual test sample, which results in
degraded performance in generalization. In this work, we introduce a
self-correctable and adaptable inference (SCAI) method to address the
generalization challenge of network prediction and use human pose estimation as
an example to demonstrate its effectiveness and performance. We learn a
correction network to correct the prediction result conditioned on a fitness
feedback error. This feedback error is generated by a learned fitness feedback
network which maps the prediction result to the original input domain and
compares it against the original input. Interestingly, we find that this
self-referential feedback error is highly correlated with the actual prediction
error. This strong correlation suggests that we can use this error as feedback
to guide the correction process. It can also be used as a loss function to
quickly adapt and optimize the correction network during the inference process.
Our extensive experimental results on human pose estimation demonstrate that
the proposed SCAI method significantly improves the generalization capability
and performance of human pose estimation.
Comment: Accepted by CVPR 202
BDC-Adapter: Brownian Distance Covariance for Better Vision-Language Reasoning
Large-scale pre-trained Vision-Language Models (VLMs), such as CLIP and
ALIGN, have introduced a new paradigm for learning transferable visual
representations. Recently, there has been a surge of interest among researchers
in developing lightweight fine-tuning techniques to adapt these models to
downstream visual tasks. We recognize that current state-of-the-art
fine-tuning methods, such as Tip-Adapter, consider only the covariance between
the query image feature and the features of the few-shot support training
samples, which captures only linear relations and can give a misleading
picture of independence. To address this issue, in this work, we introduce
Brownian Distance Covariance (BDC) to the field of vision-language reasoning.
The BDC metric can model all possible relations, providing a robust metric for
measuring feature dependence. Based on this, we present a novel method called
BDC-Adapter, which integrates BDC prototype similarity reasoning and
multi-modal reasoning network prediction to perform classification tasks. Our
extensive experimental results show that the proposed BDC-Adapter can freely
handle non-linear relations and fully characterize independence, outperforming
the current state-of-the-art methods by large margins.
Comment: Accepted by BMVC 202
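Brownian distance covariance (the distance covariance introduced by Székely and colleagues) has a compact sample formula: double-center the pairwise distance matrices of the two samples and average their elementwise product. A minimal sketch for 1-D samples (illustrative, not the BDC-Adapter code):

```python
import numpy as np

def dcov2(x, y):
    """Squared sample distance covariance between 1-D samples x and y.
    It vanishes (in the population limit) iff x and y are independent,
    so it captures non-linear dependence that plain covariance misses."""
    def centered_dists(v):
        d = np.abs(v[:, None] - v[None, :])                    # pairwise distances
        return d - d.mean(0) - d.mean(1)[:, None] + d.mean()   # double-center
    return (centered_dists(x) * centered_dists(y)).mean()

x = np.linspace(-1.0, 1.0, 50)
y = x ** 2   # non-linear dependence that is (almost) uncorrelated with x
# Plain covariance of x and x**2 is ~0 here by symmetry, yet dcov2(x, y) is
# clearly positive: distance covariance detects the non-linear relation.
```

This is the property the abstract appeals to: a covariance of zero can coexist with strong dependence, while a distance covariance of zero cannot.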