
    D$^2$: Decentralized Training over Decentralized Data

    When training a machine learning model using multiple workers, each of which collects data from its own data sources, it would be most useful when the data collected by different workers are {\em unique} and {\em different}. Ironically, recent analysis of decentralized parallel stochastic gradient descent (D-PSGD) relies on the assumption that the data hosted on different workers are {\em not too different}. In this paper, we ask the question: {\em Can we design a decentralized parallel stochastic gradient descent algorithm that is less sensitive to the data variance across workers?} We present D$^2$, a novel decentralized parallel stochastic gradient descent algorithm designed for large data variance among workers (imprecisely, "decentralized" data). The core of D$^2$ is a variance reduction extension of the standard D-PSGD algorithm, which improves the convergence rate from $O\left(\frac{\sigma}{\sqrt{nT}} + \frac{(n\zeta^2)^{1/3}}{T^{2/3}}\right)$ to $O\left(\frac{\sigma}{\sqrt{nT}}\right)$, where $\zeta^{2}$ denotes the variance among the data on different workers. As a result, D$^2$ is robust to data variance among workers. We empirically evaluate D$^2$ on image classification tasks where each worker has access to only the data of a limited set of labels, and find that D$^2$ significantly outperforms D-PSGD.
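
    For intuition, here is a small numerical sketch (ours, not the authors' code) contrasting a plain D-PSGD step with a D$^2$-style step on a decentralized quadratic problem with highly heterogeneous local data. The ring topology, step size, and quadratic losses are illustrative assumptions, and the exact recursion should be taken from the paper.

```python
import numpy as np

# Toy comparison of D-PSGD with a D^2-style variance-corrected update.
# Assumptions (ours, not the paper's setup): 8 workers on a ring, local
# quadratic losses f_i(x) = 0.5 * ||x - b_i||^2 with very different
# targets b_i, so the data variance zeta^2 across workers is large.
rng = np.random.default_rng(0)
n, d, steps, gamma = 8, 10, 300, 0.1

b = rng.normal(scale=5.0, size=(n, d))            # heterogeneous local "data"
W = np.zeros((n, n))                              # symmetric ring gossip matrix
for i in range(n):
    W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1 / 3

def grad(x):                                      # stochastic local gradients
    return (x - b) + rng.normal(scale=0.1, size=x.shape)

x_opt = b.mean(axis=0)                            # minimizer of (1/n) sum_i f_i

def run(use_d2):
    x = np.zeros((n, d))
    x_prev, g_prev = x.copy(), np.zeros((n, d))
    for t in range(steps):
        g = grad(x)
        if use_d2 and t > 0:
            # D^2-style step (our reading of the recursion): gossip over
            # 2*x_t - x_{t-1} - gamma * (g_t - g_{t-1})
            x_new = W @ (2 * x - x_prev - gamma * (g - g_prev))
        else:
            # plain D-PSGD step: gossip over x_t, then a local gradient step
            x_new = W @ x - gamma * g
        x_prev, g_prev, x = x, g, x_new
    # average per-worker distance to the global optimum
    return np.linalg.norm(x - x_opt) / np.sqrt(n)

print("D-PSGD      :", run(use_d2=False))
print("D^2 (sketch):", run(use_d2=True))
```

    In this toy setting the gossip-plus-gradient D-PSGD update leaves each worker with a bias that grows with the heterogeneity of the $b_i$, while the D$^2$-style correction largely removes it, which is the qualitative behaviour the abstract describes.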

    Distributed Learning over Unreliable Networks

    Most of today's distributed machine learning systems assume {\em reliable networks}: whenever two machines exchange information (e.g., gradients or models), the network should guarantee the delivery of the message. At the same time, recent work exhibits the impressive tolerance of machine learning algorithms to errors or noise arising from relaxed communication or synchronization. In this paper, we connect these two trends and consider the following question: {\em Can we design machine learning systems that are tolerant to network unreliability during training?} With this motivation, we focus on a theoretical problem of independent interest: given a standard distributed parameter-server architecture, if every communication between a worker and the server has a non-zero probability $p$ of being dropped, does there exist an algorithm that still converges, and at what speed? The technical contribution of this paper is a novel theoretical analysis proving that distributed learning over unreliable networks can achieve a convergence rate comparable to that of centralized or distributed learning over reliable networks. Further, we prove that the influence of the packet drop rate diminishes as the number of parameter servers grows. We map this theoretical result onto a real-world scenario, training deep neural networks over an unreliable network layer, and conduct network simulations to validate the system improvement obtained by allowing the networks to be unreliable.
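
    To make the setting concrete, the following toy simulation (a simplification of ours, not the paper's multi-server system) runs parameter-server SGD in which each worker-to-server gradient message is dropped independently with probability $p$, and the server falls back on the last gradient it received from that worker. The fallback rule, problem, and constants are illustrative assumptions.

```python
import numpy as np

# Toy parameter-server training with unreliable links (illustrative only).
# Assumption: when a worker's gradient message is dropped (probability p),
# the server reuses the last gradient it received from that worker; the
# paper analyzes more general drop patterns and multiple parameter servers.
rng = np.random.default_rng(1)
n_workers, d, steps, lr, p_drop = 16, 20, 300, 0.05, 0.2

targets = rng.normal(size=(n_workers, d))     # heterogeneous local quadratics
w = np.zeros(d)                               # model held by the server
last_grad = np.zeros((n_workers, d))          # cache of stale gradients

for t in range(steps):
    grads = (w - targets) + rng.normal(scale=0.1, size=(n_workers, d))
    delivered = rng.random(n_workers) >= p_drop     # which messages arrive
    last_grad[delivered] = grads[delivered]         # refresh delivered entries
    w -= lr * last_grad.mean(axis=0)                # aggregate fresh + stale

print("distance to optimum:", np.linalg.norm(w - targets.mean(axis=0)))
```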

    $Z_c(3900)$ as a $D\bar{D}^*$ molecule from the pole counting rule

    A comprehensive study of the nature of the $Z_c(3900)$ resonant structure is carried out in this work. By constructing the pertinent effective Lagrangians and considering the important final-state-interaction effects, we first give a unified description of all the relevant experimental data available, including the $J/\psi\pi$ and $\pi\pi$ invariant mass distributions from the $e^+e^-\to J/\psi\pi\pi$ process, the $h_c\pi$ distribution from $e^+e^-\to h_c\pi\pi$, and also the $D\bar D^{*}$ spectrum in the $e^+e^-\to D\bar D^{*}\pi$ process. After fitting the unknown parameters to the above data, we search for poles in the complex energy plane and find only one pole in the nearby energy region across the different Riemann sheets. We therefore conclude that $Z_c(3900)$ is of $D\bar D^*$ molecular nature, according to the pole counting rule method [Nucl. Phys. A543, 632 (1992); Phys. Rev. D 35, 1633 (1987)]. We emphasize that the conclusion based upon the pole counting method is not trivial, since both the $D\bar D^{*}$ contact interactions and the explicit $Z_c$ exchanges are introduced in our analyses, and they lead to the same conclusion.
    Comment: 21 pages, 9 figures. To match the published version in PRD. Additional discussion on the spectral density function is included.
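
    As a rough illustration of the pole counting idea (a single-channel caricature with made-up parameters, not the paper's coupled-channel amplitude), the sketch below scans the inverse of a Flatt\'e-like amplitude near the $D\bar D^{*}$ threshold on the two Riemann sheets, which differ only in the sign of the channel momentum. A lone near-threshold pole appearing on the second sheet only is the molecular-type pattern the rule refers to, whereas a predominantly elementary state would be accompanied by nearby poles on additional sheets.

```python
import numpy as np

# Toy pole counting on two Riemann sheets (single-channel caricature).
# Inverse amplitude: 1/T(E) = E - E_r + i*g*k(E); sheet II flips the sign
# of the channel momentum k relative to sheet I (Im k >= 0 on sheet I).
m_D, m_Dst = 1.8696, 2.0103                 # approximate masses in GeV
E_th = m_D + m_Dst                          # D Dbar* threshold
mu = m_D * m_Dst / (m_D + m_Dst)            # reduced mass
E_r, g = 3.90, 0.10                         # toy parameters, not fit results

def k(E, sheet):
    kk = np.sqrt(2 * mu * (np.asarray(E) - E_th + 0j))   # nonrel. momentum
    kk = np.where(kk.imag < 0, -kk, kk)                  # Im k >= 0: sheet I
    return kk if sheet == 1 else -kk                     # sheet II: flip sign

def inv_amp(E, sheet):
    return E - E_r + 1j * g * k(E, sheet)

# Scan a small box below the real axis around the Z_c(3900) region on both
# sheets; a (near-)zero of |1/T| signals a pole on that sheet.
re = np.linspace(E_th - 0.05, E_th + 0.08, 400)
im = np.linspace(-0.06, -0.001, 200)
E_grid = re[None, :] + 1j * im[:, None]
for sheet in (1, 2):
    vals = np.abs(inv_amp(E_grid, sheet))
    i, j = np.unravel_index(np.argmin(vals), vals.shape)
    print(f"sheet {sheet}: min |1/T| = {vals[i, j]:.3e} at "
          f"E = {re[j]:.4f} {im[i]:+.4f}i GeV")
```

    In this toy the second sheet shows a clear near-threshold zero of $1/T$ (a pole of the amplitude), while the first sheet does not, mimicking the single-nearby-pole situation that the pole counting rule associates with a molecular state.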

    Cross-Modal Concept Learning and Inference for Vision-Language Models

    Large-scale pre-trained Vision-Language Models (VLMs), such as CLIP, establish the correlation between texts and images, achieving remarkable success on various downstream tasks with fine-tuning. In existing fine-tuning methods, the class-specific text description is matched against the whole image. We recognize that this whole-image matching is not effective, since images from the same class often contain a set of different semantic objects, and an object further consists of a set of semantic parts or concepts. Individual semantic parts or concepts may appear in image samples from different classes. To address this issue, in this paper, we develop a new method called cross-modal concept learning and inference (CCLI). Using the powerful text-image correlation capability of CLIP, our method automatically learns a large set of distinctive visual concepts from images using a set of semantic text concepts. Based on these visual concepts, we construct a discriminative representation of images and learn a concept inference network to perform downstream image classification tasks, such as few-shot learning and domain generalization. Extensive experimental results demonstrate that our CCLI method improves upon the current state-of-the-art methods by large margins, for example, by up to 8.0% on few-shot learning and by up to 1.3% on domain generalization.
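
    As a rough sketch of the concept-based representation the abstract describes (our simplification, not the released CCLI code), the snippet below scores an image against a small bank of textual concepts with CLIP and uses the resulting similarity vector as a concept-level feature. The concept list and image path are made up, and the paper's concept selection and concept inference network are omitted.

```python
import torch
import clip                      # OpenAI CLIP package
from PIL import Image

# Hypothetical concept-similarity representation in the spirit of CCLI:
# score an image against a bank of textual concepts and use the vector of
# similarities as a concept-level feature for downstream classification.
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

concepts = ["furry texture", "pointed ears", "metal surface",
            "round wheels", "green leaves", "feathers"]      # assumed list
text = clip.tokenize([f"a photo with {c}" for c in concepts]).to(device)

with torch.no_grad():
    t = model.encode_text(text)
    t = t / t.norm(dim=-1, keepdim=True)

    image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
    v = model.encode_image(image)
    v = v / v.norm(dim=-1, keepdim=True)

    concept_repr = (v @ t.T).squeeze(0)      # cosine similarity per concept

print(dict(zip(concepts, concept_repr.tolist())))
# Downstream, a small classifier over concept_repr could play the role of
# the paper's concept inference network.
```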

    Self-Correctable and Adaptable Inference for Generalizable Human Pose Estimation

    A central challenge in human pose estimation, as well as in many other machine learning and prediction tasks, is the generalization problem. The learned network does not have the capability to characterize the prediction error, generate feedback information from the test sample, and correct the prediction error on the fly for each individual test sample, which results in degraded generalization performance. In this work, we introduce a self-correctable and adaptable inference (SCAI) method to address the generalization challenge of network prediction, and we use human pose estimation as an example to demonstrate its effectiveness and performance. We learn a correction network to correct the prediction result, conditioned on a fitness feedback error. This feedback error is generated by a learned fitness feedback network, which maps the prediction result back to the original input domain and compares it against the original input. Interestingly, we find that this self-referential feedback error is highly correlated with the actual prediction error. This strong correlation suggests that we can use this error as feedback to guide the correction process. It can also be used as a loss function to quickly adapt and optimize the correction network during the inference process. Our extensive experimental results on human pose estimation demonstrate that the proposed SCAI method significantly improves the generalization capability and performance of human pose estimation.
    Comment: Accepted by CVPR 202
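
    The mock-up below illustrates the feedback-and-correction loop described above under our own simplified assumptions: a frozen predictor, a fitness feedback network that maps the prediction back toward the input domain, and a correction network adapted at test time by minimizing the self-referential feedback error. Plain MLPs and random tensors stand in for the paper's pose networks and data.

```python
import torch
import torch.nn as nn

# Simplified mock-up of a self-correctable/adaptable inference loop in the
# spirit of SCAI (toy modules of ours, not the paper's pose networks).
# predictor: input -> prediction            (frozen at test time)
# feedback:  prediction -> reconstructed input ("fitness" check)
# corrector: (prediction, feedback error) -> refined prediction
d_in, d_out = 64, 16
predictor = nn.Sequential(nn.Linear(d_in, 128), nn.ReLU(), nn.Linear(128, d_out))
feedback = nn.Sequential(nn.Linear(d_out, 128), nn.ReLU(), nn.Linear(128, d_in))
corrector = nn.Sequential(nn.Linear(d_out + d_in, 128), nn.ReLU(),
                          nn.Linear(128, d_out))

def infer_with_adaptation(x, adapt_steps=5, lr=1e-3):
    """Test-time inference: adapt only the corrector on the feedback error."""
    opt = torch.optim.Adam(corrector.parameters(), lr=lr)
    with torch.no_grad():
        y0 = predictor(x)                          # initial prediction
    for _ in range(adapt_steps):
        err = feedback(y0) - x                     # self-referential error
        y = y0 + corrector(torch.cat([y0, err], dim=-1))
        loss = (feedback(y) - x).pow(2).mean()     # no ground truth needed
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        err = feedback(y0) - x
        return y0 + corrector(torch.cat([y0, err], dim=-1))

x_test = torch.randn(8, d_in)                      # stand-in test batch
print(infer_with_adaptation(x_test).shape)         # -> torch.Size([8, 16])
```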

    BDC-Adapter: Brownian Distance Covariance for Better Vision-Language Reasoning

    Large-scale pre-trained Vision-Language Models (VLMs), such as CLIP and ALIGN, have introduced a new paradigm for learning transferable visual representations. Recently, there has been a surge of interest among researchers in developing lightweight fine-tuning techniques to adapt these models to downstream visual tasks. We recognize that current state-of-the-art fine-tuning methods, such as Tip-Adapter, simply consider the covariance between the query image feature and features of support few-shot training samples, which only captures linear relations and potentially instigates a deceptive perception of independence. To address this issue, in this work, we innovatively introduce Brownian Distance Covariance (BDC) to the field of vision-language reasoning. The BDC metric can model all possible relations, providing a robust metric for measuring feature dependence. Based on this, we present a novel method called BDC-Adapter, which integrates BDC prototype similarity reasoning and multi-modal reasoning network prediction to perform classification tasks. Our extensive experimental results show that the proposed BDC-Adapter can freely handle non-linear relations and fully characterize independence, outperforming the current state-of-the-art methods by large margins.
    Comment: Accepted by BMVC 202
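
    For reference, here is a minimal sketch of the classical Brownian distance covariance statistic that BDC-style similarity builds on: form the pairwise Euclidean distance matrix of each sample, double-center it, and average the elementwise product of the two centered matrices. How BDC-Adapter arranges CLIP features into such samples and turns the statistic into prototype similarities follows the paper; the data below is random and purely illustrative.

```python
import numpy as np

# Empirical (squared) Brownian distance covariance between paired samples
# X and Y.  The population version is zero iff X and Y are independent,
# so it also picks up non-linear dependence that covariance misses.
def centered_dist(Z):
    """Pairwise Euclidean distance matrix, double-centered."""
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    return D - D.mean(0, keepdims=True) - D.mean(1, keepdims=True) + D.mean()

def bdc(X, Y):
    """Sample distance covariance squared."""
    A, B = centered_dist(X), centered_dist(Y)
    return (A * B).mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))
Y_dep = X**2 + 0.1 * rng.normal(size=X.shape)    # non-linear dependence,
                                                 # near-zero ordinary covariance
Y_ind = rng.normal(size=(256, 8))                # independent of X

print("dCov^2 (non-linearly dependent):", bdc(X, Y_dep))
print("dCov^2 (independent):           ", bdc(X, Y_ind))
```

    The quadratic relation above has an ordinary covariance with X close to zero, yet its distance covariance stays clearly away from zero, which is the kind of non-linear dependence the abstract says covariance-based adapters miss.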