D$^2$: Decentralized Training over Decentralized Data
When training a machine learning model using multiple workers, each of which
collects data from its own data sources, it is most useful if the
data collected by different workers are {\em unique} and {\em different}.
Ironically, recent analysis of decentralized parallel stochastic gradient
descent (D-PSGD) relies on the assumption that the data hosted on different
workers are {\em not too different}. In this paper, we ask the question: {\em
Can we design a decentralized parallel stochastic gradient descent algorithm
that is less sensitive to the data variance across workers?} We
present D$^2$, a novel decentralized parallel stochastic gradient descent
algorithm designed for large data variance among workers (imprecisely,
"decentralized" data). The core of D$^2$ is a variance reduction extension of
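the standard D-PSGD algorithm, which improves the convergence rate from
$O\left(\frac{\sigma}{\sqrt{nT}} + \frac{(n\zeta^2)^{1/3}}{T^{2/3}}\right)$ to $O\left(\frac{\sigma}{\sqrt{nT}}\right)$, where $\zeta^2$
denotes the variance among data on different workers. As a result, D$^2$ is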
robust to data variance among workers. We empirically evaluate D$^2$ on image
classification tasks where each worker has access to only the data of a limited
set of labels, and find that D$^2$ significantly outperforms D-PSGD.
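To make the data-variance issue concrete, here is a minimal sketch of a standard D-PSGD-style update on a ring of workers. It does not implement the algorithm presented in the abstract above. The scalar quadratic losses, the ring mixing matrix, the exact (noise-free) gradients, and all function names are illustrative assumptions, not details taken from the paper.

```python
# Toy D-PSGD-style update (NOT the paper's new algorithm).
# Assumption: worker i minimizes f_i(x) = 0.5 * (x - c_i)^2, so the local
# targets c_i stand in for "different data on different workers".

def ring_mixing_matrix(n):
    """Doubly stochastic matrix: each worker averages itself and its two ring neighbours."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            w[i][j % n] += 1.0 / 3.0
    return w


def dpsgd_step(models, targets, mixing, lr):
    """One step: average neighbours' models, then subtract the local gradient."""
    n = len(models)
    mixed = [sum(mixing[i][j] * models[j] for j in range(n)) for i in range(n)]
    # The gradient of 0.5 * (x - c_i)^2 is (x - c_i), evaluated at the
    # worker's own model before averaging.
    return [mixed[i] - lr * (models[i] - targets[i]) for i in range(n)]


def run_dpsgd(targets, lr=0.1, steps=500):
    """Run the toy update from a zero initialization and return all worker models."""
    mixing = ring_mixing_matrix(len(targets))
    models = [0.0] * len(targets)
    for _ in range(steps):
        models = dpsgd_step(models, targets, mixing, lr)
    return models


if __name__ == "__main__":
    final = run_dpsgd([0.0, 1.0, 2.0, 3.0])
    print("worker models:", final)
    print("average model:", sum(final) / len(final))
```

In this toy setting the average of the worker models approaches the minimizer of the average loss, but with a constant step size the individual workers settle at different values when their targets differ. That disagreement is a small-scale picture of the sensitivity to data variance that the abstract describes.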
A role for recurrent processing in object completion: neurophysiological, psychophysical and computational evidence
Recognition of objects from partial information presents a significant
challenge for theories of vision because it requires spatial integration and
extrapolation from prior knowledge. We combined neurophysiological recordings
in human cortex with psychophysical measurements and computational modeling to
investigate the mechanisms involved in object completion. We recorded
intracranial field potentials from 1,699 electrodes in 18 epilepsy patients to
measure the timing and selectivity of responses along human visual cortex to
whole and partial objects. Responses along the ventral visual stream remained
selective despite showing only 9-25% of the object. However, these visually
selective signals emerged ~100 ms later for partial versus whole objects. The
processing delays were particularly pronounced in higher visual areas within
the ventral stream, suggesting the involvement of additional recurrent
processing. In separate psychophysics experiments, disrupting this recurrent
computation with a backward mask at ~75 ms significantly impaired recognition of
partial, but not whole, objects. Additionally, computational modeling shows
that the performance of a purely bottom-up architecture is impaired by heavy
occlusion and that this effect can be partially rescued via the incorporation
of top-down connections. These results provide spatiotemporal constraints on
theories of object recognition that involve recurrent processing to recognize
objects from partial information.
FedCut: A Spectral Analysis Framework for Reliable Detection of Byzantine Colluders
This paper proposes a general spectral analysis framework that thwarts a
security risk in federated learning caused by groups of malicious Byzantine
attackers or colluders, who conspire to upload vicious model updates to
severely degrade global model performance. The proposed framework delineates
the strong consistency and temporal coherence between Byzantine colluders'
model updates through a spectral analysis lens and formulates the detection of
Byzantine misbehaviours as a community detection problem in weighted graphs.
A modified normalized graph cut is then used to distinguish attackers from
benign participants. Moreover, spectral heuristics are adopted to make the
detection robust against various attacks. The proposed Byzantine colluder
resilient method, i.e., FedCut, is guaranteed to converge with bounded errors.
Extensive experimental results under a variety of settings support the
superiority of FedCut, which demonstrates robust model performance
(MP) under various attacks. FedCut's average MP is 2.1% to
16.5% better than that of state-of-the-art Byzantine-resilient methods. In
terms of worst-case MP, FedCut is 17.6% to 69.5% better
than these methods.
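The idea of treating colluder detection as a graph-partitioning problem can be sketched with a generic normalized-cut style spectral bipartition. This is not FedCut's modified normalized cut, and it omits the paper's spectral heuristics and temporal information. The Gaussian similarity kernel, the power-iteration solver, the rule of flagging the smaller group (which assumes a benign majority), and all names below are illustrative assumptions.

```python
import math
import random

# Generic spectral bipartition of model updates (NOT the FedCut algorithm).
# Assumption: updates are plain lists of floats and similarity is a Gaussian kernel.

def spectral_bipartition(updates, sigma=1.0, iters=200, seed=0):
    """Split update indices into two groups by the sign of the second eigenvector."""
    n = len(updates)
    # Weighted similarity graph between every pair of updates.
    w = [[math.exp(-sum((a - b) ** 2 for a, b in zip(updates[i], updates[j]))
                   / (2.0 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in w]
    # Symmetrically normalized adjacency S = D^{-1/2} W D^{-1/2}.
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # The leading eigenvector of S is proportional to sqrt(degree); deflate it.
    top = [math.sqrt(d) for d in deg]
    top_norm = math.sqrt(sum(t * t for t in top))
    top = [t / top_norm for t in top]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        # Remove the leading component, then apply one power-iteration step.
        proj = sum(a * b for a, b in zip(v, top))
        v = [a - proj * b for a, b in zip(v, top)]
        v = [sum(s[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
    group_a = [i for i in range(n) if v[i] >= 0.0]
    group_b = [i for i in range(n) if v[i] < 0.0]
    return group_a, group_b


def flag_smaller_group(updates):
    """Hypothetical rule: treat the smaller spectral cluster as suspected colluders."""
    group_a, group_b = spectral_bipartition(updates)
    return sorted(min(group_a, group_b, key=len))


if __name__ == "__main__":
    benign = [[1.0 + 0.05 * k, 1.0 - 0.03 * k] for k in range(6)]
    colluders = [[-1.0, -1.0], [-1.01, -0.99], [-0.99, -1.01]]
    print(flag_smaller_group(benign + colluders))
```

The sketch relies on colluding updates being much more similar to each other than to benign updates, so that they form a distinct community in the similarity graph.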
Distributed Learning over Unreliable Networks
Most of today's distributed machine learning systems assume {\em reliable
networks}: whenever two machines exchange information (e.g., gradients or
models), the network should guarantee the delivery of the message. At the same
time, recent work exhibits the impressive tolerance of machine learning
algorithms to errors or noise arising from relaxed communication or
synchronization. In this paper, we connect these two trends, and consider the
following question: {\em Can we design machine learning systems that are
tolerant to network unreliability during training?} With this motivation, we
focus on a theoretical problem of independent interest---given a standard
distributed parameter server architecture, if every communication between the
worker and the server has a non-zero probability of being dropped, does
there exist an algorithm that still converges, and at what speed? The technical
contribution of this paper is a novel theoretical analysis proving that
distributed learning over unreliable networks can achieve a convergence
rate comparable to that of centralized or distributed learning over reliable
networks. Further, we prove that the influence of the packet drop rate
diminishes with the growth of the number of parameter servers. We map this theoretical
result onto a real-world scenario, training deep neural networks over an
unreliable network layer, and conduct network simulation to validate the system
improvement gained by allowing the networks to be unreliable.
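The question posed in this abstract can be illustrated with a toy simulation in which each worker-to-server message is dropped independently with a fixed probability. This is not the paper's algorithm or its analysis. The single parameter server, the scalar quadratic objective, the Gaussian gradient noise, the rule of averaging only the gradients that arrive, and all names are illustrative assumptions.

```python
import random

# Toy parameter-server training with dropped messages (NOT the paper's algorithm).
# Assumption: every worker sees the loss 0.5 * (x - target)^2 plus gradient noise.

def train_with_drops(num_workers=8, drop_prob=0.3, lr=0.1, steps=300, seed=0):
    """Run SGD where each gradient message is lost with probability drop_prob."""
    rng = random.Random(seed)
    target = 3.0
    x = 0.0
    for _ in range(steps):
        received = []
        for _ in range(num_workers):
            # Noisy local gradient of the shared quadratic loss.
            grad = (x - target) + rng.gauss(0.0, 0.1)
            # The message reaches the server only if it is not dropped.
            if rng.random() >= drop_prob:
                received.append(grad)
        # Average whatever arrived; skip the step if every message was lost.
        if received:
            x -= lr * sum(received) / len(received)
    return x


if __name__ == "__main__":
    print("reliable network: ", train_with_drops(drop_prob=0.0))
    print("30% of messages dropped:", train_with_drops(drop_prob=0.3))
```

In this simplified setting the iterate still approaches the minimizer when a fraction of the messages is lost, because the surviving gradients remain unbiased estimates. The sketch says nothing about convergence speed, which is what the paper's analysis addresses.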
SpaceNet MVOI: a Multi-View Overhead Imagery Dataset
Detection and segmentation of objects in overhead imagery is a challenging
task. The variable density, random orientation, small size, and
instance-to-instance heterogeneity of objects in overhead imagery call for
approaches distinct from existing models designed for natural scene datasets.
Though new overhead imagery datasets are being developed, they almost
universally comprise a single view taken from directly overhead ("at nadir"),
failing to address a critical variable: look angle. By contrast, views vary in
real-world overhead imagery, particularly in dynamic scenarios such as natural
disasters where first looks are often over 40 degrees off-nadir. This
represents an important challenge to computer vision methods, as changing view
angle adds distortions, alters resolution, and changes lighting. At present,
the impact of these perturbations for algorithmic detection and segmentation of
objects is untested. To address this problem, we present an open source
Multi-View Overhead Imagery dataset, termed SpaceNet MVOI, with 27 unique looks
from a broad range of viewing angles (-32.5 degrees to 54.0 degrees). Each of
these images covers the same 665 square km geographic extent and is annotated
with 126,747 building footprint labels, enabling direct assessment of the
impact of viewpoint perturbation on model performance. We benchmark multiple
leading segmentation and object detection models on: (1) building detection,
(2) generalization to unseen viewing angles and resolutions, and (3)
sensitivity of building footprint extraction to changes in resolution. We find
that state-of-the-art segmentation and object detection models struggle to
identify buildings in off-nadir imagery and generalize poorly to unseen views,
presenting an important benchmark to explore the broadly relevant challenge of
detecting small, heterogeneous target objects in visually dynamic contexts.
Comment: Accepted into IEEE International Conference on Computer Vision (ICCV)
201