Practical Insights into Knowledge Distillation for Pre-Trained Models
This research investigates the enhancement of knowledge distillation (KD)
processes in pre-trained models, an emerging field in knowledge transfer with
significant implications for distributed training and federated learning
environments. These environments benefit from reduced communication demands and
accommodate various model architectures. Despite the adoption of numerous KD
approaches for transferring knowledge among pre-trained models, a comprehensive
understanding of KD's application in these scenarios is lacking. Our study
conducts an extensive comparison of multiple KD techniques, including standard
KD, tuned KD (via optimized temperature and weight parameters), deep mutual
learning, and data partitioning KD. We assess these methods across various data
distribution strategies to identify the most effective contexts for each.
Through detailed examination of hyperparameter tuning, informed by extensive
grid search evaluations, we pinpoint when adjustments are crucial to enhance
model performance. This paper sheds light on optimal hyperparameter settings
for distinct data partitioning scenarios and investigates KD's role in
improving federated learning by minimizing communication rounds and expediting
the training process. By filling a notable void in current research, our
findings serve as a practical guide for leveraging KD in pre-trained models
within collaborative and federated learning settings.
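As a rough illustration of the tuned-KD variant discussed above, the sketch below shows a standard temperature-scaled distillation loss in PyTorch. The temperature T and mixing weight alpha are exactly the kind of hyperparameters such a grid search would tune; the function and parameter names are illustrative rather than taken from the paper, and the deep mutual learning and data-partitioning variants are not shown.

```python
# Minimal sketch of temperature-scaled knowledge distillation (illustrative only;
# T and alpha are the hyperparameters a grid search like the one above would tune).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend the soft-target KL term (scaled by T^2) with the hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```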
Prelude: Ensuring Inter-Domain Loop-Freedom in SDN-Enabled Networks
Software-Defined-eXchanges (SDXes) promise to tackle the timely quest of
improving the inter-domain routing ecosystem through SDN deployment.
Yet, the naive deployment of SDN on the Internet raises concerns about the
correctness of the inter-domain data-plane. By allowing operators to deflect
traffic from the default BGP route, SDN policies are susceptible to creating
permanent forwarding loops that are invisible to the control-plane.
In this paper, we propose a system, called Prelude, for detecting SDN-induced
forwarding loops between SDXes with high accuracy without leaking the private
routing information of network operators. To achieve this, we leverage Secure
Multi-Party Computation (SMPC) techniques to build a novel and general
privacy-preserving primitive that detects whether any subset of SDN rules might
affect the same portion of traffic without learning anything about those rules.
We then leverage that primitive as the main building block of a distributed
system tailored to detect forwarding loops among any set of SDXes. We leverage
the particular nature of SDXes to further improve the efficiency of our SMPC
solution.
The number of valid SDN rules, i.e., rules that do not create loops, rejected by
our solution is 100x lower than with previous privacy-preserving solutions, while
our solution also provides better privacy guarantees. Furthermore, it naturally
gives network operators some insight into the cost of the deflected paths.
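The privacy-preserving primitive itself is evaluated under SMPC, but the predicate it computes can be illustrated in the clear: two SDN rules can affect the same portion of traffic only if their destination prefixes overlap and their remaining match fields are compatible. The sketch below is a plaintext rendering of that predicate with a hypothetical rule encoding, not the paper's actual SMPC construction.

```python
# Plaintext illustration of the overlap predicate that Prelude evaluates under SMPC
# (hypothetical rule encoding; the real system never reveals the rules themselves).
from ipaddress import ip_network

def prefixes_overlap(p1: str, p2: str) -> bool:
    a, b = ip_network(p1), ip_network(p2)
    return a.subnet_of(b) or b.subnet_of(a)

def rules_may_conflict(rule1: dict, rule2: dict) -> bool:
    """True if the two rules could match the same traffic."""
    if not prefixes_overlap(rule1["dst_prefix"], rule2["dst_prefix"]):
        return False
    # Wildcarded fields (None) match anything; otherwise the values must agree.
    for field in ("in_port", "eth_type", "tcp_dst"):
        v1, v2 = rule1.get(field), rule2.get(field)
        if v1 is not None and v2 is not None and v1 != v2:
            return False
    return True
```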
This paper demonstrates a new class of bugs that is likely to occur in enterprise OpenFlow deployments. In particular, step-by-step, reactive establishment of paths can cause network-wide inconsistencies or performance- and space-related inefficiencies. The cause for this behavior is inconsistent packet processing: as the packets travel through the network they do not encounter consistent state at the OpenFlow controller. To mitigate this problem, we propose to use transactional semantics at the controller to achieve consistent packet processing. We detail the challenges in achieving this goal (including the inability to directly apply database techniques), as well as a potentially promising approach. In particular, we envision the use of multi-commit transactions that could provide the necessary serialization and isolation properties without excessively reducing network performance.
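A minimal sketch of what transaction-like packet processing at the controller could look like, shown here with simple optimistic concurrency rather than the multi-commit scheme the authors envision; the class and helper names are assumptions, not an actual controller API.

```python
# Sketch of transaction-like packet processing at an OpenFlow controller (assumed
# helper names; optimistic concurrency instead of the envisioned multi-commit scheme).
class PacketTransaction:
    def __init__(self, controller_state):
        self.state = controller_state
        self.start_version = controller_state.version  # state version seen at start
        self.pending_flow_mods = []

    def read(self, key):
        return self.state.get(key)

    def install(self, flow_mod):
        self.pending_flow_mods.append(flow_mod)        # buffered, not yet sent to switches

    def commit(self) -> bool:
        if self.state.version != self.start_version:
            return False                               # conflicting update: caller retries
        for fm in self.pending_flow_mods:
            self.state.apply(fm)                       # push flow-mods to the data plane
        self.state.version += 1
        return True
```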
Consistent SDNs through Network State Fuzzing
The conventional wisdom is that a software-defined network (SDN) operates
under the premise that the logically centralized control plane has an accurate
representation of the actual data plane state. Unfortunately, bugs,
misconfigurations, faults or attacks can introduce inconsistencies that
undermine correct operation. Previous work in this area, however, lacks a
holistic methodology to tackle this problem and thus, addresses only certain
parts of the problem. Yet, the consistency of the overall system is only as
good as its least consistent part. Motivated by an analogy of network
consistency checking with program testing, we propose to add active probe-based
network state fuzzing to our consistency check repertoire. Hereby, our system,
PAZZ, combines production traffic with active probes to periodically test if
the actual forwarding path and decision elements (on the data plane) correspond
to the expected ones (on the control plane). Our insight is that active traffic
covers the inconsistency cases beyond the ones identified by passive traffic.
A PAZZ prototype was built and evaluated on topologies of varying scale and
complexity. Our results show that PAZZ requires minimal network resources to
detect persistent data plane faults through fuzzing and localize them quickly
while outperforming baseline approaches.
Comment: Added three extra relevant references; the arXiv version was later accepted
in IEEE Transactions on Network and Service Management (TNSM), 2019, under the title
"Towards Consistent SDNs: A Case for Network State Fuzzing".
Natural Compression for Distributed Deep Learning
Modern deep learning models are often trained in parallel over a collection
of distributed machines to reduce training time. In such settings,
communication of model updates among machines becomes a significant performance
bottleneck and various lossy update compression techniques have been proposed
to alleviate this problem. In this work, we introduce a new, simple yet
theoretically and practically effective compression technique: {\em natural
compression (NC)}. Our technique is applied individually to all entries of the
to-be-compressed update vector and works by randomized rounding to the nearest
(negative or positive) power of two, which can be computed in a "natural" way
by ignoring the mantissa. We show that compared to no compression, NC increases
the second moment of the compressed vector by not more than the tiny factor
9/8, which means that the effect of NC on the convergence speed
of popular training algorithms, such as distributed SGD, is negligible.
However, the communication savings enabled by NC are substantial, leading to
{\em an improvement in overall theoretical running time}. For
applications requiring more aggressive compression, we generalize NC to {\em
natural dithering}, which we prove is {\em exponentially better} than the
common random dithering technique. Our compression operators can be used on
their own or in combination with existing operators for a more aggressive
combined effect, and offer new state-of-the-art performance both in theory and practice.
Comment: 8 pages, 20 pages of appendix, 6 tables, 14 figures
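The operator itself is easy to state: each coordinate is rounded, at random, to one of the two nearest powers of two, with probabilities chosen so that the rounding is unbiased. The NumPy sketch below is our own rendering of that idea (via explicit logarithms rather than the mantissa trick), not the authors' reference code.

```python
# Sketch of natural compression: unbiased randomized rounding of every entry to a
# nearest power of two (our own rendering; the paper's trick drops the mantissa instead).
import numpy as np

def natural_compression(x, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    x = np.asarray(x, dtype=np.float64)
    out = np.zeros_like(x)
    nz = x != 0
    a = np.abs(x[nz])
    low = 2.0 ** np.floor(np.log2(a))   # nearest power of two at or below |x|
    high = 2.0 * low                    # nearest power of two above |x|
    # Round up with probability (|x| - low) / low so that E[C(x)] = x (unbiasedness).
    p_up = (a - low) / low
    rounded = np.where(rng.random(a.shape) < p_up, high, low)
    out[nz] = np.sign(x[nz]) * rounded
    return out
```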
Global-QSGD: Practical Floatless Quantization for Distributed Learning with Theoretical Guarantees
Efficient distributed training is a principal driver of recent advances in
deep learning. However, communication often proves costly and becomes the
primary bottleneck in these systems. As a result, there is a demand for the
design of efficient communication mechanisms that can empirically boost
throughput while providing theoretical guarantees. In this work, we introduce
Global-QSGD, a novel family of quantization operators, engineered to accelerate
distributed training based on global scaling. We demonstrate that Global-QSGD
is the first theoretically rigorous Allreduce-compatible compression mechanism
that achieves a provable speed-up by striking a balance between compression
error and communication savings. Importantly, Global-QSGD does not rely on
costly error feedback due to its inherent unbiasedness and offers up to an
$O(\sqrt{n})$ additional compression ratio compared to the popular QSGD
quantization ($n$ represents the number of workers). To obtain theoretical
guarantees, we generalize the notion of standard unbiased compression operators
to incorporate Global-QSGD. We show that this wider class permits standard
analysis for unbiased compressors and thus ensures convergence for popular
optimization algorithms (e.g., distributed SGD) under typical settings. For the
empirical component of our work, we carry out a performance modeling analysis
to determine if Global-QSGD can enhance training throughput under specific
hardware configurations. We also conduct extensive empirical evaluations on
various tasks, testing our theory on both NVLink and PCIe connections as well
as a large-scale cloud system.
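As a rough illustration of quantization with global scaling (a simplified sketch under our own assumptions, not the paper's exact operator): every worker normalizes its update by one norm aggregated across all workers, stochastically quantizes the result to a small set of levels, and, because the scale is shared, the quantized values from different workers can be combined directly.

```python
# Simplified sketch of quantization with a globally shared scale (not the exact
# Global-QSGD operator). Assumes global_norm upper-bounds every entry of x, e.g. a
# max-norm aggregated across workers; stochastic rounding keeps the operator unbiased.
import numpy as np

def global_scale_quantize(x, global_norm, levels=16, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    x = np.asarray(x, dtype=np.float64)
    if global_norm == 0:
        return np.zeros_like(x)
    y = np.abs(x) / global_norm * levels            # scale entries into [0, levels]
    low = np.floor(y)
    q = low + (rng.random(x.shape) < (y - low))     # round up with prob. y - low: E[q] = y
    return np.sign(x) * q * global_norm / levels    # dequantize with the same global scale
```

Because the scale factor is identical on every worker, the quantized values can be summed inside Allreduce without decompressing first, which is loosely what Allreduce compatibility asks for.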