1,229 research outputs found

    Practical Insights into Knowledge Distillation for Pre-Trained Models

    Full text link
    This research investigates the enhancement of knowledge distillation (KD) processes in pre-trained models, an emerging field in knowledge transfer with significant implications for distributed training and federated learning environments. These environments benefit from reduced communication demands and accommodate various model architectures. Despite the adoption of numerous KD approaches for transferring knowledge among pre-trained models, a comprehensive understanding of KD's application in these scenarios is lacking. Our study conducts an extensive comparison of multiple KD techniques, including standard KD, tuned KD (via optimized temperature and weight parameters), deep mutual learning, and data partitioning KD. We assess these methods across various data distribution strategies to identify the most effective contexts for each. Through detailed examination of hyperparameter tuning, informed by extensive grid search evaluations, we pinpoint when adjustments are crucial to enhance model performance. This paper sheds light on optimal hyperparameter settings for distinct data partitioning scenarios and investigates KD's role in improving federated learning by minimizing communication rounds and expediting the training process. By filling a notable void in current research, our findings serve as a practical framework for leveraging KD in pre-trained models within collaborative and federated learning frameworks

    Prelude: Ensuring Inter-Domain Loop-Freedom in~SDN-Enabled Networks

    Full text link
    Software-Defined-eXchanges (SDXes) promise to tackle the timely quest of bringing improving the inter-domain routing ecosystem through SDN deployment. Yet, the naive deployment of SDN on the Internet raises concerns about the correctness of the inter-domain data-plane. By allowing operators to deflect traffic from the default BGP route, SDN policies are susceptible of creating permanent forwarding loops invisible to the control-plane. In this paper, we propose a system, called Prelude, for detecting SDN-induced forwarding loops between SDXes with high accuracy without leaking the private routing information of network operators. To achieve this, we leverage Secure Multi-Party Computation (SMPC) techniques to build a novel and general privacy-preserving primitive that detects whether any subset of SDN rules might affect the same portion of traffic without learning anything about those rules. We then leverage that primitive as the main building block of a distributed system tailored to detect forwarding loops among any set of SDXes. We leverage the particular nature of SDXes to further improve the efficiency of our SMPC solution. The number of valid SDN rules, i.e., not creating loops, rejected by our solution is 100x lower than previous privacy-preserving solutions, and also provides better privacy guarantees. Furthermore, our solution naturally provides network operators with some hindsight on the cost of the deflected paths

    The United Nations and the Human Rights Issue

    Get PDF
    This paper demonstrates a new class of bugs that is likely to occur in enterprise OpenFlow deployments. In particular, step-by-step, reactive establishment of paths can cause network-wide inconsistencies or performance- and space-related inefficiencies. The cause for this behavior is inconsistent packet processing: as the packets travel through the network they do not encounter consistent state at the OpenFlow controller. To mitigate this problem, we propose to use transactional semantics at the controller to achieve consistent packet processing. We detail the challenges in achieving this goal (including the inability to directly apply database techniques), as well as a potentially promising approach. In particular, we envision the use of multi-commit transactions that could provide the necessary serialization and isolation properties without excessively reducing network performance.QC 20140707</p

    Consistent SDNs through Network State Fuzzing

    Full text link
    The conventional wisdom is that a software-defined network (SDN) operates under the premise that the logically centralized control plane has an accurate representation of the actual data plane state. Unfortunately, bugs, misconfigurations, faults or attacks can introduce inconsistencies that undermine correct operation. Previous work in this area, however, lacks a holistic methodology to tackle this problem and thus, addresses only certain parts of the problem. Yet, the consistency of the overall system is only as good as its least consistent part. Motivated by an analogy of network consistency checking with program testing, we propose to add active probe-based network state fuzzing to our consistency check repertoire. Hereby, our system, PAZZ, combines production traffic with active probes to periodically test if the actual forwarding path and decision elements (on the data plane) correspond to the expected ones (on the control plane). Our insight is that active traffic covers the inconsistency cases beyond the ones identified by passive traffic. PAZZ prototype was built and evaluated on topologies of varying scale and complexity. Our results show that PAZZ requires minimal network resources to detect persistent data plane faults through fuzzing and localize them quickly while outperforming baseline approaches.Comment: Added three extra relevant references, the arXiv later was accepted in IEEE Transactions of Network and Service Management (TNSM), 2019 with the title "Towards Consistent SDNs: A Case for Network State Fuzzing

    Natural Compression for Distributed Deep Learning

    Full text link
    Modern deep learning models are often trained in parallel over a collection of distributed machines to reduce training time. In such settings, communication of model updates among machines becomes a significant performance bottleneck and various lossy update compression techniques have been proposed to alleviate this problem. In this work, we introduce a new, simple yet theoretically and practically effective compression technique: {\em natural compression (NC)}. Our technique is applied individually to all entries of the to-be-compressed update vector and works by randomized rounding to the nearest (negative or positive) power of two, which can be computed in a "natural" way by ignoring the mantissa. We show that compared to no compression, NC increases the second moment of the compressed vector by not more than the tiny factor \nicefrac{9}{8}, which means that the effect of NC on the convergence speed of popular training algorithms, such as distributed SGD, is negligible. However, the communications savings enabled by NC are substantial, leading to {\em 33-4Ă—4\times improvement in overall theoretical running time}. For applications requiring more aggressive compression, we generalize NC to {\em natural dithering}, which we prove is {\em exponentially better} than the common random dithering technique. Our compression operators can be used on their own or in combination with existing operators for a more aggressive combined effect, and offer new state-of-the-art both in theory and practice.Comment: 8 pages, 20 pages of Appendix, 6 Tables, 14 Figure

    Global-QSGD: Practical Floatless Quantization for Distributed Learning with Theoretical Guarantees

    Full text link
    Efficient distributed training is a principal driver of recent advances in deep learning. However, communication often proves costly and becomes the primary bottleneck in these systems. As a result, there is a demand for the design of efficient communication mechanisms that can empirically boost throughput while providing theoretical guarantees. In this work, we introduce Global-QSGD, a novel family of quantization operators, engineered to accelerate distributed training based on global scaling. We demonstrate that Global-QSGD is the first theoretically rigorous Allreduce-compatible compression mechanism that achieves a provable speed-up by striking a balance between compression error and communication savings. Importantly, Global-QSGD does not rely on costly error feedback due to its inherent unbiasedness and offers up to O(n)O(\sqrt{n}) additional compression ratio compared to the popular QSGD quantization (nn represents the number of workers). To obtain theoretical guarantees, we generalize the notion of standard unbiased compression operators to incorporate Global-QSGD. We show that this wider class permits standard analysis for unbiased compressors and thus ensures convergence for popular optimization algorithms (e.g., distributed SGD) under typical settings. For the empirical component of our work, we carry out a performance modeling analysis to determine if Global-QSGD can enhance training throughput under specific hardware configurations. We also conduct extensive empirical evaluations on various tasks, testing our theory on both NVLink and PCIe connections as well as a large-scale cloud system
    • …
    corecore