
    Online Knowledge Distillation with Diverse Peers

    Distillation is an effective knowledge-transfer technique that uses the predicted distributions of a powerful teacher model as soft targets to train a less-parameterized student model. A pre-trained high-capacity teacher, however, is not always available. Recently proposed online variants use the aggregated intermediate predictions of multiple student models as targets to train each student model. Although group-derived targets offer a good recipe for teacher-free distillation, group members are quickly homogenized by simple aggregation functions, leading to early saturated solutions. In this work, we propose Online Knowledge Distillation with Diverse peers (OKDDip), which performs two-level distillation during training with multiple auxiliary peers and one group leader. In the first-level distillation, each auxiliary peer holds an individual set of aggregation weights, generated with an attention-based mechanism, to derive its own targets from the predictions of other auxiliary peers. Learning from distinct target distributions helps boost peer diversity and thus the effectiveness of group-based distillation. The second-level distillation transfers the knowledge in the ensemble of auxiliary peers further to the group leader, i.e., the model used for inference. Experimental results show that the proposed framework consistently outperforms state-of-the-art approaches without sacrificing training or inference complexity, demonstrating the effectiveness of the two-level distillation framework.
    Comment: Accepted to AAAI-202
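The two-level scheme described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the query/key vectors stand in for the learned attention projections, the temperature of 3.0 is arbitrary, each peer attends only to the other peers (following the abstract's wording), and the leader's target is assumed to be a simple average of the peers' softened predictions. The names `peer_targets` and `leader_target` are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(values: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    m = max(values)
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def kl_div(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def peer_targets(peer_logits, queries, keys, temperature=3.0):
    """First level (sketch): peer i's target is an attention-weighted mix of
    the softened predictions of the *other* peers. The fixed query/key vectors
    are an assumption standing in for learned projections."""
    n = len(peer_logits)
    probs = [softmax(z, temperature) for z in peer_logits]
    num_classes = len(probs[0])
    targets = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Dot-product attention scores of peer i over every other peer j.
        scores = [sum(a * b for a, b in zip(queries[i], keys[j])) for j in others]
        weights = softmax(scores)  # peer-specific aggregation weights
        targets.append([sum(w * probs[j][c] for w, j in zip(weights, others))
                        for c in range(num_classes)])
    return targets


def leader_target(peer_logits, temperature=3.0):
    """Second level (sketch): the leader distils from the plain average of the
    auxiliary peers' softened predictions (averaging is an assumption)."""
    probs = [softmax(z, temperature) for z in peer_logits]
    n = len(probs)
    return [sum(p[c] for p in probs) / n for c in range(len(probs[0]))]


# Illustrative usage: three peers, three classes (made-up numbers).
logits = [[2.0, 0.5, 0.1], [0.3, 1.8, 0.2], [0.1, 0.4, 2.2]]
q = k = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
targets = peer_targets(logits, q, k)
# Each peer would minimise KL(target || own softened prediction).
peer_losses = [kl_div(targets[i], softmax(logits[i], 3.0)) for i in range(3)]
```

Because each peer gets its own attention weights, two peers generally receive different targets, which is the mechanism the abstract credits for preserving diversity; in a real training loop the targets would be treated as constants (no gradient through them).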

    Journal Club Revisited: Teaching Evidence-Based Research and Practice to Graduate Students in a Professional Degree Program

    A Journal Club can be a learning exercise that allows for the critique and subsequent analytic discussion of empirical studies, and encourages public health, health administration, or health policy students to better understand how evidence-based research contributes to evidence-based practice. The purpose of this paper is to describe a learning exercise that uses the Journal Club to evaluate the strengths and limitations of relevant research studies and their potential influence on evidence-based practice. This learning exercise was developed to increase discipline-specific knowledge and improve analytical thinking, so that students can form and communicate a well-researched and reasoned critique of current peer-reviewed research. Specifically, the exercise was designed to: (1) identify the peer-review process and its influence on evidence-based practice; (2) curate primary resources for selected health issues; (3) evaluate a published, peer-reviewed research article for its rigor and limitations with respect to reported methods, findings, and applicability to professional practice; and (4) facilitate a concise, professional discussion about discipline-specific research. At the conclusion of the exercise, graduate students, who are also working professionals, reflected on the utility of examining how evidence-based research affects evidence-based practice. The benefits of this applied learning approach for students and the faculty instructor are discussed.

    Robust Domain Randomised Reinforcement Learning through Peer-to-Peer Distillation

    In reinforcement learning, domain randomisation is an increasingly popular technique for learning more general policies that are robust to domain shifts at deployment. However, naively aggregating information from randomised domains may lead to high variance in gradient estimation and an unstable learning process. To address this issue, we present a peer-to-peer online distillation strategy for RL, termed P2PDRL, in which multiple workers are each assigned to a different environment and exchange knowledge through mutual regularisation based on Kullback-Leibler divergence. Our experiments on continuous control tasks show that P2PDRL enables robust learning across a wider randomisation distribution than baselines, and more robust generalisation to new environments at test time.
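The mutual KL regularisation can be illustrated with diagonal Gaussian policies, a common choice for continuous control. This is a hedged sketch, not the paper's code: the KL direction, the averaging over the other workers, the weight `alpha`, and the function names `gaussian_kl`, `mutual_regulariser` and `worker_loss` are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def gaussian_kl(mu_p: Sequence[float], std_p: Sequence[float],
                mu_q: Sequence[float], std_q: Sequence[float]) -> float:
    """Closed-form KL(p || q) for diagonal Gaussians, summed over action dims."""
    total = 0.0
    for mp, sp, mq, sq in zip(mu_p, std_p, mu_q, std_q):
        total += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2 * sq ** 2) - 0.5
    return total


def mutual_regulariser(i: int, mus: List[List[float]], stds: List[List[float]]) -> float:
    """Average KL from worker i's policy to every other worker's policy,
    evaluated at the same state (assumed direction: KL(pi_i || pi_j))."""
    others = [j for j in range(len(mus)) if j != i]
    return sum(gaussian_kl(mus[i], stds[i], mus[j], stds[j]) for j in others) / len(others)


def worker_loss(i: int, rl_loss: float, mus, stds, alpha: float = 0.5) -> float:
    """Worker i's objective: its own RL loss plus the weighted peer term.
    The other workers' outputs would be treated as constants (no gradient)."""
    return rl_loss + alpha * mutual_regulariser(i, mus, stds)


# Illustrative usage: three workers, 2-D actions (made-up policy outputs).
mus = [[0.0, 0.1], [0.2, -0.1], [0.1, 0.0]]
stds = [[1.0, 0.8], [0.9, 1.0], [1.1, 0.9]]
loss_0 = worker_loss(0, rl_loss=1.3, mus=mus, stds=stds)
```

Pulling each worker towards its peers in this way damps the per-environment variance that naive aggregation would otherwise feed into the gradient, which is the motivation the abstract gives.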

    Peer Collaborative Learning for Online Knowledge Distillation

    Traditional knowledge distillation uses a two-stage training strategy to transfer knowledge from a high-capacity teacher model to a compact student model, which relies heavily on the pre-trained teacher. Recent online knowledge distillation alleviates this limitation through collaborative learning, mutual learning and online ensembling, following a one-stage end-to-end training fashion. However, collaborative learning and mutual learning fail to construct an online high-capacity teacher, whilst online ensembling ignores the collaboration among branches, and its logit summation impedes further optimisation of the ensemble teacher. In this work, we propose a novel Peer Collaborative Learning method for online knowledge distillation, which integrates online ensembling and network collaboration into a unified framework. Specifically, given a target network, we construct a multi-branch network for training, in which each branch is called a peer. We perform random augmentation multiple times on the inputs to peers and assemble the feature representations output by the peers with an additional classifier as the peer ensemble teacher. This helps to transfer knowledge from a high-capacity teacher to peers, and in turn further optimises the ensemble teacher. Meanwhile, we employ the temporal mean model of each peer as the peer mean teacher to collaboratively transfer knowledge among peers, which helps each peer learn richer knowledge and facilitates the optimisation of a more stable model with better generalisation. Extensive experiments on CIFAR-10, CIFAR-100 and ImageNet show that the proposed method significantly improves the generalisation of various backbone networks and outperforms state-of-the-art methods.
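The two teachers in Peer Collaborative Learning can be sketched as follows. Assumptions: the "temporal mean model" is taken to be an exponential moving average of a peer's weights (as in mean-teacher methods), parameters are a flat dictionary of floats, the ensemble classifier is a single linear layer over the concatenated peer features, and the decay of 0.999 is a typical setting rather than a value from the paper. The function names are hypothetical.

```python
from typing import Dict, List


def ema_update(mean_params: Dict[str, float], params: Dict[str, float],
               decay: float = 0.999) -> Dict[str, float]:
    """Peer mean teacher (sketch): move each averaged weight a small step
    towards the peer's current weight."""
    return {name: decay * mean_params[name] + (1.0 - decay) * params[name]
            for name in params}


def ensemble_teacher_logits(peer_features: List[List[float]],
                            weight: List[List[float]],
                            bias: List[float]) -> List[float]:
    """Peer ensemble teacher (sketch): concatenate the peers' feature vectors
    and apply one extra linear classifier, instead of summing peer logits."""
    concat = [x for feats in peer_features for x in feats]
    return [sum(w * x for w, x in zip(row, concat)) + b
            for row, b in zip(weight, bias)]


# Illustrative usage with made-up numbers.
mean_w = ema_update({"w": 0.0}, {"w": 1.0}, decay=0.9)
teacher = ensemble_teacher_logits([[1.0, 2.0], [3.0]],
                                  weight=[[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]],
                                  bias=[0.0, 1.0])
```

Classifying concatenated features, rather than summing logits, gives the ensemble teacher its own trainable parameters, which matches the abstract's point that logit summation impedes further optimisation of the ensemble.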

    Teacher-Student Architecture for Knowledge Distillation: A Survey

    Although deep neural networks (DNNs) have shown a strong capacity to solve large-scale problems in many areas, such DNNs are hard to deploy in real-world systems because of their voluminous parameters. To tackle this issue, Teacher-Student architectures were proposed, in which simple student networks with few parameters can achieve performance comparable to deep teacher networks with many parameters. Recently, Teacher-Student architectures have been effectively and widely adopted for various knowledge distillation (KD) objectives, including knowledge compression, knowledge expansion, knowledge adaptation, and knowledge enhancement. With the help of Teacher-Student architectures, current studies are able to achieve multiple distillation objectives through lightweight and generalized student networks. Unlike existing KD surveys that primarily focus on knowledge compression, this survey is the first to explore Teacher-Student architectures across multiple distillation objectives. It introduces various knowledge representations and their corresponding optimization objectives. Additionally, we provide a systematic overview of Teacher-Student architectures with representative learning algorithms and effective distillation schemes. This survey also summarizes recent applications of Teacher-Student architectures across multiple purposes, including classification, recognition, generation, ranking, and regression. Lastly, potential research directions in KD are investigated, focusing on architecture design, knowledge quality, and theoretical studies of regression-based learning. Through this comprehensive survey, industry practitioners and the academic community can gain valuable insights and guidelines for effectively designing, learning, and applying Teacher-Student architectures for various distillation objectives.
    Comment: 20 pages. arXiv admin note: substantial text overlap with arXiv:2210.1733
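Many Teacher-Student schemes build on the soft-target objective: the student matches the teacher's temperature-softened distribution while also fitting the ground-truth label. The sketch below shows one widely used form of that loss; the temperature `T=4.0`, the weight `alpha=0.9` and the name `kd_loss` are illustrative choices, not values taken from the survey.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], T: float = 1.0) -> List[float]:
    """Numerically stable softmax at temperature T."""
    m = max(logits)
    exps = [math.exp((z - m) / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            label: int, T: float = 4.0, alpha: float = 0.9) -> float:
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(label, student)."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    # Soft term: KL divergence between the softened teacher and student.
    soft = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    # Hard term: ordinary cross-entropy at temperature 1.
    hard = -math.log(softmax(student_logits)[label])
    # T^2 keeps the soft term's gradient scale comparable across temperatures.
    return alpha * T * T * soft + (1.0 - alpha) * hard


# Illustrative usage: a student that disagrees with the teacher pays extra.
agree = kd_loss([2.0, 0.0], [2.0, 0.0], label=0)
disagree = kd_loss([2.0, 0.0], [0.0, 2.0], label=0)
```

With identical student and teacher logits the soft term vanishes and only the hard-label term remains.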

    Urbanheart surgery - a logic of design alternatives

    In 1972 Sir Leslie Martin, in his essay “The Grid as Generator”, advocated “a strong theoretical basis for [planning and] urban design” (Carolin P, 2000, p4). By methodically shifting design parameters regarding the way “in which buildings [could be] placed on the land”, Martin was able to demonstrate how the generation of alternatives could “allow wider scope for decisions and objectives” to be considered and discussed (Carmona M, & Tiesdell S 2007, p81). Operating within a conventional design studio yet drawing on Sir Leslie Martin’s logic, i.e., developing an informed understanding of a problem by identifying a finite world of design ‘alternatives’, this paper outlines a studio-based program at the School of Architecture and Building, Deakin University, referred to as the ‘UrbanHeart Surgery’. While most atelier-based courses operate largely on an ad hoc basis, with students often working in self-imposed competitive isolation, UrbanHeart adopts a more open yet structured approach in which students work in design collaboratives to generate a matrix of alternative design scenarios. The program actively integrates postgraduate students from Architecture, Urban Design and Planning into a design research culture and allows them to engage in critical discourse by working on strategic design projects in three areas significant to the future development of the state of Victoria: Metropolitan Urbanism, Urbanism on the Periphery and Regional Urbanism.