Online Knowledge Distillation with Diverse Peers
Distillation is an effective knowledge-transfer technique that uses predicted
distributions of a powerful teacher model as soft targets to train a
less-parameterized student model. A pre-trained high capacity teacher, however,
is not always available. Recently proposed online variants use the aggregated
intermediate predictions of multiple student models as targets to train each
student model. Although group-derived targets give a good recipe for
teacher-free distillation, group members are homogenized quickly with simple
aggregation functions, leading to early saturated solutions. In this work, we
propose Online Knowledge Distillation with Diverse peers (OKDDip), which
performs two-level distillation during training with multiple auxiliary peers
and one group leader. In the first-level distillation, each auxiliary peer
holds an individual set of aggregation weights generated with an
attention-based mechanism to derive its own targets from predictions of other
auxiliary peers. Learning from distinct target distributions helps to boost
peer diversity for effectiveness of group-based distillation. The second-level
distillation is performed to transfer the knowledge in the ensemble of
auxiliary peers further to the group leader, i.e., the model used for
inference. Experimental results show that the proposed framework consistently
gives better performance than state-of-the-art approaches without sacrificing
training or inference complexity, demonstrating the effectiveness of the
proposed two-level distillation framework.
Comment: Accepted to AAAI-202
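The first-level distillation described above can be illustrated with a minimal sketch, in which each peer mixes the softened predictions of the group using its own attention weights. The function name `peer_targets`, the `queries` and `keys` embeddings, the dot-product scoring, and the choice to let a peer attend to itself are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def softmax(values, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def peer_targets(peer_logits, queries, keys, temperature=3.0):
    """Build one soft target per peer as an attention-weighted mixture.

    peer_logits: one list of class logits per auxiliary peer.
    queries, keys: hypothetical per-peer embeddings; peer a scores peer b
    with the dot product of queries[a] and keys[b] (an assumed form).
    """
    probs = [softmax(logits, temperature) for logits in peer_logits]
    n_classes = len(probs[0])
    targets = []
    for query in queries:
        scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
        weights = softmax(scores)  # individual aggregation weights
        target = [
            sum(w * p[c] for w, p in zip(weights, probs))
            for c in range(n_classes)
        ]
        targets.append(target)
    return targets


# Toy example: three peers, three classes, two-dimensional embeddings.
peer_logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3], [0.1, 0.4, 2.2]]
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
targets = peer_targets(peer_logits, queries, keys)
```

Because every peer holds its own weights, the resulting targets differ from peer to peer, which is the property the abstract credits with preserving diversity. A single shared average would instead give all peers the same target.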
Journal Club Revisited: Teaching Evidence-Based Research and Practice to Graduate Students in a Professional Degree Program
A Journal Club can be a learning exercise that allows for the critique and subsequent analytical discussion of empirical studies, and encourages the public health, health administration, or health policy student to better understand how evidence-based research contributes to evidence-based practice. The purpose of this paper is to describe a learning exercise that implements the Journal Club to evaluate strengths and limitations of relevant research studies and their potential influence on evidence-based practice. This learning exercise was developed to increase discipline-specific knowledge and improve analytical thinking to form and communicate a well-researched and reasoned critique about current peer-reviewed research. Specifically, the exercise was designed to: (1) identify the peer-review process and its influence on evidence-based practice; (2) curate primary resources for selected health issues; (3) evaluate a published, peer-reviewed research article for its rigor and limitations with respect to reported methods, findings, and applicability to professional practice; and (4) facilitate a discussion about discipline-specific research in a concise, professional manner. At the conclusion of the exercise, graduate students, who are also working professionals, reflected on the utility of examining how evidence-based research impacts evidence-based practice. The benefits of this applied learning approach for students and the faculty instructor are discussed.
Robust Domain Randomised Reinforcement Learning through Peer-to-Peer Distillation
In reinforcement learning, domain randomisation is an increasingly popular
technique for learning more general policies that are robust to domain-shifts
at deployment. However, naively aggregating information from randomised domains
may lead to high variance in gradient estimation and an unstable learning process.
To address this issue, we present a peer-to-peer online distillation strategy
for RL termed P2PDRL, where multiple workers are each assigned to a different
environment, and exchange knowledge through mutual regularisation based on
Kullback-Leibler divergence. Our experiments on continuous control tasks show
that P2PDRL enables robust learning across a wider randomisation distribution
than baselines, and more robust generalisation to new environments at test time.
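The mutual regularisation described above can be sketched as a KL penalty added to each worker's own objective. This is a minimal illustration, assuming diagonal Gaussian policies (a common choice for continuous control) and a hypothetical weighting coefficient `alpha`; the abstract does not state the policy form, the direction of the divergence, or the weighting.

```python
import math


def gaussian_kl(mean_p, std_p, mean_q, std_q):
    """KL(p || q) between two diagonal Gaussians, summed over dimensions."""
    return sum(
        math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2 * sq ** 2) - 0.5
        for mp, sp, mq, sq in zip(mean_p, std_p, mean_q, std_q)
    )


def regularised_loss(rl_loss, own_policy, peer_policy, alpha=0.1):
    """Add a peer-matching penalty to a worker's reinforcement learning loss.

    own_policy, peer_policy: (mean, std) pairs evaluated on the same states.
    alpha: hypothetical weight of the distillation term.
    """
    own_mean, own_std = own_policy
    peer_mean, peer_std = peer_policy
    return rl_loss + alpha * gaussian_kl(own_mean, own_std, peer_mean, peer_std)


# Two workers trained in different randomised domains, queried on one state.
worker_a = ([0.2, -0.1], [0.5, 0.5])
worker_b = ([0.4, 0.3], [0.5, 0.7])
loss_a = regularised_loss(1.0, worker_a, worker_b)
loss_b = regularised_loss(1.0, worker_b, worker_a)
```

Each worker is penalised for disagreeing with its peer, so the penalty vanishes only when the two policies coincide on the sampled states.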
Peer Collaborative Learning for Online Knowledge Distillation
Traditional knowledge distillation uses a two-stage training strategy to
transfer knowledge from a high-capacity teacher model to a compact student
model, which relies heavily on the pre-trained teacher. Recent online knowledge
distillation alleviates this limitation by collaborative learning, mutual
learning and online ensembling, in a one-stage end-to-end training
fashion. However, collaborative learning and mutual learning fail to construct
an online high-capacity teacher, whilst online ensembling ignores the
collaboration among branches and its logit summation impedes the further
optimisation of the ensemble teacher. In this work, we propose a novel Peer
Collaborative Learning method for online knowledge distillation, which
integrates online ensembling and network collaboration into a unified
framework. Specifically, given a target network, we construct a multi-branch
network for training, in which each branch is called a peer. We perform random
augmentation multiple times on the inputs to peers and assemble feature
representations output by the peers with an additional classifier as the peer
ensemble teacher. This helps to transfer knowledge from a high-capacity teacher
to peers, and in turn further optimises the ensemble teacher. Meanwhile, we
employ the temporal mean model of each peer as the peer mean teacher to
collaboratively transfer knowledge among peers, which helps each peer to learn
richer knowledge and facilitates optimising a more stable model with better
generalisation. Extensive experiments on CIFAR-10, CIFAR-100 and ImageNet show
that the proposed method significantly improves the generalisation of various
backbone networks and outperforms the state-of-the-art methods.
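A temporal mean model of the kind mentioned above is often maintained as an exponential moving average of a peer's weights. The sketch below assumes that form; the function name `update_mean_teacher`, the flat weight list, and the `decay` value are hypothetical and not taken from the abstract.

```python
def update_mean_teacher(teacher_weights, peer_weights, decay=0.999):
    """Move the teacher's weights a small step towards the peer's weights."""
    return [
        decay * t + (1.0 - decay) * p
        for t, p in zip(teacher_weights, peer_weights)
    ]


# With decay=0.5 the teacher moves halfway towards the peer at each step.
teacher = [0.0, 0.0]
peer = [2.0, 4.0]
step_one = update_mean_teacher(teacher, peer, decay=0.5)
step_two = update_mean_teacher(step_one, peer, decay=0.5)
```

Because the teacher averages over many training steps, it changes more smoothly than the peer it tracks, which is the usual motivation for using it as a more stable distillation target.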
Teacher-Student Architecture for Knowledge Distillation: A Survey
Although deep neural networks (DNNs) have shown a strong capacity to solve
large-scale problems in many areas, such DNNs are hard to deploy in
real-world systems due to their voluminous parameters. To tackle this issue,
Teacher-Student architectures were proposed, where simple student networks with
few parameters can achieve comparable performance to deep teacher networks
with many parameters. Recently, Teacher-Student architectures have been
effectively and widely embraced on various knowledge distillation (KD)
objectives, including knowledge compression, knowledge expansion, knowledge
adaptation, and knowledge enhancement. With the help of Teacher-Student
architectures, current studies are able to achieve multiple distillation
objectives through lightweight and generalized student networks. Different from
existing KD surveys that primarily focus on knowledge compression, this survey
first explores Teacher-Student architectures across multiple distillation
objectives. This survey presents an introduction to various knowledge
representations and their corresponding optimization objectives. Additionally,
we provide a systematic overview of Teacher-Student architectures with
representative learning algorithms and effective distillation schemes. This
survey also summarizes recent applications of Teacher-Student architectures
across multiple purposes, including classification, recognition, generation,
ranking, and regression. Lastly, potential research directions in KD are
investigated, focusing on architecture design, knowledge quality, and
theoretical studies of regression-based learning, respectively. Through this
comprehensive survey, industry practitioners and the academic community can
gain valuable insights and guidelines for effectively designing, learning, and
applying Teacher-Student architectures on various distillation objectives.
Comment: 20 pages. arXiv admin note: substantial text overlap with
arXiv:2210.1733
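For the knowledge compression objective, a Teacher-Student pair is commonly trained with a temperature-softened KL loss between teacher and student predictions. The sketch below shows that standard formulation only as an illustration; it is not taken from the survey, and the function names and the default temperature are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [v / temperature for v in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened outputs, scaled by temperature squared."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    kl = sum(
        t * math.log(t / s)
        for t, s in zip(teacher_probs, student_probs)
        if t > 0.0
    )
    return temperature ** 2 * kl


teacher_logits = [4.0, 1.0, 0.5]
student_logits = [2.0, 1.5, 0.2]
loss = distillation_loss(student_logits, teacher_logits)
```

The temperature-squared factor keeps the gradient magnitude of the soft-target term comparable across temperatures, which matters when it is combined with a hard-label loss.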
Urbanheart surgery - a logic of design alternatives
In 1972, Sir Leslie Martin, in his essay “The Grid as Generator”, advocated “a strong theoretical basis for [planning and] urban design” (Carolin P, 2000, p4). By methodically shifting design parameters regarding the way “in which buildings [could be] placed on the land”, Martin was able to demonstrate how the generation of alternatives could “allow wider scope for decisions and objectives” to be considered and discussed (Carmona M, & Tiesdell S 2007, p81). Operating within a conventional design studio yet drawing on Sir Leslie Martin’s logic, i.e., developing an informed understanding of a problem by identifying a finite world of design ‘alternatives’, the following paper outlines a studio-based program at the School of Architecture and Building, Deakin University, referred to as the ‘UrbanHeart Surgery’. While most atelier-based courses operate largely on an ad-hoc basis where students often work within self-imposed competitive isolation, Urbanheart adopts a more open yet structured approach where students work in design collaboratives to generate a matrix of alternative design scenarios. The program actively integrates postgraduate students from Architecture, Urban Design and Planning into a design research culture and allows them to engage in critical discourse by working on strategic design projects in three areas significant to the future development of the state of Victoria: Metropolitan Urbanism, Urbanism on the Periphery and Regional Urbanism.