Breaking the Architecture Barrier: A Method for Efficient Knowledge Transfer Across Networks
Transfer learning is a popular technique for improving the performance of
neural networks. However, existing methods are limited to transferring
parameters between networks with the same architecture. We present a method for
transferring parameters between neural networks with different architectures.
Our method, called DPIAT, uses dynamic programming to match blocks and layers
between architectures and transfer parameters efficiently. Compared to existing
parameter prediction and random initialization methods, it significantly
improves training efficiency and validation accuracy. In experiments on
ImageNet, our method improved validation accuracy by an average of 1.6 times
after 50 epochs of training. DPIAT allows both researchers and neural
architecture search systems to modify trained networks and reuse knowledge,
avoiding the need for retraining from scratch. We also introduce a network
architecture similarity measure, enabling users to choose the best source
network without any training.
Comment: 23 pages, 16 figures
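The abstract does not specify DPIAT's exact matching procedure, but the idea of using dynamic programming to align the layers of two different architectures can be sketched as ordinary sequence alignment with skips. The similarity function and layer descriptors below are assumptions for illustration, not the paper's implementation:

```python
# Hypothetical sketch of dynamic-programming layer matching in the spirit of
# DPIAT; the layer descriptors and similarity function are assumptions.
def match_layers(src, dst, sim):
    """Align two layer sequences by dynamic programming.

    Returns (src_index, dst_index) pairs with maximal total similarity,
    analogous to sequence alignment where either side may skip layers.
    """
    n, m = len(src), len(dst)
    # score[i][j] = best total similarity aligning src[:i] with dst[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j],                                    # skip a source layer
                score[i][j - 1],                                    # skip a target layer
                score[i - 1][j - 1] + sim(src[i - 1], dst[j - 1]),  # match the pair
            )
    # Trace back to recover the matched pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + sim(src[i - 1], dst[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]

# Toy similarity: 1.0 when the layer types match, else 0.0.
layer_sim = lambda a, b: 1.0 if a == b else 0.0
print(match_layers(["conv", "conv", "fc"], ["conv", "fc"], layer_sim))
# → [(1, 0), (2, 1)]
```

Matched pairs would then receive copied (and, where shapes differ, adapted) parameters, while unmatched layers fall back to random initialization.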
A Study on Knowledge Distillation from Weak Teacher for Scaling Up Pre-trained Language Models
Distillation from Weak Teacher (DWT) is a method of transferring knowledge
from a smaller, weaker teacher model to a larger student model to improve its
performance. Previous studies have shown that DWT can be effective in the
vision domain and natural language processing (NLP) pre-training stage.
Specifically, DWT shows promise in practical scenarios, such as enhancing
new-generation or larger models using pre-trained but older or smaller models
when the resource budget is limited. However, the optimal conditions for using
DWT have
yet to be fully investigated in NLP pre-training. Therefore, this study
examines three key factors to optimize DWT, distinct from those used in the
vision domain or traditional knowledge distillation. These factors are: (i) the
impact of teacher model quality on DWT effectiveness, (ii) guidelines for
adjusting the weighting value for DWT loss, and (iii) the impact of parameter
remapping as a student model initialization technique for DWT.
Comment: Findings of ACL 202
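Factor (ii) above concerns how strongly the (weak) teacher's signal is weighted against the hard-label objective. A minimal sketch of such a weighted distillation loss, assuming the common soft-target/hard-target mixture (the exact loss used in the study is not given in the abstract):

```python
import math

def softmax(logits, T=1.0):
    """Temperature-softened softmax over a list of logits."""
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def dwt_loss(student_logits, teacher_logits, label, alpha=0.5, T=2.0):
    """Weighted distillation loss: alpha scales the soft (teacher) term.

    alpha plays the role of the DWT weighting value that factor (ii)
    tunes; with a weak teacher, a smaller alpha down-weights its
    potentially noisy guidance relative to the hard-label loss.
    """
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    # KL(teacher || student) on temperature-softened distributions
    kd = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    # Standard cross-entropy on the hard label
    ce = -math.log(softmax(student_logits)[label])
    return alpha * kd + (1.0 - alpha) * ce

loss = dwt_loss([2.0, 0.5, 0.1], [1.0, 0.8, 0.2], label=0, alpha=0.3)
```

With alpha = 0 the loss reduces to plain supervised cross-entropy, which is the natural baseline when the teacher is too weak to help.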
Neuromodulated attention and goal-driven perception in uncertain domains.
In uncertain domains, the goals are often unknown and need to be predicted by the organism or system. In this paper, contrastive Excitation Backprop (c-EB) was used in two goal-driven perception tasks: one with pairs of noisy MNIST digits and the other with a robot in an action-based attention scenario. The first task included attending to even, odd, low, and high digits, whereas the second task included action goals, such as "eat", "work-on-computer", "read", and "say-hi", that led to attention to objects associated with those actions. The system needed to increase attention to target items and decrease attention to distractor items and background noise. Because the valid goal was unknown, an online learning model based on the cholinergic and noradrenergic neuromodulatory systems was used to predict a noisy goal (expected uncertainty) and re-adapt when the goal changed (unexpected uncertainty). This neurobiologically plausible model demonstrates how neuromodulatory systems can predict goals in uncertain domains and how attentional mechanisms can enhance perception of that goal.
Autonomously Reconfigurable Artificial Neural Network on a Chip
Artificial neural network (ANN), an established bio-inspired computing paradigm, has proved very effective in a variety of real-world problems and particularly useful for various emerging biomedical applications using specialized ANN hardware. Unfortunately, these ANN-based systems are increasingly vulnerable to both transient and permanent faults due to unrelenting advances in CMOS technology scaling, which sometimes can be catastrophic. The considerable resource and energy consumption and the lack of dynamic adaptability make conventional fault-tolerant techniques unsuitable for future portable medical solutions. Inspired by the self-healing and self-recovery mechanisms of the human nervous system, this research seeks to address reliability issues of ANN-based hardware by proposing an Autonomously Reconfigurable Artificial Neural Network (ARANN) architectural framework. Leveraging the homogeneous structural characteristics of neural networks, ARANN is capable of adapting its structures and operations, both algorithmically and microarchitecturally, to react to unexpected neuron failures. Specifically, we propose three key techniques --- Distributed ANN, Decoupled Virtual-to-Physical Neuron Mapping, and Dual-Layer Synchronization --- to achieve cost-effective structural adaptation and ensure accurate system recovery. Moreover, an ARANN-enabled self-optimizing workflow is presented to adaptively explore a "Pareto-optimal" neural network structure for a given application, on the fly. Implemented and demonstrated on a Virtex-5 FPGA, ARANN can cover and adapt 93% of chip area (neurons) with less than 1% chip overhead and O(n) reconfiguration latency. A detailed performance analysis has been completed based on various recovery scenarios.
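The decoupling of virtual from physical neurons described above can be illustrated with a simple indirection table that reroutes a failed physical neuron to a spare. This is a hypothetical software sketch of the idea only; the names and the spare-pool policy are assumptions, not ARANN's hardware implementation:

```python
# Hypothetical sketch of decoupled virtual-to-physical neuron mapping in the
# spirit of ARANN; the spare-pool policy is an assumption for illustration.
class NeuronMapper:
    def __init__(self, num_virtual, num_physical):
        assert num_physical >= num_virtual, "need spare neurons for recovery"
        self.map = list(range(num_virtual))             # virtual -> physical
        self.spares = list(range(num_virtual, num_physical))
        self.failed = set()

    def physical(self, v):
        """Resolve a virtual neuron index to its current physical neuron."""
        return self.map[v]

    def report_failure(self, p):
        """Remap any virtual neuron bound to the failed physical neuron p."""
        self.failed.add(p)
        for v, cur in enumerate(self.map):
            if cur == p:
                if not self.spares:
                    raise RuntimeError("no spare neurons left")
                self.map[v] = self.spares.pop(0)

m = NeuronMapper(num_virtual=4, num_physical=6)
m.report_failure(2)      # physical neuron 2 dies
print(m.physical(2))     # → 4: virtual neuron 2 rerouted to the first spare
```

Because computation addresses neurons only through the virtual index, the rest of the network is untouched by the remapping, which is the point of the decoupling.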
Binocular fusion and invariant category learning due to predictive remapping during scanning of a depthful scene with eye movements
How does the brain maintain stable fusion of 3D scenes when the eyes move? Every eye movement causes each retinal position to process a different set of scenic features, and thus the brain needs to binocularly fuse new combinations of features at each position after an eye movement. Despite these breaks in retinotopic fusion due to each movement, previously fused representations of a scene in depth often appear stable. The 3D ARTSCAN neural model proposes how the brain does this by unifying concepts about how multiple cortical areas in the What and Where cortical streams interact to coordinate processes of 3D boundary and surface perception, spatial attention, invariant object category learning, predictive remapping, eye movement control, and learned coordinate transformations. The model explains data from single neuron and psychophysical studies of covert visual attention shifts prior to eye movements. The model further clarifies how perceptual, attentional, and cognitive interactions among multiple brain regions (LGN, V1, V2, V3A, V4, MT, MST, PPC, LIP, ITp, ITa, SC) may accomplish predictive remapping as part of the process whereby view-invariant object categories are learned. These results build upon earlier neural models of 3D vision and figure-ground separation and the learning of invariant object categories as the eyes freely scan a scene. A key process concerns how an object's surface representation generates a form-fitting distribution of spatial attention, or attentional shroud, in parietal cortex that helps maintain the stability of multiple perceptual and cognitive processes. Predictive eye movement signals maintain the stability of the shroud, as well as of binocularly fused perceptual boundaries and surface representations.
Neural dynamics of invariant object recognition: relative disparity, binocular fusion, and predictive eye movements
How does the visual cortex learn invariant object categories as an observer scans
a depthful scene? Two neural processes that contribute to this ability are modeled in this
thesis.
The first model clarifies how an object is represented in depth. Cortical area V1
computes absolute disparity, which is the horizontal difference in retinal location of an
image in the left and right foveas. Many cells in cortical area V2 compute relative
disparity, which is the difference in absolute disparity of two visible features. Relative,
but not absolute, disparity is unaffected by the distance of visual stimuli from an
observer, and by vergence eye movements. A laminar cortical model of V2 that includes
shunting lateral inhibition of disparity-sensitive layer 4 cells causes a peak shift in cell
responses that transforms absolute disparity from V1 into relative disparity in V2.
The second model simulates how the brain maintains stable percepts of a 3D
scene during binocular eye movements. The visual cortex initiates the formation of a 3D boundary and surface representation by binocularly fusing corresponding features from
the left and right retinotopic images. However, after each saccadic eye movement, every
scenic feature projects to a different combination of retinal positions than before the
saccade. Yet the 3D representation, resulting from the prior fusion, is stable through the
post-saccadic re-fusion. One key to stability is predictive remapping: the system
anticipates the new retinal positions of features entailed by eye movements by using gain
fields that are updated by eye movement commands. The 3D ARTSCAN model
developed here simulates how perceptual, attentional, and cognitive interactions across
different brain regions within the What and Where visual processing streams interact to
coordinate predictive remapping, stable 3D boundary and surface perception, spatial
attention, and the learning of object categories that are invariant to changes in an object's
retinal projections. Such invariant learning helps the system to avoid treating each new
view of the same object as a distinct object to be learned. The thesis hereby shows how a
process that enables invariant object category learning can be extended to also enable
stable 3D scene perception.
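The gain-field mechanism described above, in which eye movement commands shift retinotopic representations before the saccade lands, can be reduced to a very simple geometric sketch. The coordinate representation below is an assumption for illustration, not the 3D ARTSCAN model itself:

```python
# Minimal sketch of predictive remapping via a gain field: retinotopic
# feature positions are shifted by the command for a planned saccade
# before the eyes actually move. Representation is assumed, not ARTSCAN's.
def remap(features, saccade):
    """Shift each (x, y) retinal position by the planned eye movement.

    A saccade of (dx, dy) moves the fovea, so each scene feature lands
    at the retinal position offset by (-dx, -dy) after the movement.
    """
    dx, dy = saccade
    return [(x - dx, y - dy) for (x, y) in features]

# Features fused at these retinal positions before a rightward saccade:
pre = [(3, 0), (5, 2)]
post = remap(pre, saccade=(2, 0))
print(post)   # predicted post-saccadic positions: [(1, 0), (3, 2)]
```

Because the predicted positions are available before the movement completes, previously fused boundaries and surfaces can be re-bound at their new retinal locations without breaking the percept.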