752 research outputs found

    Breaking the Architecture Barrier: A Method for Efficient Knowledge Transfer Across Networks

    Transfer learning is a popular technique for improving the performance of neural networks. However, existing methods are limited to transferring parameters between networks with the same architecture. We present a method for transferring parameters between neural networks with different architectures. Our method, called DPIAT, uses dynamic programming to match blocks and layers between architectures and transfer parameters efficiently. Compared to existing parameter prediction and random initialization methods, it significantly improves training efficiency and validation accuracy. In experiments on ImageNet, our method improved validation accuracy by an average of 1.6 times after 50 epochs of training. DPIAT allows both researchers and neural architecture search systems to modify trained networks and reuse knowledge, avoiding the need for retraining from scratch. We also introduce a network architecture similarity measure, enabling users to choose the best source network without any training. Comment: 23 pages, 16 figures
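
The abstract says only that dynamic programming is used to match blocks and layers between architectures. As a rough illustration of what such a matching could look like, and not the DPIAT algorithm itself, the sketch below aligns two layer sequences with an order-preserving dynamic program. The `Layer` description, the `layer_similarity` score, and the `match_layers` function are all hypothetical names introduced for this example.

```python
from typing import List, Tuple

# Hypothetical layer description: (layer type, number of output channels).
Layer = Tuple[str, int]


def layer_similarity(a: Layer, b: Layer) -> float:
    """Assumed score: 0 for different layer types, else smaller width / larger width."""
    if a[0] != b[0]:
        return 0.0
    return min(a[1], b[1]) / max(a[1], b[1])


def match_layers(src: List[Layer], dst: List[Layer]) -> List[Tuple[int, int]]:
    """Return order-preserving (src index, dst index) pairs with maximal total similarity."""
    n, m = len(src), len(dst)
    # best[i][j] = best total score using the first i source and first j target layers.
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            paired = best[i - 1][j - 1] + layer_similarity(src[i - 1], dst[j - 1])
            best[i][j] = max(paired, best[i - 1][j], best[i][j - 1])

    # Walk back through the table to recover which layers were paired.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        score = layer_similarity(src[i - 1], dst[j - 1])
        if score > 0 and best[i][j] == best[i - 1][j - 1] + score:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j]:
            i -= 1  # source layer i-1 is left unmatched
        else:
            j -= 1  # target layer j-1 is left unmatched
    pairs.reverse()
    return pairs


source_net = [("conv", 64), ("conv", 128), ("linear", 10)]
target_net = [("conv", 64), ("conv", 64), ("conv", 128), ("linear", 10)]
matching = match_layers(source_net, target_net)
```

Each matched pair would mark a target layer that could be initialized from the corresponding source layer; unmatched target layers would keep their random initialization. How parameters of different shapes are actually copied is outside the scope of this sketch.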

    A Study on Knowledge Distillation from Weak Teacher for Scaling Up Pre-trained Language Models

    Distillation from Weak Teacher (DWT) is a method of transferring knowledge from a smaller, weaker teacher model to a larger student model to improve its performance. Previous studies have shown that DWT can be effective in the vision domain and in the natural language processing (NLP) pre-training stage. Specifically, DWT shows promise in practical scenarios, such as enhancing new-generation or larger models with pre-trained but older or smaller models when the resource budget is limited. However, the optimal conditions for using DWT have yet to be fully investigated in NLP pre-training. Therefore, this study examines three key factors to optimize DWT, distinct from those used in the vision domain or traditional knowledge distillation. These factors are: (i) the impact of teacher model quality on DWT effectiveness, (ii) guidelines for adjusting the weighting value for DWT loss, and (iii) the impact of parameter remapping as a student model initialization technique for DWT. Comment: Findings of ACL 202
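
The abstract refers to a weighting value for the DWT loss without giving the formula. A common way to weight a distillation term, assumed here for illustration and not taken from the paper, is to mix the hard-label cross-entropy with a KL divergence between teacher and student distributions. The names `alpha`, `temperature`, and `distillation_loss` are assumptions of this sketch.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    label: int,
    alpha: float = 0.5,        # assumed weighting value for the distillation term
    temperature: float = 1.0,  # assumed softening temperature
) -> float:
    """(1 - alpha) * cross-entropy on the label + alpha * KL(teacher || student)."""
    # Hard-label term: cross-entropy of the student against the true class.
    hard_loss = -math.log(softmax(student_logits)[label])

    # Soft term: KL divergence from the softened teacher to the softened student.
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    soft_loss = sum(
        t * math.log(t / s) for t, s in zip(teacher_soft, student_soft) if t > 0
    )
    return (1 - alpha) * hard_loss + alpha * soft_loss
```

With `alpha = 0` the student ignores the teacher entirely; with `alpha = 1` it only imitates the teacher. With a weak teacher, a smaller `alpha` would limit how much of the teacher's error the student inherits, which is one plausible reason the weighting needs tuning.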

    Autonomously Reconfigurable Artificial Neural Network on a Chip

    The artificial neural network (ANN), an established bio-inspired computing paradigm, has proved very effective in a variety of real-world problems and particularly useful for various emerging biomedical applications using specialized ANN hardware. Unfortunately, these ANN-based systems are increasingly vulnerable to both transient and permanent faults due to unrelenting advances in CMOS technology scaling, and such faults can sometimes be catastrophic. The considerable resource and energy consumption and the lack of dynamic adaptability make conventional fault-tolerant techniques unsuitable for future portable medical solutions. Inspired by the self-healing and self-recovery mechanisms of the human nervous system, this research seeks to address reliability issues of ANN-based hardware by proposing an Autonomously Reconfigurable Artificial Neural Network (ARANN) architectural framework. Leveraging the homogeneous structural characteristics of neural networks, ARANN is capable of adapting its structures and operations, both algorithmically and microarchitecturally, to react to unexpected neuron failures. Specifically, we propose three key techniques (Distributed ANN, Decoupled Virtual-to-Physical Neuron Mapping, and Dual-Layer Synchronization) to achieve cost-effective structural adaptation and ensure accurate system recovery. Moreover, an ARANN-enabled self-optimizing workflow is presented to adaptively explore a "Pareto-optimal" neural network structure for a given application, on the fly. Implemented and demonstrated on a Virtex-5 FPGA, ARANN can cover and adapt 93% chip area (neurons) with less than 1% chip overhead and O(n) reconfiguration latency. A detailed performance analysis has been completed based on various recovery scenarios.

    Binocular fusion and invariant category learning due to predictive remapping during scanning of a depthful scene with eye movements

    How does the brain maintain stable fusion of 3D scenes when the eyes move? Every eye movement causes each retinal position to process a different set of scenic features, and thus the brain needs to binocularly fuse new combinations of features at each position after an eye movement. Despite these breaks in retinotopic fusion due to each movement, previously fused representations of a scene in depth often appear stable. The 3D ARTSCAN neural model proposes how the brain does this by unifying concepts about how multiple cortical areas in the What and Where cortical streams interact to coordinate processes of 3D boundary and surface perception, spatial attention, invariant object category learning, predictive remapping, eye movement control, and learned coordinate transformations. The model explains data from single neuron and psychophysical studies of covert visual attention shifts prior to eye movements. The model further clarifies how perceptual, attentional, and cognitive interactions among multiple brain regions (LGN, V1, V2, V3A, V4, MT, MST, PPC, LIP, ITp, ITa, SC) may accomplish predictive remapping as part of the process whereby view-invariant object categories are learned. These results build upon earlier neural models of 3D vision and figure-ground separation and the learning of invariant object categories as the eyes freely scan a scene. A key process concerns how an object's surface representation generates a form-fitting distribution of spatial attention, or attentional shroud, in parietal cortex that helps maintain the stability of multiple perceptual and cognitive processes. Predictive eye movement signals maintain the stability of the shroud, as well as of binocularly fused perceptual boundaries and surface representations. Published version

    Neural dynamics of invariant object recognition: relative disparity, binocular fusion, and predictive eye movements

    How does the visual cortex learn invariant object categories as an observer scans a depthful scene? Two neural processes that contribute to this ability are modeled in this thesis. The first model clarifies how an object is represented in depth. Cortical area V1 computes absolute disparity, which is the horizontal difference in retinal location of an image in the left and right foveas. Many cells in cortical area V2 compute relative disparity, which is the difference in absolute disparity of two visible features. Relative, but not absolute, disparity is unaffected by the distance of visual stimuli from an observer, and by vergence eye movements. A laminar cortical model of V2 that includes shunting lateral inhibition of disparity-sensitive layer 4 cells causes a peak shift in cell responses that transforms absolute disparity from V1 into relative disparity in V2. The second model simulates how the brain maintains stable percepts of a 3D scene during binocular movements. The visual cortex initiates the formation of a 3D boundary and surface representation by binocularly fusing corresponding features from the left and right retinotopic images. However, after each saccadic eye movement, every scenic feature projects to a different combination of retinal positions than before the saccade. Yet the 3D representation, resulting from the prior fusion, is stable through the post-saccadic re-fusion. One key to stability is predictive remapping: the system anticipates the new retinal positions of features entailed by eye movements by using gain fields that are updated by eye movement commands. 
    The 3D ARTSCAN model developed here simulates how perceptual, attentional, and cognitive interactions across different brain regions within the What and Where visual processing streams interact to coordinate predictive remapping, stable 3D boundary and surface perception, spatial attention, and the learning of object categories that are invariant to changes in an object's retinal projections. Such invariant learning helps the system to avoid treating each new view of the same object as a distinct object to be learned. The thesis hereby shows how a process that enables invariant object category learning can be extended to also enable stable 3D scene perception.
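
The abstract defines relative disparity as the difference between the absolute disparities of two features and states that it is unaffected by vergence eye movements. The small numeric sketch below illustrates that property under a simplifying assumption that is not part of the thesis: a vergence change is modeled as shifting every feature's left-eye and right-eye positions by equal and opposite amounts. The sign convention and all function names are hypothetical.

```python
from typing import Tuple

# Hypothetical feature description: horizontal retinal position in (left eye, right eye).
Feature = Tuple[float, float]


def absolute_disparity(feature: Feature) -> float:
    """Horizontal difference between left-eye and right-eye retinal positions."""
    left_x, right_x = feature
    return left_x - right_x


def relative_disparity(a: Feature, b: Feature) -> float:
    """Difference between the absolute disparities of two visible features."""
    return absolute_disparity(a) - absolute_disparity(b)


def apply_vergence(feature: Feature, shift: float) -> Feature:
    """Assumed vergence model: the same disparity offset is added to every feature."""
    left_x, right_x = feature
    return (left_x + shift / 2, right_x - shift / 2)


near_feature: Feature = (1.5, 1.0)
far_feature: Feature = (0.25, 0.75)

before = relative_disparity(near_feature, far_feature)
after = relative_disparity(
    apply_vergence(near_feature, 0.5), apply_vergence(far_feature, 0.5)
)
```

Each absolute disparity changes by the vergence shift, but the shift cancels in the subtraction, so `before` and `after` are equal. The laminar V2 model described in the abstract obtains this transformation through shunting lateral inhibition, which this arithmetic sketch does not attempt to reproduce.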