163 research outputs found

    Towards the Law of Capacity Gap in Distilling Language Models

    Language model (LM) distillation is a trending area that aims to distil the knowledge residing in a large teacher LM into a small student one. While various methods have been proposed to push distillation to its limits, distilling LMs remains painful when there is a large capacity gap between the teacher and student LMs. The pain results mainly from the curse of capacity gap: a larger teacher LM does not always lead to a better student LM than one distilled from a smaller teacher LM, owing to the effect of the increased capacity gap. That is, there is likely an optimal point yielding the best student LM along the scaling course of the teacher LM. Even worse, previous studies indicate that the curse of capacity gap can be only partly, not fully, lifted. However, the tale is never one-sided. Although a larger teacher LM performs better than a smaller teacher LM, it is much more resource-demanding, especially in the context of recent large LMs (LLMs). Consequently, instead of insisting on lifting the curse, leaving the curse as is should arguably be fine. Even better, in this paper, we reveal that the optimal capacity gap is almost consistent across different student scales and architectures, fortunately turning the curse into the law of capacity gap. The law later guides us to distil a 3B student LM (termed MiniMA) from a 7B teacher LM (adapted LLaMA2-7B). MiniMA is demonstrated to yield a new compute-performance Pareto frontier among existing 3B LMs on commonly used benchmarks, and its instruction-tuned version (termed MiniChat) outperforms a wide range of 3B competitors in GPT4 evaluation and could even compete with several 7B chat models. Comment: 22 pages, 8 figures, 12 tables, work in progress. Code and checkpoints are available at https://github.com/GeneZC/MiniM

    Diversity Order Analysis for Quantized Constant Envelope Transmission

    Quantized constant envelope (QCE) transmission is a popular and effective technique to reduce the hardware cost and improve the power efficiency of 5G and beyond systems equipped with large antenna arrays. It has been widely observed that the number of quantization levels has a substantial impact on the system performance. This paper aims to quantify that impact. Specifically, we consider a downlink single-user multiple-input-single-output (MISO) system with M-phase shift keying (PSK) constellation under the Rayleigh fading channel. We first derive a novel bound on the system symbol error probability (SEP). Based on the derived SEP bound, we characterize the achievable diversity order of the quantized matched filter (MF) precoding strategy. Our results show that full diversity order can be achieved when the number of quantization levels L is greater than the PSK constellation order M, i.e., L>M; only half diversity order is achievable when L=M; and the achievable diversity order is 0 when L<M. Simulation results verify our theoretical analysis. Comment: 9 pages, 3 figures, submitted for possible publication

    Dwarf galaxies with the highest concentration are not thicker than ordinary dwarf galaxies

    The formation mechanism of high-concentration dwarf galaxies is still a mystery. We perform a comparative study of the intrinsic shape of nearby low-mass galaxies with different stellar concentrations. The intrinsic shape is parameterized by the intermediate-to-major axis ratios B/A and the minor-to-major axis ratios C/A of triaxial ellipsoidal models. Our galaxies ($10^{7.5} M_\odot < M_\star < 10^{10.0} M_\odot$) are selected to have spectroscopic redshifts from SDSS or GAMA and broadband optical images from the HSC-SSP Wide layer survey. The deep HSC-SSP images allow us to measure the apparent axis ratios $q$ at galactic radii beyond the central star-forming area of our galaxies. We infer the intrinsic axis ratios based on the $q$ distributions. We find that 1) our galaxies typically have intrinsic shapes close to oblate ($\mu_{B/A} \sim 0.9$--$1$), regardless of the concentration, stellar mass, star formation activity, and local environment (being central or satellite); 2) galaxies with the highest concentration tend to have intrinsic thickness similar to or (in virtually all cases) slightly thinner (i.e. smaller mean $\mu_{C/A}$ or equivalently lower triaxiality) than ordinary galaxies, regardless of other properties explored here. This appears to be in contrast with the expectation of the classic merger scenario for high-concentration galaxies. Given the lack of a complete understanding of dwarf-dwarf mergers, we cannot draw a definite conclusion about the relevance of mergers in the formation of high-concentration dwarfs. Other mechanisms such as halo spin may also play important roles in the formation of high-concentration dwarf galaxies. Comment: 12 pages, 8 figures, 2 tables, accepted for publication in Ap

    Stiefel-Whitney topological charges in a three-dimensional acoustic nodal-line crystal

    Band topology of materials describes the extent to which Bloch wavefunctions are twisted in momentum space. Such descriptions rely on a set of topological invariants, generally referred to as topological charges, which form a characteristic class in the mathematical structure of fiber bundles associated with the Bloch wavefunctions. For example, the celebrated Chern number and its variants belong to the Chern class, characterizing topological charges for complex Bloch wavefunctions. Nevertheless, under the space-time inversion symmetry, Bloch wavefunctions can be purely real in the entire momentum space; consequently, their topological classification does not fall into the Chern class, but requires another characteristic class known as the Stiefel-Whitney class. Here, in a three-dimensional acoustic crystal, we demonstrate a topological nodal-line semimetal that is characterized by a doublet of topological charges, the first and second Stiefel-Whitney numbers, simultaneously. Such a doubly charged nodal line gives rise to a doubled bulk-boundary correspondence: while the first Stiefel-Whitney number induces ordinary drumhead states of the nodal line, the second Stiefel-Whitney number supports hinge Fermi arc states at odd inversion-related pairs of hinges. These results establish the Stiefel-Whitney topological charges as intrinsic topological invariants for topological materials, with their unique bulk-boundary correspondence beyond the conventional framework of topological band theory. Comment: 12 pages, 10 figures

    Brain-specific Crmp2 deletion leads to neuronal development deficits and behavioural impairments in mice

    Acknowledgements: This work was supported by grants from NSF (31430037/31271156/31270826) and MOST (2014CB942801/2012CB517904/2012YQ03026006) to Z.X.; from NIH (NS048271, MH105128) to G.-l.M.; from NIH (NS047344) to H.S.; and from NRASAD to E.K. and K.M.C. Author notes: Hongsheng Zhang, Eunchai Kang and Yaqing Wang: These authors contributed equally to this work. Peer reviewed. Publisher PD

    Strong magnon-magnon coupling in an ultralow damping all-magnetic-insulator heterostructure

    Magnetic insulators such as yttrium iron garnets (YIGs) are of paramount importance for spin-wave or magnonic devices, as their ultralow damping enables ultralow power dissipation that is free of Joule heating, exotic magnon quantum states, and coherent coupling to other wave excitations. Magnetic insulator heterostructures bestow superior structural and magnetic properties and house an immense design space thanks to the strong and engineerable exchange interaction between individual layers. To fully unleash their potential, realizing low damping and strong exchange coupling simultaneously is critical, which often requires a high-quality interface. Here, we show that such a demand is realized in an all-insulator thulium iron garnet (TmIG)/YIG bilayer system. The ultralow dissipation rates in both YIG and TmIG, along with their significant spin-spin interaction at the interface, enable strong and coherent magnon-magnon coupling with a benchmarking cooperativity value larger than that of conventional ferromagnetic metal-based heterostructures. The coupling strength can be tuned by varying the magnetic insulator layer thickness and magnon modes, which is consistent with analytical calculations and micromagnetic simulations. Our results demonstrate TmIG/YIG as a novel platform for investigating hybrid magnonic phenomena and open opportunities in magnon devices comprising all-insulator heterostructures. Comment: 45 pages, 18 figures, and 2 tables

    Initialization of nanowire or cluster growth critically controlled by the effective V/III ratio at the early nucleation stage

    For self-catalyzed nanowires (NWs), reports on how the catalytic droplet initiates successful NW growth are still lacking, which makes it difficult to control the yield and often results in a high density of clusters. Here, we have performed a systematic study on this issue, which reveals that the effective V/III ratio at the initial growth stage is a critical factor that governs the NW growth yield. To initiate NW growth, the ratio should be high enough to allow the nucleation to extend to the entire contact area between the droplet and substrate, which can lift the droplet off the substrate, but not so high that the droplet is lost. This study also reveals that the cluster growth between NWs is initiated from large droplets. It provides a new angle, based on the growth conditions, to explain the cluster formation mechanism, which can guide high-yield NW growth.