163 research outputs found
Towards the Law of Capacity Gap in Distilling Language Models
Language model (LM) distillation is a trending area that aims to distil the
knowledge residing in a large teacher LM into a small student one. While various
methods have been proposed to push distillation to its limits, distilling LMs
remains painful when a large capacity gap exists between the
teacher and the student LMs. The pain mainly results from the curse of
capacity gap: a larger teacher LM does not always lead to a
better student LM than one distilled from a smaller teacher LM, owing to the
effect of the increased capacity gap. That is, there is likely an optimal point
yielding the best student LM along the scaling course of the teacher LM. Even
worse, the curse of capacity gap can be only partly, not fully, lifted, as
indicated in previous studies.
However, the story is not entirely one-sided. Although a larger teacher LM has
better performance than a smaller teacher LM, it is much more
resource-demanding, especially in the context of recent large LMs (LLMs).
Consequently, instead of insisting on lifting the curse, leaving the curse as is
is arguably fine. Even better, in this paper, we reveal that the optimal
capacity gap is almost consistent across different student scales and
architectures, fortunately turning the curse into the law of capacity gap. The
law later guides us to distil a 3B student LM (termed MiniMA) from a 7B teacher
LM (adapted LLaMA2-7B). MiniMA is demonstrated to yield a new
compute-performance Pareto frontier among existing 3B LMs on commonly used
benchmarks, and its instruction-tuned version (termed MiniChat) outperforms a
wide range of 3B competitors in GPT-4 evaluation and can even compete with
several 7B chat models.
Comment: 22 pages, 8 figures, 12 tables, work in progress. Code and
checkpoints are available at https://github.com/GeneZC/MiniM
Diversity Order Analysis for Quantized Constant Envelope Transmission
Quantized constant envelope (QCE) transmission is a popular and effective
technique to reduce the hardware cost and improve the power efficiency of 5G
and beyond systems equipped with large antenna arrays. It has been widely
observed that the number of quantization levels has a substantial impact on the
system performance. This paper aims to quantify the impact of the number of
quantization levels on the system performance. Specifically, we consider a
downlink single-user multiple-input-single-output (MISO) system with M-phase
shift keying (PSK) constellation under the Rayleigh fading channel. We first
derive a novel bound on the system symbol error probability (SEP). Based on the
derived SEP bound, we characterize the achievable diversity order of the
quantized matched filter (MF) precoding strategy. Our results show that full
diversity order can be achieved when the number of quantization levels L is
greater than the PSK constellation order M, i.e., L>M; only half diversity
order is achievable when L=M; and the achievable diversity order is 0 when L<M.
Simulation results verify our theoretical analysis.
Comment: 9 pages, 3 figures, submitted for possible publicatio
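The three regimes stated in this abstract can be written as a small lookup. This is a minimal sketch of the stated result only, not the paper's derivation: the function name `diversity_fraction` and its parameter names are hypothetical, and the return value is the fraction of full diversity order (1.0, 0.5, or 0.0) rather than an absolute order, since the abstract does not give the full-order value.

```python
def diversity_fraction(num_levels: int, psk_order: int) -> float:
    """Fraction of full diversity order for quantized MF precoding.

    Encodes only the three cases stated in the abstract:
    L > M -> full, L == M -> half, L < M -> zero.
    Names are illustrative assumptions, not taken from the paper.
    """
    if num_levels <= 0 or psk_order <= 0:
        raise ValueError("L and M must be positive integers")
    if num_levels > psk_order:
        return 1.0
    if num_levels == psk_order:
        return 0.5
    return 0.0


if __name__ == "__main__":
    # Sweep the number of quantization levels L for 8-PSK (M = 8).
    for levels in (4, 8, 16):
        print(levels, diversity_fraction(levels, psk_order=8))
```

For 8-PSK, the sweep covers one value of L in each regime (L<M, L=M, L>M).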
Dwarf galaxies with the highest concentration are not thicker than ordinary dwarf galaxies
The formation mechanism of high-concentration dwarf galaxies is still a
mystery. We perform a comparative study of the intrinsic shape of nearby
low-mass galaxies with different stellar concentration. The intrinsic shape is
parameterized by the intermediate-to-major axis ratios B/A and the
minor-to-major axis ratios C/A of triaxial ellipsoidal models. Our galaxies
( < < ) are selected to have
spectroscopic redshift from SDSS or GAMA, and have broadband optical images
from the HSC-SSP Wide layer survey. The deep HSC-SSP images allow us to measure
the apparent axis ratios at galactic radii beyond the central star-forming
area of our galaxies. We infer the intrinsic axis ratios based on the
distributions. We find that 1) our galaxies have typical intrinsic shapes
similarly close to oblate (B/A of 0.9--1), regardless of the
concentration, stellar mass, star formation activity, and local environment
(being central or satellite); 2) galaxies with the highest concentration tend
to have intrinsic thickness similar to or (in virtually all cases) slightly
thinner (i.e. smaller mean C/A or equivalently lower triaxiality) than
ordinary galaxies, regardless of other properties explored here. This appears
to be in contrast with the expectation of the classic merger scenario for
high-concentration galaxies. Given the lack of a complete understanding of
dwarf-dwarf mergers, we cannot draw a definite conclusion about the relevance of
mergers in the formation of high-concentration dwarfs. Other mechanisms such as
halo spin may also play important roles in the formation of high-concentration
dwarf galaxies.
Comment: 12 pages, 8 figures, 2 tables, accepted for publication in Ap
Stiefel-Whitney topological charges in a three-dimensional acoustic nodal-line crystal
Band topology of materials describes the extent to which Bloch wavefunctions are
twisted in momentum space. Such descriptions rely on a set of topological
invariants, generally referred to as topological charges, which form a
characteristic class in the mathematical structure of fiber bundles associated
with the Bloch wavefunctions. For example, the celebrated Chern number and its
variants belong to the Chern class, characterizing topological charges for
complex Bloch wavefunctions. Nevertheless, under the space-time inversion
symmetry, Bloch wavefunctions can be purely real in the entire momentum space;
consequently, their topological classification does not fall into the Chern
class, but requires another characteristic class known as the Stiefel-Whitney
class. Here, in a three-dimensional acoustic crystal, we demonstrate a
topological nodal-line semimetal that is characterized by a doublet of
topological charges, the first and second Stiefel-Whitney numbers,
simultaneously. Such a doubly charged nodal line gives rise to a doubled
bulk-boundary correspondence: while the first Stiefel-Whitney number induces
ordinary drumhead states of the nodal line, the second Stiefel-Whitney number
supports hinge Fermi arc states at odd inversion-related pairs of hinges. These
results establish the Stiefel-Whitney topological charges as intrinsic
topological invariants for topological materials, with their unique
bulk-boundary correspondence beyond the conventional framework of topological
band theory.
Comment: 12 pages, 10 figure
Brain-specific Crmp2 deletion leads to neuronal development deficits and behavioural impairments in mice
Acknowledgements: This work was supported by grants from NSF (31430037/31271156/31270826) and MOST (2014CB942801/2012CB517904/2012YQ03026006) to Z.X.; from NIH (NS048271, MH105128) to G.-l.M.; from NIH (NS047344) to H.S.; and from NRASAD to E.K. and K.M.C.
Author notes: Hongsheng Zhang, Eunchai Kang and Yaqing Wang: These authors contributed equally to this work.
Peer reviewed
Publisher PD
Strong magnon-magnon coupling in an ultralow damping all-magnetic-insulator heterostructure
Magnetic insulators such as yttrium iron garnets (YIGs) are of paramount
importance for spin-wave or magnonic devices as their ultralow damping enables
ultralow power dissipation that is free of Joule heating, exotic magnon quantum
state, and coherent coupling to other wave excitations. Magnetic insulator
heterostructures bestow superior structural and magnetic properties and house
immense design space thanks to the strong and engineerable exchange interaction
between individual layers. To fully unleash their potential, realizing low
damping and strong exchange coupling simultaneously is critical, which often
requires a high-quality interface. Here, we show that such a demand is met
in an all-insulator thulium iron garnet (TmIG)/YIG bilayer system. The ultralow
dissipation rates in both YIG and TmIG, along with their significant spin-spin
interaction at the interface, enable strong and coherent magnon-magnon coupling
with a benchmarking cooperativity value larger than that of conventional
ferromagnetic metal-based heterostructures. The coupling strength can be tuned
by varying the magnetic insulator layer thickness and magnon modes, which is
consistent with analytical calculations and micromagnetic simulations. Our
results demonstrate TmIG/YIG as a novel platform for investigating hybrid
magnonic phenomena and open opportunities in magnon devices comprising
all-insulator heterostructures.
Comment: 45 pages, 18 figures, and 2 table
Initialization of nanowire or cluster growth critically controlled by the effective V/III ratio at the early nucleation stage
For self-catalyzed nanowires (NWs), reports on how the catalytic droplet initiates successful NW growth are still lacking, which makes it difficult to control the yield, and growth is often accompanied by a high density of clusters. Here, we have performed a systematic study on this issue, which reveals that the effective V/III ratio at the initial growth stage is a critical factor that governs the NW growth yield. To initiate NW growth, the ratio should be high enough to allow the nucleation to extend to the entire contact area between the droplet and substrate, which can lift the droplet off the substrate, but it should not be so high that the droplet is lost. This study also reveals that the cluster growth between NWs is initiated from large droplets. It thus explains the cluster formation mechanism from the perspective of growth conditions, which can guide high-yield NW growth