Habits and goals in synergy: a variational Bayesian framework for behavior
How to behave efficiently and flexibly is a central problem for understanding
biological agents and creating intelligent embodied AI. It is well known
that behavior can be classified into two types: reward-maximizing habitual
behavior, which is fast but inflexible; and goal-directed behavior, which is
flexible but slow. Conventionally, habitual and goal-directed behaviors are
considered to be handled by two distinct systems in the brain. Here, we propose to
bridge the gap between the two behaviors, drawing on the principles of
variational Bayesian theory. We incorporate both behaviors in one framework by
introducing a Bayesian latent variable called "intention". The habitual
behavior is generated by using the prior distribution of intention, which is
goal-less; and the goal-directed behavior is generated by the posterior
distribution of intention, which is conditioned on the goal. Building on this
idea, we present a novel Bayesian framework for modeling behaviors. Our
proposed framework enables skill sharing between the two kinds of behaviors,
and by leveraging the idea of predictive coding, it enables an agent to
seamlessly generalize from habitual to goal-directed behavior without requiring
additional training. The proposed framework suggests a fresh perspective for
cognitive science and embodied AI, highlighting the potential for greater
integration between habitual and goal-directed behaviors.
FwdLLM: Efficient FedLLM using Forward Gradient
Large Language Models (LLMs) are transforming the landscape of mobile
intelligence. Federated Learning (FL), a method to preserve user data privacy,
is often employed in fine-tuning LLMs to downstream mobile tasks, an approach
known as FedLLM. Though recent efforts have addressed the network issue induced
by the vast model size, they have not practically mitigated vital challenges
concerning integration with mobile devices, such as significant memory
consumption and sluggish model convergence.
In response to these challenges, this work introduces FwdLLM, an innovative
FL protocol designed to enhance FedLLM efficiency. The key idea of FwdLLM is
to employ backpropagation (BP)-free training methods, requiring devices only to
execute "perturbed inferences". Consequently, FwdLLM delivers substantially better
memory efficiency and time efficiency (expedited by mobile NPUs and an expanded
array of participant devices). FwdLLM centers around three key designs: (1) it
combines BP-free training with parameter-efficient training methods, an
essential way to scale the approach to the LLM era; (2) it systematically and
adaptively allocates computational loads across devices, striking a careful
balance between convergence speed and accuracy; (3) it discriminatively samples
perturbed predictions that are more valuable to model convergence.
Comprehensive experiments with five LLMs and three NLP tasks illustrate
FwdLLM's significant advantages over conventional methods, including up to
three orders of magnitude faster convergence and a 14.6x reduction in memory
footprint. Uniquely, FwdLLM paves the way for federated learning of
billion-parameter LLMs such as LLaMA on COTS mobile devices -- a feat
previously unattained. Comment: under review
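The idea of BP-free training through "perturbed inferences" can be illustrated with a generic perturbation-based gradient estimate. The sketch below is not FwdLLM's protocol: the quadratic `loss`, the function names, and all hyperparameters are hypothetical, and a finite difference stands in for whatever estimator the paper actually uses. It only shows that forward evaluations on randomly perturbed weights can be enough to drive a descent step, with no backward pass.

```python
import random

def loss(w):
    # Hypothetical toy loss (an assumption for illustration only):
    # a quadratic bowl with its minimum at w = (1.0, -2.0, 0.5).
    target = [1.0, -2.0, 0.5]
    return sum((wi - ti) ** 2 for wi, ti in zip(w, target))

def perturbed_gradient(loss_fn, w, n_samples=32, eps=1e-4, rng=None):
    """Estimate the gradient of loss_fn at w using forward evaluations only."""
    rng = rng or random.Random(0)
    base = loss_fn(w)
    grad = [0.0] * len(w)
    for _ in range(n_samples):
        # Draw a random Gaussian direction and run one "perturbed inference".
        v = [rng.gauss(0.0, 1.0) for _ in w]
        w_pert = [wi + eps * vi for wi, vi in zip(w, v)]
        # Finite-difference estimate of the directional derivative along v.
        d = (loss_fn(w_pert) - base) / eps
        # Averaging d * v over directions approximates the gradient in expectation.
        for i, vi in enumerate(v):
            grad[i] += d * vi / n_samples
    return grad

def train(steps=300, lr=0.02, seed=0):
    """Plain gradient descent driven by the perturbation-based estimate."""
    rng = random.Random(seed)
    w = [0.0, 0.0, 0.0]
    for _ in range(steps):
        g = perturbed_gradient(loss, w, rng=rng)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

if __name__ == "__main__":
    w = train()
    print("final weights:", w, "final loss:", loss(w))
```

Each sample costs one extra forward evaluation, so the memory needed is roughly that of inference. The price is a noisier gradient, which is why the number of perturbations per step matters.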
Magneto-optical and photoemission studies of ultrathin wedges
Magnetic phase transitions of Fe wedges grown epitaxially on Cu(100) are detected via the surface magneto-optical Kerr effect and used to construct a phase diagram for face-centered Fe. Also, the confinement of Cu sp- and d-quantum-well states is studied for Cu/Co(wedge)/Cu(100) utilizing undulator-based photoemission experiments.
AdS/BCFT and Island for curvature-squared gravity
In this paper, we investigate AdS/BCFT for curvature-squared gravity. To warm
up, we start with Gauss-Bonnet gravity. We derive the one point function of
stress tensor and show that the central charge related to the norm of
displacement operator is positive for the couplings obeying causality
constraints. Furthermore, by imposing the null energy condition on the
end-of-the-world brane, we prove the holographic g-theorem for Gauss-Bonnet
gravity. This corrects a mistaken view in the literature, which claims
that the holographic g-theorem is violated for Gauss-Bonnet gravity. As a
by-product, we obtain the boundary entropy and A-type boundary central charges
in general dimensions. We also study AdS/BCFT for general curvature-squared
gravity. We find that it is too restrictive for the shape of the brane and the
dual BCFT is trivial if one imposes Neumann boundary conditions for all of the
gravitational modes. Instead, we propose to impose Dirichlet boundary condition
for the massive graviton, while imposing Neumann boundary condition for the
massless graviton. In this way, we obtain non-trivial shape dependence of
stress tensor and well-defined central charges. In particular, the holographic
g-theorem is satisfied by general curvature-squared gravity. Finally, we
discuss the island and show that the Page curve can be recovered for
Gauss-Bonnet gravity. Interestingly, there are zeroth-order phase transitions
for the Page curve within one range of couplings obeying causality constraints.
Generalizing the discussions to holographic entanglement entropy and
holographic complexity in AdS/CFT, we get new constraints for the Gauss-Bonnet
coupling, which are stronger than the causality constraint. Comment: 49 pages, 29 figures, revision accepted for publication in JHEP, main
improvements: prove that our g-function can recover the universal term of
boundary entropy in general dimensions; add a toy model to explain the novel
zeroth-order phase transition of the Page curve analytically
Spin polarization of the conduction bands and secondary electrons of Gd(0001)
Angle- and spin-resolved photoemission was utilized to investigate the 5d bulk bands and the surface state of Gd(0001) in the temperature range of 130-350 K. The bulk bands at 1-2 eV below the Fermi energy E_F show Stoner-like behavior, while the temperature dependence of the surface state near E_F indicates spin-mixing behavior due to fluctuating local 5d moments. The secondary electron spectra of the Gd surfaces both before and after initial oxygen adsorption show a polarization dip at low kinetic energies due to the extra scattering channel for minority electrons via the unoccupied 4f level. The temperature dependences of the surface and bulk magnetization are separated using the spin polarization of the surface state and the bulk exchange splitting.
Superparamagnetic behavior of ultrathin Fe films grown on Al₂O₃(0001) substrates
The superparamagnetic behavior of ultrathin Fe films at various growth temperatures was studied. The films were grown on an Al₂O₃(0001) substrate by molecular beam epitaxy (MBE). The blocking temperature was strongly dependent on the growth temperature and the 1-nm-thick Fe films were in the superparamagnetic state. The results show that for growth at 673 and 773 K, Fe forms large particles and the magnetic properties are dominated by the individual particles. Yu Shiratsuchi, Masahiko Yamamoto, Yasushi Endo, Dongqi Li, and S. D. Bader, Journal of Applied Physics 94, 7675 (2003); https://doi.org/10.1063/1.1628408
Magnetic phase transition and anisotropy of ultrathin Fe films grown on inclined Al₂O₃(0001) substrates
We investigated the magnetic properties of ultrathin Fe films grown on inclined Al₂O₃(0001) substrates at various growth temperatures. We report the evolution of the magnetism with Fe thickness tFe and growth temperature, and the effect of the inclination of the substrate orientation on the magnetic anisotropy. The films are superparamagnetic (tFe≈5 monolayers, ML), ferromagnetic (tFe>15 ML), or coexistent (tFe≈10 ML). The effect of inclination of the substrate is small in the superparamagnetic region and substantial in the ferromagnetic region. Fe thin films grown on the inclined substrate have a uniaxial magnetic anisotropy with the magnetic easy axis parallel to the step edge. This uniaxial magnetic anisotropy might be derived from the effective demagnetizing field due to the magnetic charge distribution at the corrugated surface. The strength of the uniaxial magnetic anisotropy decreases as the growth temperature increases. The dependence of the uniaxial magnetic anisotropy on growth temperature is caused by the change of growth mechanism, from smooth to rough, with increasing growth temperature. Yu Shiratsuchi, Yasushi Endo, Masahiko Yamamoto, Dongqi Li, and S. D. Bader, Journal of Applied Physics 95, 6897 (2004); https://doi.org/10.1063/1.1667432
Probing the metal-nonmetal transition in thin metal overlayers using resonant photoemission
We have studied one and two monolayers of barium on Ni(111) and of mercury on Cu(100). Using resonant photoemission, we have found that core-excited electrons become delocalized with increasing barium coverage. Similarly, upon formation of the mercury bilayer (as determined by low-energy electron diffraction and by atom-beam scattering), there is a substantial increase in the screening of the photohole. A transition of the electronic structure akin to a metal-nonmetal (metal-insulator) transition is apparent in these final-state effects. The band structure for Hg is similar to the band structure expected for a free-standing film with a free-electron sd band. The delocalization of the core-excited electrons resembles the exciton unbinding that occurs at the metal-nonmetal Mott transition.