Decouple knowledge from parameters for plug-and-play language modeling
Pre-trained language models (PLMs) have achieved impressive results in various
NLP tasks. One key factor in their success is that their parameters
implicitly learn all kinds of knowledge during
pre-training. However, encoding knowledge implicitly in the model parameters
has two fundamental drawbacks. First, the knowledge is neither editable nor
scalable once the model is trained, which is especially problematic in that
knowledge is consistently evolving. Second, it lacks interpretability and
prevents humans from understanding which knowledge a PLM requires for a certain
problem. In this paper, we introduce PlugLM, a pre-training model with a
differentiable plug-in memory (DPM). The key intuition is to decouple the
knowledge storage from model parameters with an editable and scalable key-value
memory and leverage knowledge in an explainable manner by knowledge retrieval
in the DPM. To justify this design choice, we conduct evaluations in three
settings: (1) domain adaptation: PlugLM obtains an average F1 improvement of
3.95 across four domains without any in-domain pre-training. (2) knowledge
update: PlugLM can absorb new knowledge in a training-free way after
pre-training is done. (3) in-task knowledge learning: PlugLM can be further
improved by incorporating training samples into the DPM with knowledge
prompting. Comment: ACL 2023 Findings
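PlugLM's actual DPM is a learned, dense memory trained end-to-end; purely as a rough illustration of the editable, training-free key-value idea described above, here is a minimal sketch (class and method names are hypothetical, not from the paper):

```python
import numpy as np

class PlugInMemory:
    """Toy editable key-value memory (illustrative only, not the paper's DPM).

    Keys are dense vectors (e.g. encoded knowledge snippets); values are the
    knowledge texts themselves. Retrieval is nearest-neighbor over keys, so
    knowledge can be added or replaced without retraining any model weights.
    """
    def __init__(self, dim):
        self.keys = np.empty((0, dim), dtype=np.float32)
        self.values = []

    def write(self, key, value):
        # Training-free knowledge update: just append a new entry.
        self.keys = np.vstack([self.keys, np.asarray(key, dtype=np.float32)])
        self.values.append(value)

    def read(self, query, k=1):
        # Cosine-similarity retrieval over all stored keys.
        q = np.asarray(query, dtype=np.float32)
        sims = self.keys @ q / (np.linalg.norm(self.keys, axis=1)
                                * np.linalg.norm(q) + 1e-8)
        top = np.argsort(-sims)[:k]
        return [self.values[i] for i in top]
```

Because the memory is external to the model parameters, "editing" knowledge is just overwriting or appending entries, which is what makes the update path training-free.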
Valley vortex states and degeneracy lifting via photonic higher-band excitation
We demonstrate valley-dependent vortex generation in a photonic graphene.
Without breaking the inversion symmetry, excitation of two equivalent valleys
leads to the formation of an optical vortex upon Bragg reflection to the third
valley, with its chirality determined by the valley degree of freedom.
Vortex-antivortex pairs with valley-dependent topological charge flipping are
also observed and corroborated by numerical simulations. Furthermore, we
develop a three-band effective Hamiltonian model to describe the dynamics of
the coupled valleys, and find that the commonly used two-band model is not
sufficient to explain the observed vortex degeneracy lifting. Such
valley-polarized vortex states arise from high-band excitation without
inversion symmetry breaking or synthetic-field-induced gap opening. Our results
from a photonic setting may provide insight into the study of valley-contrasting
and Berry-phase-mediated topological phenomena in other systems.
YOLO-FaceV2: A Scale and Occlusion Aware Face Detector
In recent years, face detection algorithms based on deep learning have made
great progress. These algorithms can be broadly divided into two categories,
i.e., two-stage detectors such as Faster R-CNN and one-stage detectors such as
YOLO.
Because of the better balance between accuracy and speed, one-stage detectors
have been widely used in many applications. In this paper, we propose a
real-time face detector based on the one-stage detector YOLOv5, named
YOLO-FaceV2. We design a Receptive Field Enhancement module called RFE to
enhance the receptive field of small faces, and use NWD Loss to compensate for
the sensitivity of IoU to location deviations of tiny objects. To handle face
occlusion, we present an attention module named SEAM and introduce Repulsion
Loss. Moreover, we use a weight function, Slide, to address the
imbalance between easy and hard samples and use the information of the
effective receptive field to design the anchors. Experimental results on the
WiderFace dataset show that our face detector outperforms YOLO and its variants
on all of the easy, medium, and hard subsets. Source code:
https://github.com/Krasjet-Yu/YOLO-FaceV
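The NWD loss mentioned above builds on the Normalized Gaussian Wasserstein Distance from the tiny-object detection literature: each box is modeled as a 2D Gaussian and the closed-form 2-Wasserstein distance between the two Gaussians is mapped into (0, 1]. A minimal sketch of that metric follows (the normalizing constant `c` is dataset-dependent; the value here is an assumption, not from the paper):

```python
import math

def nwd(box1, box2, c=12.8):
    """Normalized Gaussian Wasserstein Distance between two boxes.

    Boxes are (cx, cy, w, h). Each box is modeled as a Gaussian
    N(center, diag(w/2, h/2)); the squared 2-Wasserstein distance between
    two such Gaussians has a closed form, which is then exponentiated so
    the result lies in (0, 1] (1.0 for identical boxes).
    """
    cx1, cy1, w1, h1 = box1
    cx2, cy2, w2, h2 = box2
    w2_sq = ((cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
             + ((w1 - w2) / 2) ** 2 + ((h1 - h2) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / c)
```

Unlike IoU, this stays smooth and informative even when two tiny boxes barely overlap, which is why it is better suited to small-face localization.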
From Skepticism to Acceptance: Simulating the Attitude Dynamics Toward Fake News
In the digital era, the rapid propagation of fake news and rumors via social
networks brings notable societal challenges and impacts public opinion
regulation. Traditional fake news modeling typically forecasts the general
popularity trends of different groups or numerically represents opinion shifts.
However, these methods often oversimplify real-world complexities and overlook
the rich semantic information of news text. The advent of large language models
(LLMs) provides the possibility of modeling the subtle dynamics of opinion.
Consequently, in this work, we introduce a Fake news Propagation Simulation
framework (FPS) based on LLMs, which studies the trends and control of fake news
propagation in detail. Specifically, each agent in the simulation represents an
individual with a distinct personality. They are equipped with both short-term
and long-term memory, as well as a reflective mechanism to mimic human-like
thinking. Every day, they engage in random opinion exchanges, reflect on their
thinking, and update their opinions. Our simulation results uncover patterns in
fake news propagation related to topic relevance and individual traits,
aligning with real-world observations. Additionally, we evaluate various
intervention strategies and demonstrate that early and appropriately frequent
interventions strike a balance between governance cost and effectiveness,
offering valuable insights for practical applications. Our study underscores
the significant utility and potential of LLMs in combating fake news.
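The daily opinion-exchange loop described above can be caricatured without an LLM at all; the toy sketch below models each agent as a scalar opinion plus a stubbornness trait (all names and the update rule are hypothetical simplifications — the actual FPS agents use LLM-generated, personality-conditioned reasoning):

```python
import random
from dataclasses import dataclass, field

@dataclass
class Agent:
    # Opinion in [0, 1]: 0 = skeptical of the fake story, 1 = fully accepting.
    opinion: float
    stubbornness: float  # personality trait: resistance to persuasion
    memory: list = field(default_factory=list)  # short-term log of exchanges

    def exchange(self, other):
        # Move toward the peer's opinion, damped by this agent's stubbornness.
        delta = (other.opinion - self.opinion) * (1 - self.stubbornness) * 0.5
        self.opinion = min(1.0, max(0.0, self.opinion + delta))
        self.memory.append(other.opinion)

def simulate(n_agents=50, days=30, seed=0):
    """Run the daily random-pairing loop; return mean opinion per day."""
    rng = random.Random(seed)
    agents = [Agent(rng.random(), rng.random()) for _ in range(n_agents)]
    history = []
    for _ in range(days):
        a, b = rng.sample(agents, 2)  # one random pairwise exchange per day
        a.exchange(b)
        b.exchange(a)
        history.append(sum(ag.opinion for ag in agents) / n_agents)
    return history
```

An intervention strategy would slot into the daily loop, e.g. by clamping or nudging selected agents' opinions on chosen days, which is the kind of cost-versus-effectiveness trade-off the abstract evaluates.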