The Life Cycle of Knowledge in Big Language Models: A Survey
Knowledge plays a critical role in artificial intelligence. Recently, the
extensive success of pre-trained language models (PLMs) has drawn significant
attention to how knowledge can be acquired, maintained, updated and used by
language models. Despite the enormous number of related studies, we still lack
a unified view of how knowledge circulates within language models throughout
the learning, tuning, and application processes, which may prevent us from
understanding the connections between current lines of progress or from
recognizing existing limitations. In this survey, we revisit PLMs as
knowledge-based systems by dividing the life cycle of knowledge in PLMs into
five critical periods, and investigating how knowledge circulates when it is
built, maintained and used. To this end, we systematically review existing
studies of each period of the knowledge life cycle, summarize the main
challenges and current limitations, and discuss future directions.
Comment: paperlist: https://github.com/c-box/KnowledgeLifecycl
APPT : Asymmetric Parallel Point Transformer for 3D Point Cloud Understanding
Transformer-based networks have achieved impressive performance in 3D point
cloud understanding. However, most of them concentrate on aggregating local
features, but neglect to directly model global dependencies, which results in a
limited effective receptive field. Moreover, how to effectively combine
local and global components remains challenging. To tackle these problems,
we propose Asymmetric Parallel Point Transformer (APPT). Specifically, we
introduce Global Pivot Attention to extract global features and enlarge the
effective receptive field. Moreover, we design the Asymmetric Parallel
structure to effectively integrate local and global information. Combined with
these designs, APPT is able to capture features globally throughout the entire
network while focusing on local-detailed features. Extensive experiments show
that our method outperforms prior methods and achieves state-of-the-art
results on several benchmarks for 3D point cloud understanding, such as 3D
semantic segmentation on S3DIS, 3D shape classification on ModelNet40, and 3D
part segmentation on ShapeNet.
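The idea of letting every point attend to a small set of global pivot points can be sketched as follows. This is an illustrative NumPy toy, not the authors' implementation: APPT learns query/key/value projections and integrates pivots inside a full network, whereas here pivots are randomly subsampled and raw coordinates stand in for features; `pivot_attention` and `num_pivots` are names invented for this sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pivot_attention(points, num_pivots=4, seed=0):
    """Each of the N points attends to a small set of P global pivots,
    so the receptive field spans the whole cloud at O(N*P) cost
    instead of O(N^2) for full self-attention."""
    rng = np.random.default_rng(seed)
    n, d = points.shape
    idx = rng.choice(n, size=num_pivots, replace=False)
    pivots = points[idx]                       # (P, d) global summary set
    scores = points @ pivots.T / np.sqrt(d)    # (N, P) query-key similarity
    weights = softmax(scores, axis=-1)         # rows sum to 1
    return weights @ pivots                    # (N, d) globally mixed features
```

Because the output is a convex combination of pivot coordinates, each output point stays within the bounding range of the input cloud, which is a quick sanity check on the attention weights.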
One-shot Implicit Animatable Avatars with Model-based Priors
Existing neural rendering methods for creating human avatars typically either
require dense input signals such as video or multi-view images, or leverage a
learned prior from large-scale specific 3D human datasets such that
reconstruction can be performed with sparse-view inputs. Most of these methods
fail to achieve realistic reconstruction when only a single image is available.
To enable the data-efficient creation of realistic animatable 3D humans, we
propose ELICIT, a novel method for learning human-specific neural radiance
fields from a single image. Inspired by the fact that humans can effortlessly
estimate the body geometry and imagine full-body clothing from a single image,
we leverage two priors in ELICIT: 3D geometry prior and visual semantic prior.
Specifically, ELICIT utilizes the 3D body shape geometry prior from a skinned
vertex-based template model (i.e., SMPL) and implements the visual clothing
semantic prior with the CLIP-based pretrained models. Both priors are used to
jointly guide the optimization for creating plausible content in the invisible
areas. Taking advantage of the CLIP models, ELICIT can use text descriptions to
generate text-conditioned unseen regions. In order to further improve visual
details, we propose a segmentation-based sampling strategy that locally refines
different parts of the avatar. Comprehensive evaluations on multiple popular
benchmarks, including ZJU-MoCAP, Human3.6M, and DeepFashion, show that ELICIT
outperforms strong baseline methods for avatar creation when only a single
image is available. The code is public for research purposes at
https://huangyangyi.github.io/ELICIT/.
Comment: To appear at ICCV 2023. Project website:
https://huangyangyi.github.io/ELICIT
Designing Several Types of Oscillation-Less and High-Resolution Hybrid Schemes on Block-Structured Grids
Retentive or Forgetful? Diving into the Knowledge Memorizing Mechanism of Language Models
Memory is one of the most essential cognitive functions serving as a
repository of world knowledge and episodes of activities. In recent years,
large-scale pre-trained language models have shown remarkable memorizing
ability. In contrast, vanilla neural networks without pre-training have long
been observed to suffer from the catastrophic forgetting problem. To
investigate such a retentive-forgetful contradiction and understand the memory
mechanism of language models, we conduct thorough experiments by controlling
the target knowledge types, the learning strategies and the learning schedules.
We find that: 1) Vanilla language models are forgetful; 2) Pre-training leads
to retentive language models; 3) Knowledge relevance and diversification
significantly influence memory formation. These conclusions are useful for
understanding the abilities of pre-trained language models and shed light on
designing and evaluating new learning and inference algorithms for language
models.
Learning In-context Learning for Named Entity Recognition
Named entity recognition in real-world applications suffers from the
diversity of entity types, the emergence of new entity types, and the lack of
high-quality annotations. To address the above problems, this paper proposes an
in-context learning-based NER approach, which can effectively inject in-context
NER ability into PLMs and recognize entities of novel types on-the-fly using
only a few demonstrative instances. Specifically, we model PLMs as a
meta-function λ_(instruction, demonstrations, text).M, and a new entity
extractor can be implicitly constructed by applying new instruction and
demonstrations to PLMs, i.e., (λ.M)(instruction, demonstrations) → F, where F
will be a new entity extractor, i.e., F: text → entities. To inject the
above in-context NER ability into PLMs, we propose a meta-function pre-training
algorithm, which pre-trains PLMs by comparing the (instruction,
demonstration)-initialized extractor with a surrogate golden extractor.
Experimental results on 4 few-shot NER datasets show that our method can
effectively inject in-context NER ability into PLMs and significantly
outperforms the PLMs+fine-tuning counterparts.
Comment: Accepted to ACL 2023 Main Conference
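The meta-function view, applying (instruction, demonstrations) to a PLM to obtain a new extractor F: text → entities, maps naturally onto a closure. The sketch below is a hypothetical toy: a real implementation would condition a PLM on the prompt, while here the returned extractor merely reuses the demonstrated surface forms; `make_extractor` is a name invented for illustration.

```python
def make_extractor(instruction, demonstrations):
    """Toy stand-in for (lambda.M)(instruction, demonstrations) -> F.

    `demonstrations` is a list of (text, entities) pairs. A real PLM
    would condition on `instruction` and the demonstrations; this toy
    ignores the instruction and simply looks up demonstrated entities.
    """
    known = sorted({ent for _, ents in demonstrations for ent in ents})

    def extract(text):  # F: text -> entities
        return [ent for ent in known if ent in text]

    return extract
```

The key point the closure captures is that the extractor for a novel entity type is built on the fly from the prompt alone, with no gradient updates.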
Towards In-Distribution Compatible Out-of-Distribution Detection
Deep neural networks, despite their remarkable capability of discriminating targeted in-distribution samples, show poor performance in detecting anomalous out-of-distribution data. To address this defect, state-of-the-art solutions choose to train deep networks on an auxiliary dataset of outliers. Various training criteria for these auxiliary outliers have been proposed based on heuristic intuitions. However, we find that these intuitively designed outlier training criteria can hurt in-distribution learning and eventually lead to inferior performance. To this end, we identify three causes of the in-distribution incompatibility: contradictory gradient, false likelihood, and distribution shift. Based on these new understandings, we propose a new out-of-distribution detection method that adapts both the top design of deep models and the loss function. Our method achieves in-distribution compatibility by pursuing less interference with the probabilistic characteristics of in-distribution features. On several benchmarks, our method not only achieves state-of-the-art out-of-distribution detection performance but also improves in-distribution accuracy.
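For context, a common post-hoc score in this literature is the free energy of the logits: confident in-distribution inputs get a lower (more negative) energy than flat, uncertain ones. This is the standard energy-score baseline, not the paper's adapted top design or loss, sketched with NumPy:

```python
import numpy as np

def energy_score(logits):
    """Free-energy OOD score: -log(sum_j exp(logit_j)), computed stably.

    Lower (more negative) scores suggest in-distribution inputs;
    thresholding this score gives a simple OOD detector.
    """
    m = logits.max(axis=-1)
    return -(m + np.log(np.exp(logits - m[..., None]).sum(axis=-1)))
```

For a confident logit vector like [10, 0, 0] the score is close to -10, while a flat vector [0, 0, 0] scores -log(3) ≈ -1.1, so thresholding separates the two regimes.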
Effectiveness of Inactivated COVID-19 Vaccines against Delta-Variant COVID-19: Evidence from an Outbreak in Inner Mongolia Autonomous Region, China
Phase 3 clinical trials and real-world effectiveness studies showed that China’s two main inactivated COVID-19 vaccines are very effective against serious illness. In November 2021, an outbreak occurred in the Inner Mongolia Autonomous Region that provided an opportunity to assess the vaccine effectiveness (VE) of these inactivated vaccines against COVID-19 caused by the delta variant. We evaluated VE with a retrospective cohort study of close contacts of infected individuals, using a generalized linear model with binomial distribution and log-link function to estimate risk ratios (RR) and VE. A total of 8842 close contacts were studied. Compared with no vaccination and adjusted for age, presence of comorbidity, and time since last vaccination, full vaccination reduced symptomatic infection by 62%, pneumonia by 64% and severe COVID-19 by 90%; reductions associated with homologous booster doses were 83% for symptomatic infection, 92% for pneumonia and 100% for severe COVID-19. There was no significant decline in two-dose VE for any outcome for up to 325 days following the last dose. There were no differences by vaccine brand. Inactivated vaccines were effective against delta-variant illness, and were highly effective against pneumonia and severe COVID-19; VE was increased by booster doses.
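The VE arithmetic follows directly from the risk ratio: VE = 1 − RR, where RR compares the attack rate among vaccinated contacts to that among unvaccinated contacts. The sketch below uses hypothetical counts. The study itself estimated adjusted RRs with a log-binomial generalized linear model, not this crude calculation, and `vaccine_effectiveness` is a name invented here:

```python
def vaccine_effectiveness(cases_vax, n_vax, cases_unvax, n_unvax):
    """Crude (unadjusted) VE = 1 - RR.

    RR is the ratio of attack rates: (cases_vax / n_vax) divided by
    (cases_unvax / n_unvax). A VE of 0.62 means a 62% reduction in risk.
    """
    rr = (cases_vax / n_vax) / (cases_unvax / n_unvax)
    return 1.0 - rr

# Hypothetical cohort: 19/1000 vaccinated vs 50/1000 unvaccinated cases
# gives RR = 0.38 and VE ~ 0.62, i.e. a 62% reduction.
ve = vaccine_effectiveness(19, 1000, 50, 1000)
```

Adjusted estimates differ from this crude ratio because the regression controls for age, comorbidity, and time since last vaccination, as the abstract notes.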