The Life Cycle of Knowledge in Big Language Models: A Survey
Knowledge plays a critical role in artificial intelligence. Recently, the
extensive success of pre-trained language models (PLMs) has drawn significant
attention to how knowledge can be acquired, maintained, updated and used by
language models. Despite the enormous number of related studies, we still lack
a unified view of how knowledge circulates within language models throughout
the learning, tuning, and application processes, which may prevent us from
understanding the connections between current lines of progress or from
recognizing existing limitations. In this survey, we revisit PLMs as
knowledge-based systems by dividing the life cycle of knowledge in PLMs into
five critical periods, and investigating how knowledge circulates when it is
built, maintained and used. To this end, we systematically review existing
studies of each period of the knowledge life cycle, summarize the main
challenges and current limitations, and discuss future directions.
Comment: paperlist: https://github.com/c-box/KnowledgeLifecycl
APPT : Asymmetric Parallel Point Transformer for 3D Point Cloud Understanding
Transformer-based networks have achieved impressive performance in 3D point
cloud understanding. However, most of them concentrate on aggregating local
features, but neglect to directly model global dependencies, which results in a
limited effective receptive field. Moreover, how to effectively combine
local and global components remains challenging. To tackle these problems,
we propose Asymmetric Parallel Point Transformer (APPT). Specifically, we
introduce Global Pivot Attention to extract global features and enlarge the
effective receptive field. Moreover, we design the Asymmetric Parallel
structure to effectively integrate local and global information. Combined with
these designs, APPT is able to capture features globally throughout the entire
network while focusing on local-detailed features. Extensive experiments show
that our method outperforms prior methods and achieves state-of-the-art
results on several benchmarks for 3D point cloud understanding, such as 3D
semantic segmentation on S3DIS, 3D shape classification on ModelNet40, and 3D
part segmentation on ShapeNet.
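The idea of letting every point attend to a small set of global pivot points can be sketched as follows. This is an illustrative NumPy toy, not the authors' implementation: APPT learns query/key/value projections and integrates pivots inside a full network, whereas here pivots are randomly subsampled and raw coordinates stand in for features; `pivot_attention` and `num_pivots` are names invented for this sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pivot_attention(points, num_pivots=4, seed=0):
    """Each of the N points attends to a small set of P global pivots,
    so the receptive field spans the whole cloud at O(N*P) cost
    instead of O(N^2) for full self-attention."""
    rng = np.random.default_rng(seed)
    n, d = points.shape
    idx = rng.choice(n, size=num_pivots, replace=False)
    pivots = points[idx]                       # (P, d) global summary set
    scores = points @ pivots.T / np.sqrt(d)    # (N, P) query-key similarity
    weights = softmax(scores, axis=-1)         # rows sum to 1
    return weights @ pivots                    # (N, d) globally mixed features
```

Because the output is a convex combination of pivot coordinates, each output point stays within the bounding range of the input cloud, which is a quick sanity check on the attention weights.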
One-shot Implicit Animatable Avatars with Model-based Priors
Existing neural rendering methods for creating human avatars typically either
require dense input signals such as video or multi-view images, or leverage a
learned prior from large-scale specific 3D human datasets such that
reconstruction can be performed with sparse-view inputs. Most of these methods
fail to achieve realistic reconstruction when only a single image is available.
To enable the data-efficient creation of realistic animatable 3D humans, we
propose ELICIT, a novel method for learning human-specific neural radiance
fields from a single image. Inspired by the fact that humans can effortlessly
estimate the body geometry and imagine full-body clothing from a single image,
we leverage two priors in ELICIT: 3D geometry prior and visual semantic prior.
Specifically, ELICIT utilizes the 3D body shape geometry prior from a skinned
vertex-based template model (i.e., SMPL) and implements the visual clothing
semantic prior with the CLIP-based pretrained models. Both priors are used to
jointly guide the optimization for creating plausible content in the invisible
areas. Taking advantage of the CLIP models, ELICIT can use text descriptions to
generate text-conditioned unseen regions. In order to further improve visual
details, we propose a segmentation-based sampling strategy that locally refines
different parts of the avatar. Comprehensive evaluations on multiple popular
benchmarks, including ZJU-MoCAP, Human3.6M, and DeepFashion, show that ELICIT
outperforms strong baseline methods for avatar creation when only a single
image is available. The code is public for research purposes at
https://huangyangyi.github.io/ELICIT/.
Comment: To appear at ICCV 2023. Project website:
https://huangyangyi.github.io/ELICIT
Designing Several Types of Oscillation-Less and High-Resolution Hybrid Schemes on Block-Structured Grids
Retentive or Forgetful? Diving into the Knowledge Memorizing Mechanism of Language Models
Memory is one of the most essential cognitive functions serving as a
repository of world knowledge and episodes of activities. In recent years,
large-scale pre-trained language models have shown remarkable memorizing
ability. In contrast, vanilla neural networks without pre-training have long
been observed to suffer from the catastrophic forgetting problem. To
investigate such a retentive-forgetful contradiction and understand the memory
mechanism of language models, we conduct thorough experiments by controlling
the target knowledge types, the learning strategies and the learning schedules.
We find that: 1) Vanilla language models are forgetful; 2) Pre-training leads
to retentive language models; 3) Knowledge relevance and diversification
significantly influence memory formation. These conclusions are useful for
understanding the abilities of pre-trained language models and shed light on
designing and evaluating new learning and inference algorithms for language
models.
Learning In-context Learning for Named Entity Recognition
Named entity recognition in real-world applications suffers from the
diversity of entity types, the emergence of new entity types, and the lack of
high-quality annotations. To address the above problems, this paper proposes an
in-context learning-based NER approach, which can effectively inject in-context
NER ability into PLMs and recognize entities of novel types on-the-fly using
only a few demonstrative instances. Specifically, we model PLMs as a
meta-function λ_(instruction, demonstrations, text).M, and a new entity
extractor can be implicitly constructed by applying new instruction and
demonstrations to PLMs, i.e., (λ.M)(instruction, demonstrations) → F, where F
will be a new entity extractor, i.e., F: text → entities. To inject the
above in-context NER ability into PLMs, we propose a meta-function pre-training
algorithm, which pre-trains PLMs by comparing the (instruction,
demonstration)-initialized extractor with a surrogate golden extractor.
Experimental results on 4 few-shot NER datasets show that our method can
effectively inject in-context NER ability into PLMs and significantly
outperforms the PLMs+fine-tuning counterparts.
Comment: Accepted to ACL 2023 Main Conference
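The meta-function view, applying (instruction, demonstrations) to a PLM to obtain a new extractor F: text → entities, maps naturally onto a closure. The sketch below is a hypothetical toy: a real implementation would condition a PLM on the prompt, while here the returned extractor merely reuses the demonstrated surface forms; `make_extractor` is a name invented for illustration.

```python
def make_extractor(instruction, demonstrations):
    """Toy stand-in for (lambda.M)(instruction, demonstrations) -> F.

    `demonstrations` is a list of (text, entities) pairs. A real PLM
    would condition on `instruction` and the demonstrations; this toy
    ignores the instruction and simply looks up demonstrated entities.
    """
    known = sorted({ent for _, ents in demonstrations for ent in ents})

    def extract(text):  # F: text -> entities
        return [ent for ent in known if ent in text]

    return extract
```

The key point the closure captures is that the extractor for a novel entity type is built on the fly from the prompt alone, with no gradient updates.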
Towards In-Distribution Compatible Out-of-Distribution Detection
Deep neural networks, despite their remarkable capability of discriminating targeted in-distribution samples, show poor performance in detecting anomalous out-of-distribution data. To address this defect, state-of-the-art solutions choose to train deep networks on an auxiliary dataset of outliers. Various training criteria for these auxiliary outliers have been proposed based on heuristic intuitions. However, we find that these intuitively designed outlier training criteria can hurt in-distribution learning and eventually lead to inferior performance. To this end, we identify three causes of the in-distribution incompatibility: contradictory gradient, false likelihood, and distribution shift. Based on these new understandings, we propose a new out-of-distribution detection method that adapts both the top design of deep models and the loss function. Our method achieves in-distribution compatibility by pursuing less interference with the probabilistic characteristics of in-distribution features. On several benchmarks, our method not only achieves state-of-the-art out-of-distribution detection performance but also improves in-distribution accuracy.
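For context, a common post-hoc score in this literature is the free energy of the logits: confident in-distribution inputs get a lower (more negative) energy than flat, uncertain ones. This is the standard energy-score baseline, not the paper's adapted top design or loss, sketched with NumPy:

```python
import numpy as np

def energy_score(logits):
    """Free-energy OOD score: -log(sum_j exp(logit_j)), computed stably.

    Lower (more negative) scores suggest in-distribution inputs;
    thresholding this score gives a simple OOD detector.
    """
    m = logits.max(axis=-1)
    return -(m + np.log(np.exp(logits - m[..., None]).sum(axis=-1)))
```

For a confident logit vector like [10, 0, 0] the score is close to -10, while a flat vector [0, 0, 0] scores -log(3) ≈ -1.1, so thresholding separates the two regimes.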
Effectiveness of Inactivated COVID-19 Vaccines against Delta-Variant COVID-19: Evidence from an Outbreak in Inner Mongolia Autonomous Region, China
Phase 3 clinical trials and real-world effectiveness studies showed that China’s two main inactivated COVID-19 vaccines are very effective against serious illness. In November 2021, an outbreak occurred in the Inner Mongolia Autonomous Region that provided an opportunity to assess the vaccine effectiveness (VE) of these inactivated vaccines against COVID-19 caused by the delta variant. We evaluated VE with a retrospective cohort study of close contacts of infected individuals, using a generalized linear model with binomial distribution and log-link function to estimate risk ratios (RR) and VE. A total of 8842 close contacts were studied. Compared with no vaccination and adjusted for age, presence of comorbidity, and time since last vaccination, full vaccination reduced symptomatic infection by 62%, pneumonia by 64% and severe COVID-19 by 90%; reductions associated with homologous booster doses were 83% for symptomatic infection, 92% for pneumonia and 100% for severe COVID-19. There was no significant decline in two-dose VE for any outcome for up to 325 days following the last dose. There were no differences by vaccine brand. Inactivated vaccines were effective against delta-variant illness, and were highly effective against pneumonia and severe COVID-19; VE was increased by booster doses.
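The VE arithmetic follows directly from the risk ratio: VE = 1 − RR, where RR compares the attack rate among vaccinated contacts to that among unvaccinated contacts. The sketch below uses hypothetical counts. The study itself estimated adjusted RRs with a log-binomial generalized linear model, not this crude calculation, and `vaccine_effectiveness` is a name invented here:

```python
def vaccine_effectiveness(cases_vax, n_vax, cases_unvax, n_unvax):
    """Crude (unadjusted) VE = 1 - RR.

    RR is the ratio of attack rates: (cases_vax / n_vax) divided by
    (cases_unvax / n_unvax). A VE of 0.62 means a 62% reduction in risk.
    """
    rr = (cases_vax / n_vax) / (cases_unvax / n_unvax)
    return 1.0 - rr

# Hypothetical cohort: 19/1000 vaccinated vs 50/1000 unvaccinated cases
# gives RR = 0.38 and VE ~ 0.62, i.e. a 62% reduction.
ve = vaccine_effectiveness(19, 1000, 50, 1000)
```

Adjusted estimates differ from this crude ratio because the regression controls for age, comorbidity, and time since last vaccination, as the abstract notes.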