
    The Life Cycle of Knowledge in Big Language Models: A Survey

    Knowledge plays a critical role in artificial intelligence. Recently, the extensive success of pre-trained language models (PLMs) has drawn significant attention to how knowledge can be acquired, maintained, updated, and used by language models. Despite the enormous amount of related work, a unified view of how knowledge circulates within language models throughout the learning, tuning, and application processes is still lacking, which may prevent us from understanding the connections between current advances or recognizing existing limitations. In this survey, we revisit PLMs as knowledge-based systems by dividing the life cycle of knowledge in PLMs into five critical periods and investigating how knowledge circulates as it is built, maintained, and used. To this end, we systematically review existing studies in each period of the knowledge life cycle, summarize the main challenges and current limitations, and discuss future directions.
    Comment: paper list: https://github.com/c-box/KnowledgeLifecycl

    APPT : Asymmetric Parallel Point Transformer for 3D Point Cloud Understanding

    Transformer-based networks have achieved impressive performance in 3D point cloud understanding. However, most of them concentrate on aggregating local features and neglect to directly model global dependencies, which results in a limited effective receptive field. Moreover, how to effectively combine local and global components remains challenging. To tackle these problems, we propose the Asymmetric Parallel Point Transformer (APPT). Specifically, we introduce Global Pivot Attention to extract global features and enlarge the effective receptive field, and we design an Asymmetric Parallel structure to effectively integrate local and global information. With these designs, APPT captures features globally throughout the entire network while focusing on local detail. Extensive experiments show that our method outperforms prior approaches and achieves state-of-the-art results on several benchmarks for 3D point cloud understanding, including 3D semantic segmentation on S3DIS, 3D shape classification on ModelNet40, and 3D part segmentation on ShapeNet.

    One-shot Implicit Animatable Avatars with Model-based Priors

    Existing neural rendering methods for creating human avatars typically either require dense input signals such as video or multi-view images, or leverage a prior learned from large-scale, domain-specific 3D human datasets so that reconstruction can be performed with sparse-view inputs. Most of these methods fail to achieve realistic reconstruction when only a single image is available. To enable the data-efficient creation of realistic animatable 3D humans, we propose ELICIT, a novel method for learning human-specific neural radiance fields from a single image. Inspired by the fact that humans can effortlessly estimate body geometry and imagine full-body clothing from a single image, we leverage two priors in ELICIT: a 3D geometry prior and a visual semantic prior. Specifically, ELICIT takes the 3D body shape geometry prior from a skinned vertex-based template model (i.e., SMPL) and implements the visual clothing semantic prior with CLIP-based pretrained models. Both priors jointly guide the optimization toward plausible content in the invisible areas. Taking advantage of the CLIP models, ELICIT can use text descriptions to generate text-conditioned unseen regions. To further improve visual details, we propose a segmentation-based sampling strategy that locally refines different parts of the avatar. Comprehensive evaluations on multiple popular benchmarks, including ZJU-MoCAP, Human3.6M, and DeepFashion, show that ELICIT outperforms strong baseline methods for avatar creation when only a single image is available. The code is public for research purposes at https://huangyangyi.github.io/ELICIT/.
    Comment: To appear at ICCV 2023. Project website: https://huangyangyi.github.io/ELICIT

    Retentive or Forgetful? Diving into the Knowledge Memorizing Mechanism of Language Models

    Memory is one of the most essential cognitive functions, serving as a repository of world knowledge and episodes of activity. In recent years, large-scale pre-trained language models have shown remarkable memorization ability. In contrast, vanilla neural networks without pre-training have long been observed to suffer from catastrophic forgetting. To investigate this retentive-forgetful contradiction and understand the memory mechanism of language models, we conduct thorough experiments controlling the target knowledge types, the learning strategies, and the learning schedules. We find that: 1) vanilla language models are forgetful; 2) pre-training leads to retentive language models; 3) knowledge relevance and diversification significantly influence memory formation. These conclusions are useful for understanding the abilities of pre-trained language models and shed light on designing and evaluating new learning and inference algorithms for language models.

    HLLC+: Low-Mach Shock-Stable HLLC-Type Riemann Solver for All-Speed Flows


    Learning In-context Learning for Named Entity Recognition

    Named entity recognition in real-world applications suffers from the diversity of entity types, the emergence of new entity types, and the lack of high-quality annotations. To address the above problems, this paper proposes an in-context learning-based NER approach, which can effectively inject in-context NER ability into PLMs and recognize entities of novel types on-the-fly using only a few demonstrative instances. Specifically, we model PLMs as a meta-function $\lambda_{\text{instruction, demonstrations, text}}.\,\mathcal{M}$, and a new entity extractor can be implicitly constructed by applying new instructions and demonstrations to PLMs, i.e., $(\lambda.\mathcal{M})(\text{instruction, demonstrations}) \to \mathcal{F}$, where $\mathcal{F}$ is a new entity extractor, i.e., $\mathcal{F}: \text{text} \to \text{entities}$. To inject this in-context NER ability into PLMs, we propose a meta-function pre-training algorithm, which pre-trains PLMs by comparing the (instruction, demonstration)-initialized extractor with a surrogate golden extractor. Experimental results on 4 few-shot NER datasets show that our method effectively injects in-context NER ability into PLMs and significantly outperforms the PLMs+fine-tuning counterparts.
    Comment: Accepted to ACL 2023 Main Conference
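    The meta-function view can be read as a higher-order function: applying a PLM to an instruction and demonstrations returns an extractor that maps text to entities. A minimal illustrative sketch follows, in which the "model" is a hypothetical stand-in that merely memorizes entity surface forms from the demonstrations (a real pre-trained meta-function would generalize from them in context):

```python
from typing import Callable, Dict, List, Tuple

# Type alias: an extractor F maps text -> list of (entity, type) pairs.
Extractor = Callable[[str], List[Tuple[str, str]]]

def meta_function(instruction: str,
                  demonstrations: List[Tuple[str, str, str]]) -> Extractor:
    """Stand-in for (lambda.M)(instruction, demonstrations) -> F.

    Each demonstration is (text, entity, entity_type). This toy 'model'
    simply memorizes entity surface forms; a real PLM would infer the
    extraction behavior in context.
    """
    lexicon: Dict[str, str] = {ent: etype for _, ent, etype in demonstrations}

    def extractor(text: str) -> List[Tuple[str, str]]:
        # F: text -> entities, using knowledge injected via demonstrations.
        return [(e, t) for e, t in lexicon.items() if e in text]

    return extractor

# Usage: construct a new entity extractor on-the-fly from a few demonstrations.
demos = [("Paris is lovely.", "Paris", "LOC"),
         ("Curie won twice.", "Curie", "PER")]
extract = meta_function("Find named entities.", demos)
print(extract("Curie lived in Paris."))
```

    The point of the sketch is the shape of the computation, not the extraction logic: the extractor is constructed implicitly from (instruction, demonstrations) rather than by fine-tuning per entity type.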

    Towards In-Distribution Compatible Out-of-Distribution Detection

    Deep neural networks, despite their remarkable capability to discriminate targeted in-distribution samples, perform poorly at detecting anomalous out-of-distribution data. To address this defect, state-of-the-art solutions train deep networks on an auxiliary dataset of outliers. Various training criteria for these auxiliary outliers have been proposed based on heuristic intuitions. However, we find that these intuitively designed outlier training criteria can hurt in-distribution learning and eventually lead to inferior performance. We identify three causes of this in-distribution incompatibility: contradictory gradients, false likelihood, and distribution shift. Based on these new understandings, we propose a new out-of-distribution detection method that adapts both the top design of deep models and the loss function. Our method achieves in-distribution compatibility by interfering less with the probabilistic characteristics of in-distribution features. On several benchmarks, our method not only achieves state-of-the-art out-of-distribution detection performance but also improves in-distribution accuracy.

    Effectiveness of Inactivated COVID-19 Vaccines against Delta-Variant COVID-19: Evidence from an Outbreak in Inner Mongolia Autonomous Region, China

    Phase 3 clinical trials and real-world effectiveness studies showed that China's two main inactivated COVID-19 vaccines are very effective against serious illness. In November 2021, an outbreak occurred in the Inner Mongolia Autonomous Region that provided an opportunity to assess the vaccine effectiveness (VE) of these inactivated vaccines against COVID-19 caused by the delta variant. We evaluated VE with a retrospective cohort study of close contacts of infected individuals, using a generalized linear model with binomial distribution and log-link function to estimate risk ratios (RR) and VE. A total of 8842 close contacts were studied. Compared with no vaccination and adjusted for age, presence of comorbidity, and time since last vaccination, full vaccination reduced symptomatic infection by 62%, pneumonia by 64%, and severe COVID-19 by 90%; reductions associated with homologous booster doses were 83% for symptomatic infection, 92% for pneumonia, and 100% for severe COVID-19. There was no significant decline in two-dose VE for any outcome for up to 325 days following the last dose, and there were no differences by vaccine brand. Inactivated vaccines were effective against delta-variant illness and highly effective against pneumonia and severe COVID-19; VE was increased by booster doses.
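    The crude (unadjusted) form of such an estimate follows directly from attack rates among close contacts: VE = 1 - RR, where RR is the risk in vaccinated contacts relative to unvaccinated contacts. A minimal sketch with made-up counts (the study itself fitted a generalized linear model with binomial distribution and log link to adjust for age, comorbidity, and time since vaccination):

```python
def vaccine_effectiveness(cases_vax: int, n_vax: int,
                          cases_unvax: int, n_unvax: int) -> float:
    """Crude VE = 1 - RR, with RR the risk ratio (vaccinated / unvaccinated)."""
    risk_vax = cases_vax / n_vax
    risk_unvax = cases_unvax / n_unvax
    rr = risk_vax / risk_unvax
    return 1.0 - rr

# Hypothetical counts, not the study's data: 19/1000 symptomatic infections
# among fully vaccinated contacts vs. 50/1000 among unvaccinated contacts.
ve = vaccine_effectiveness(19, 1000, 50, 1000)
print(f"VE = {ve:.0%}")  # 1 - (0.019 / 0.050) = 62%
```

    The regression-based estimate differs from this crude ratio only in that the log-binomial model conditions the risks on covariates; exponentiating the coefficient for vaccination status yields the adjusted RR.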