Inside ASCENT: Exploring a Deep Commonsense Knowledge Base and its Usage in Question Answering
ASCENT is a fully automated methodology for extracting and consolidating commonsense assertions from web content (Nguyen et al., WWW 2021). It advances traditional triple-based commonsense knowledge representation by capturing semantic facets, such as locations and purposes, and composite concepts, i.e., subgroups and related aspects of subjects. In this demo, we present a web portal that allows users to understand its construction process, explore its content, and observe its impact in the use case of question answering. The demo website and an introductory video are both available online.
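To make the shape of such a faceted assertion concrete, here is a minimal sketch of how one might be represented in code. The class and field names are illustrative assumptions, not ASCENT's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a faceted commonsense assertion in the spirit of
# ASCENT's data model; names are illustrative, not the actual schema.
@dataclass
class Assertion:
    subject: str                 # possibly a composite concept (a subgroup)
    predicate: str
    obj: str
    facets: dict[str, str] = field(default_factory=dict)  # e.g. location, purpose

# A plain triple would say only (polar bear, hunt, seals); facets add
# the qualifying context that triple-based representations drop.
a = Assertion(
    subject="polar bear",        # subgroup of the broader subject "bear"
    predicate="hunt",
    obj="seals",
    facets={"location": "on sea ice", "purpose": "for food"},
)
print(a)
```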
The Life Cycle of Knowledge in Big Language Models: A Survey
Knowledge plays a critical role in artificial intelligence. Recently, the extensive success of pre-trained language models (PLMs) has drawn significant attention to how knowledge can be acquired, maintained, updated, and used by language models. Despite the enormous amount of related studies, a unified view of how knowledge circulates within language models throughout the learning, tuning, and application processes is still lacking, which may prevent us from fully understanding the connections between current lines of progress or from recognizing existing limitations. In this survey, we revisit PLMs as knowledge-based systems by dividing the life cycle of knowledge in PLMs into five critical periods and investigating how knowledge circulates as it is built, maintained, and used. To this end, we systematically review existing studies of each period of the knowledge life cycle, summarize the main challenges and current limitations, and discuss future directions.
Comment: paper list: https://github.com/c-box/KnowledgeLifecycl
Probing Neural Language Models for Human Tacit Assumptions
Humans carry stereotypic tacit assumptions (STAs) (Prince, 1978), or propositional beliefs about generic concepts. Such associations are crucial for understanding natural language. We construct a diagnostic set of word prediction prompts to evaluate whether recent neural contextualized language models trained on large text corpora capture STAs. Our prompts are based on human responses in a psychological study of conceptual associations. We find models to be profoundly effective at retrieving concepts given associated properties. Our results provide empirical evidence that stereotypic conceptual representations are captured in neural models derived from semi-supervised linguistic exposure.
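A probe in this spirit can be run with any masked language model. The sketch below uses the Hugging Face fill-mask pipeline with an illustrative property-to-concept prompt; the prompt is a stand-in written for this example, not one taken from the paper's diagnostic set.

```python
from transformers import pipeline

# Fill-mask probe over a pretrained masked LM; the prompt is an
# illustrative stand-in for the paper's diagnostic prompts.
fill = pipeline("fill-mask", model="bert-base-uncased")

# Does the model retrieve a concept given its associated properties?
prompt = "A [MASK] has fur, is big, and has claws."
for pred in fill(prompt, top_k=5):
    print(f"{pred['token_str']:>10s}  {pred['score']:.3f}")
```

If the model encodes the stereotypic association, concept words such as "bear" should rank among the top predictions.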
LMentry: A Language Model Benchmark of Elementary Language Tasks
As the performance of large language models rapidly improves, benchmarks are getting larger and more complex as well. We present LMentry, a benchmark that avoids this "arms race" by focusing on a compact set of tasks that are trivial to humans, e.g., writing a sentence containing a specific word, identifying which words in a list belong to a specific category, or choosing which of two words is longer. LMentry is specifically designed to provide quick and interpretable insights into the capabilities and robustness of large language models. Our experiments reveal a wide variety of failure cases that, while immediately obvious to humans, pose a considerable challenge for large language models, including OpenAI's latest 175B-parameter instruction-tuned model, TextDavinci002. LMentry complements contemporary evaluation approaches of large language models, providing a quick, automatic, and easy-to-run "unit test", without resorting to large benchmark suites of complex tasks.
Comment: 24 pages, 2 figures
Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
Generative models for open domain question answering have proven to be competitive, without resorting to external knowledge. While promising, this approach requires using models with billions of parameters, which are expensive to train and query. In this paper, we investigate how much these models can benefit from retrieving text passages, potentially containing evidence. We obtain state-of-the-art results on the Natural Questions and TriviaQA open benchmarks. Interestingly, we observe that the performance of this method significantly improves when increasing the number of retrieved passages. This is evidence that generative models are good at aggregating and combining evidence from multiple passages.
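One way to fuse many retrieved passages is fusion-in-decoder style: encode each (question, passage) pair independently, concatenate the encoder states, and let the decoder attend over all of them at once. The sketch below is a simplified approximation of that idea using a small T5 checkpoint and toy passages, not the authors' released implementation.

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

tok = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

question = "Where is the Eiffel Tower?"
passages = [  # toy stand-ins for retrieved passages
    "The Eiffel Tower is a wrought-iron tower in Paris, France.",
    "Gustave Eiffel's company built the tower for the 1889 World's Fair.",
]

# Encode each (question, passage) pair independently.
inputs = tok(
    [f"question: {question} context: {p}" for p in passages],
    return_tensors="pt", padding=True, truncation=True,
)
with torch.no_grad():
    enc = model.encoder(
        input_ids=inputs.input_ids, attention_mask=inputs.attention_mask
    )

# Fuse: concatenate all passages' encoder states into one long sequence,
# so the decoder can aggregate evidence across every passage jointly.
hidden = enc.last_hidden_state            # (n_passages, seq_len, d_model)
fused = BaseModelOutput(
    last_hidden_state=hidden.reshape(1, -1, hidden.size(-1))
)
mask = inputs.attention_mask.reshape(1, -1)

out = model.generate(encoder_outputs=fused, attention_mask=mask,
                     max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))
```

Because the fused sequence grows linearly with the number of passages while each passage is encoded independently, adding more retrieved passages stays tractable, which is consistent with the scaling behavior the abstract reports.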