Deanthropomorphising NLP: Can a Language Model Be Conscious?
This work is intended as a voice in the discussion over the recent claims
that LaMDA, a pretrained language model based on the Transformer model
architecture, is sentient. This claim, if confirmed, would have serious
ramifications in the Natural Language Processing (NLP) community due to
widespread use of similar models. However, here we take the position that such
a language model cannot be sentient, or conscious, and that LaMDA in particular
exhibits no advances over other similar models that would qualify it. We
justify this by analysing the Transformer architecture through Integrated
Information Theory. We see the claims of consciousness as part of a wider
tendency to use anthropomorphic language in NLP reporting. Regardless of the
veracity of the claims, we consider this an opportune moment to take stock of
progress in language modelling and consider the ethical implications of the
task. In order to make this work helpful for readers outside the NLP community,
we also present the necessary background in language modelling.
Towards a corpus for credibility assessment in software practitioner blog articles
Blogs are a source of grey literature which are widely adopted by software
practitioners for disseminating opinion and experience. Analysing such articles
can provide useful insights into the state-of-practice for software engineering
research. However, there are challenges in identifying higher quality content
from the large quantity of articles available. Credibility assessment can help
in identifying quality content, though there is a lack of existing corpora.
Credibility is typically measured through a series of conceptual criteria, with
'argumentation' and 'evidence' being two important criteria.
We create a corpus labelled for argumentation and evidence that can aid the
credibility community. The corpus consists of articles from the blog of a
single software practitioner and is publicly available.
Three annotators label the corpus with a series of conceptual credibility
criteria, reaching an agreement of 0.82 (Fleiss' Kappa). We present preliminary
analysis of the corpus by using it to investigate the identification of claim
sentences (one of our ten labels).
We train four systems (BERT, KNN, Decision Tree and SVM) using three feature
sets (Bag of Words, Topic Modelling and InferSent), achieving an F1 score of
0.64 using InferSent and a Linear SVM.
Our preliminary results are promising, indicating that the corpus can help
future studies in detecting the credibility of grey literature. Future research
will investigate the degree to which the sentence level annotations can infer
the credibility of the overall document.
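As an illustrative sketch of the claim-sentence identification task described above, the following trains one of the feature/classifier pairs mentioned (Bag of Words with a Linear SVM) on a few invented example sentences; the sentences and labels are hypothetical, not drawn from the corpus.

```python
# Hypothetical sketch: claim-sentence classification with Bag of Words
# features and a Linear SVM, one of the feature/classifier pairs the
# abstract mentions. The training sentences below are invented examples.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

sentences = [
    "Unit tests always pay for themselves.",           # claim
    "We migrated the service to Kubernetes in 2019.",  # non-claim
    "Static typing prevents most production bugs.",    # claim
    "The post was published on Monday.",               # non-claim
]
labels = [1, 0, 1, 0]  # 1 = claim sentence, 0 = other

model = make_pipeline(CountVectorizer(), LinearSVC())
model.fit(sentences, labels)
print(model.predict(["Code reviews improve quality."]))
```

In the reported experiments, InferSent sentence embeddings replaced the bag-of-words step to reach the 0.64 F1 score; the pipeline shape stays the same, only the featuriser changes.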
MultiLS: A Multi-task Lexical Simplification Framework
Lexical Simplification (LS) automatically replaces difficult-to-read words
with easier alternatives while preserving a sentence's original meaning. LS is a
precursor to Text Simplification with the aim of improving text accessibility
to various target demographics, including children, second-language learners,
and individuals with reading disabilities or low literacy. Several datasets
exist for LS. These datasets specialize in one or two sub-tasks within the LS
pipeline. However, to date, no single LS dataset has been developed
that covers all LS sub-tasks. We present MultiLS, the first LS framework that
allows for the creation of a multi-task LS dataset. We also present MultiLS-PT,
the first dataset to be created using the MultiLS framework. We demonstrate the
potential of MultiLS-PT by carrying out all LS sub-tasks: (1) lexical
complexity prediction (LCP), (2) substitute generation, and (3) substitute
ranking for Portuguese. Model performances are reported, ranging from
transformer-based models to more recent large language models (LLMs).
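The three sub-tasks above can be sketched as operations over a single dataset entry. The field names, values, and ranking scores below are illustrative assumptions, not the actual MultiLS-PT schema.

```python
# Minimal sketch of the three LS sub-tasks over one multi-task dataset
# entry. All field names and values here are hypothetical examples.
entry = {
    "sentence": "The verdict was unequivocal.",
    "target": "unequivocal",
    "complexity": 0.78,                             # sub-task 1: LCP label (0 = easy, 1 = hard)
    "substitutes": ["clear", "definite", "plain"],  # sub-task 2: generation gold
}

def rank_substitutes(substitutes, scores):
    """Sub-task 3: order candidate substitutes by predicted simplicity (lowest first)."""
    return [s for s, _ in sorted(zip(substitutes, scores), key=lambda p: p[1])]

# A model would supply these scores; here they are made up.
ranked = rank_substitutes(entry["substitutes"], [0.2, 0.5, 0.3])
print(ranked)  # simplest candidate first
```

A multi-task dataset stores all three annotation layers for the same target word, which is what lets one corpus serve the whole LS pipeline.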
Extending a corpus for assessing the credibility of software practitioner blog articles using meta-knowledge
Practitioner-written grey literature, such as blog articles, has value in software engineering research. Such articles provide insight into practice that is often not visible to research. However, a high quantity and varying quality are two major challenges in utilising such material. Quality is defined as an aggregate of a document's relevance to the consumer and its credibility. Credibility is often assessed through a series of conceptual criteria that are specific to a particular user group. For researchers, previous work has found 'argumentation' and 'evidence' to be two important criteria. In this paper, we extend a previously developed corpus by annotating at a broader granularity. We then investigate whether the original annotations (sentence level) can infer these new annotations (article level). Our preliminary results show that sentence-level annotations infer the overall credibility of an article with an F1 score of 91%. These results indicate that the corpus can help future studies in detecting the credibility of practitioner-written grey literature.
Using NLP to quantify the environmental cost and diversity benefits of in-person NLP conferences
The environmental costs of research are of growing importance to the NLP community, and the associated challenges are increasingly debated. In this work, we analyse the carbon cost (measured as CO2-equivalent) associated with journeys made by researchers attending in-person NLP conferences. We obtain the necessary data by text-mining all publications from the ACL Anthology available at the time of the study (n=60,572) and extracting information about an author's affiliation, including their address. This allows us to estimate the corresponding carbon cost and compare it to previously known values for training large models. Further, we look at the benefits of in-person conferences by demonstrating that they can increase participation diversity by encouraging attendance from the region surrounding the host country. We show how the trade-off between carbon cost and diversity of an event depends on its location and type. Our aim is to foster further discussion on the best way to address the joint issue of emissions and diversity in the future.
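The core distance step in such an analysis can be sketched as a great-circle calculation between an author's affiliation and the venue, converted to CO2-equivalent with a per-km flight emission factor. The coordinates and the 0.15 kg/km factor below are illustrative assumptions, not the paper's actual values.

```python
# Hedged sketch of a per-journey carbon estimate: haversine great-circle
# distance times an assumed flight emission factor. The emission factor
# and the example coordinates are illustrative, not the paper's figures.
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371
KG_CO2E_PER_KM = 0.15  # assumed economy-flight factor (illustrative)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

dist = haversine_km(52.52, 13.40, 35.68, 139.69)  # Berlin -> Tokyo
print(f"{dist:.0f} km, ~{dist * KG_CO2E_PER_KM:.0f} kg CO2e one way")
```

Summing such estimates over every extracted author-venue pair yields the conference-level carbon cost that the study compares against model-training emissions.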
Neural Text Simplification of Clinical Letters with a Domain Specific Phrase Table
Clinical letters are infamously impenetrable for the lay patient. This work uses neural text simplification methods to automatically improve the understandability of clinical letters for patients. We take existing neural text simplification software and augment it with a new phrase table that links complex medical terminology to simpler vocabulary by mining SNOMED-CT. In an evaluation task using crowdsourcing, we show that the results of our new system are ranked easier to understand (average rank 1.93) than using the original system (2.34) without our phrase table. We also show improvement against baselines including the original text (2.79) and using the phrase table without the neural text simplification software (2.94). Our methods can easily be transferred outside of the clinical domain by using domain-appropriate resources to provide effective neural text simplification for any domain without the need for costly annotation.
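The phrase-table component described above can be sketched as a simple complex-to-lay lookup applied to the text. The three entries below are invented examples, not terms actually mined from SNOMED-CT.

```python
# Illustrative sketch of a phrase-table lookup of the kind described:
# complex clinical terms mapped to lay alternatives. These entries are
# invented examples, not mined from SNOMED-CT.
import re

phrase_table = {
    "myocardial infarction": "heart attack",
    "hypertension": "high blood pressure",
    "renal": "kidney",
}

def simplify(text, table):
    """Replace known complex phrases, longest phrase first to avoid partial matches."""
    for phrase in sorted(table, key=len, reverse=True):
        text = re.sub(re.escape(phrase), table[phrase], text, flags=re.IGNORECASE)
    return text

print(simplify("The patient had a myocardial infarction.", phrase_table))
```

In the full system this table augments a neural simplifier rather than acting alone; the (2.94) baseline above corresponds roughly to this table-only behaviour.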
How do control tokens affect natural language generation tasks like text simplification
Recent work on text simplification has focused on the use of control tokens to advance the state of the art. However, it is difficult to improve further without an in-depth understanding of the mechanisms underlying control tokens. One understudied factor is the tokenization strategy, which we also examine. In this paper, we (1) reimplemented AudienCe-CEntric Sentence Simplification, (2) explored the effects and interactions of varying control tokens, (3) tested the influence of different tokenization strategies, (4) demonstrated how separate control tokens affect performance, and (5) proposed new methods to predict the value of control tokens. We show how performance varies across each of the four control tokens separately. We also uncover how the design of control tokens can influence performance and give some suggestions for designing control tokens. The newly proposed method achieves higher performance in both SARI (a common scoring metric in text simplification) and BERTScore (a score derived from the BERT language model) and shows potential in real applications.
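In ACCESS-style systems, the four control tokens are prepended to the source sentence as special markers. The sketch below follows the token names used in the original ACCESS setup; the ratio values and example sentence are arbitrary.

```python
# Sketch of how control tokens are typically prepended to the source
# sentence in ACCESS-style simplification. The four token names follow
# the original ACCESS setup; the ratio values here are arbitrary examples.
def add_control_tokens(sentence, nbchars, levsim, wordrank, depth):
    """Prefix a source sentence with the four ACCESS-style control tokens."""
    tokens = (
        f"<NbChars_{nbchars}> <LevSim_{levsim}> "
        f"<WordRank_{wordrank}> <DepTreeDepth_{depth}>"
    )
    return f"{tokens} {sentence}"

src = add_control_tokens("The committee adjourned the hearing.", 0.8, 0.75, 0.8, 0.9)
print(src)
```

How a subword tokenizer splits these markers (one special token versus several fragments) is exactly the kind of tokenization effect the paper examines.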
Predicting lexical complexity in English texts: the CompLex 2.0 dataset
© 2022 The Authors. Published by Springer. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher's website: https://doi.org/10.1007/s10579-022-09588-2
Identifying words which may cause difficulty for a reader is an essential step in most lexical text simplification systems prior to lexical substitution and can also be used for assessing the readability of a text. This task is commonly referred to as complex word identification (CWI) and is often modelled as a supervised classification problem. For training such systems, annotated datasets in which words and sometimes multi-word expressions are labelled regarding complexity are required. In this paper we analyze previous work carried out in this task and investigate the properties of CWI datasets for English. We develop a protocol for the annotation of lexical complexity and use this to annotate a new dataset, CompLex 2.0. We present experiments using both new and old datasets to investigate the nature of lexical complexity. We found that a Likert-scale annotation protocol provides an objective setting that is superior for identifying the complexity of words compared to a binary annotation protocol. We release a new dataset using our new protocol to promote the task of Lexical Complexity Prediction.
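A Likert-scale protocol of the kind described yields graded rather than binary labels; aggregation can be sketched as mapping the mean rating onto a continuous [0, 1] complexity score. The 1-5 scale and the example ratings below are illustrative assumptions, not the dataset's exact procedure.

```python
# Minimal sketch of aggregating Likert-scale complexity judgements into
# a continuous [0, 1] label, as in Likert-style LCP annotation. The 1-5
# scale and the example ratings are illustrative assumptions.
def complexity_score(ratings, scale_max=5):
    """Map the mean of 1..scale_max Likert ratings onto [0, 1]."""
    return (sum(ratings) / len(ratings) - 1) / (scale_max - 1)

print(complexity_score([1, 2, 2]))  # mostly 'easy' ratings -> low score
print(complexity_score([5, 5, 5]))  # unanimous 'hard' -> 1.0
```

A binary protocol would collapse all of this graded information into a single threshold, which is one reason the Likert setting proves superior for modelling complexity as a prediction task.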