Investigating ChatGPT's Potential to Assist in Requirements Elicitation Processes
Natural Language Processing (NLP) for Requirements Engineering (RE) (NLP4RE)
seeks to apply NLP tools, techniques, and resources to the RE process to
increase the quality of the requirements. There is little research involving
the utilization of Generative AI-based NLP tools and techniques for
requirements elicitation. Recently, Large Language Models (LLMs) such as
ChatGPT have gained significant recognition due to their notably improved
performance in NLP tasks. To explore the potential of ChatGPT to assist in
requirements elicitation processes, we formulated six questions to elicit
requirements using ChatGPT. Using the same six questions, we conducted
interview-based surveys with five RE experts from academia and industry and
collected 30 responses containing requirements. The quality of these 36
responses (human-formulated + ChatGPT-generated) was evaluated over seven
different requirements quality attributes by another five RE experts through a
second round of interview-based surveys. In comparing the quality of
requirements generated by ChatGPT with those formulated by human experts, we
found that ChatGPT-generated requirements are highly Abstract, Atomic,
Consistent, Correct, and Understandable. Based on these results, we present the
most pressing issues related to LLMs and what future research should focus on
to leverage the emergent behaviour of LLMs more effectively in natural
language-based RE activities.
Comment: Accepted at SEAA 2023. 8 pages, 5 figures.
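To make the elicitation setup concrete, here is a minimal sketch of posing an elicitation question to an LLM through the OpenAI Python client. The question text, system prompt, and model name are illustrative assumptions; the paper's six questions are not reproduced here.

```python
# Hypothetical sketch of LLM-based requirements elicitation, in the spirit of
# the study above. The question and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = (  # invented example; not one of the paper's six questions
    "What functional requirements should a library management system "
    "satisfy for lending and returning books?"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a requirements engineering assistant."},
        {"role": "user", "content": question},
    ],
)
print(response.choices[0].message.content)
```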
Supporting the Development of Cyber-Physical Systems with Natural Language Processing: A Report
Software has become the driving force for innovation in any technical system that observes its environment with different sensors and influences it by controlling a number of actuators; such systems are nowadays called Cyber-Physical Systems (CPSs). Their development is inherently interdisciplinary, and they often comprise a number of independent subsystems. Due to this diversity, the majority of development information is expressed in natural language artifacts of all kinds. In this paper, we report on recent results our group has developed to support engineers of CPSs in working with the large amount of information expressed in natural language. We cover the topics of automatic knowledge extraction, expert systems, and automatic requirements classification. Furthermore, we envision that natural language processing will be a key component in connecting requirements with simulation models and in explaining tool-based decisions. We see both areas as promising for supporting engineers of CPSs in the future.
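One of the topics the report covers, automatic requirements classification, can be pictured with a minimal scikit-learn sketch. The tiny training set and the functional/non-functional (F/NF) labels below are invented for illustration and do not reflect the authors' actual approach or data.

```python
# Minimal sketch: classifying requirements as functional (F) or
# non-functional (NF). Training examples and labels are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

requirements = [
    "The system shall log every actuator command.",          # functional
    "The controller shall respond within 10 ms.",            # non-functional
    "The system shall read all temperature sensors.",        # functional
    "The software shall be portable to new ECU hardware.",   # non-functional
]
labels = ["F", "NF", "F", "NF"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(requirements, labels)
print(clf.predict(["The system shall close the valve on overpressure."]))
```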
Improving Requirements Completeness: Automated Assistance through Large Language Models
Natural language (NL) is arguably the most prevalent medium for expressing
systems and software requirements. Detecting incompleteness in NL requirements
is a major challenge. One approach to identify incompleteness is to compare
requirements with external sources. Given the rise of large language models
(LLMs), an interesting question arises: Are LLMs useful external sources of
knowledge for detecting potential incompleteness in NL requirements? This
article explores this question by utilizing BERT. Specifically, we employ
BERT's masked language model (MLM) to generate contextualized predictions for
filling masked slots in requirements. To simulate incompleteness, we withhold
content from the requirements and assess BERT's ability to predict terminology
that is present in the withheld content but absent in the disclosed content.
BERT can produce multiple predictions per mask. Our first contribution is
determining the optimal number of predictions per mask, striking a balance
between effectively identifying omissions in requirements and mitigating noise
present in the predictions. Our second contribution involves designing a
machine learning-based filter to post-process BERT's predictions and further
reduce noise. We conduct an empirical evaluation using 40 requirements
specifications from the PURE dataset. Our findings indicate that: (1) BERT's
predictions effectively highlight terminology that is missing from
requirements, (2) BERT outperforms simpler baselines in identifying relevant
yet missing terminology, and (3) our filter significantly reduces noise in the
predictions, enhancing BERT's effectiveness as a tool for completeness checking
of requirements.
Comment: Submitted to Requirements Engineering Journal (REJ) - REFSQ'23 Special Issue. arXiv admin note: substantial text overlap with arXiv:2302.0479
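The masked-language-model idea is straightforward to reproduce with the Hugging Face fill-mask pipeline. Below is a minimal sketch, assuming bert-base-uncased and an invented requirement sentence; note that the paper determines the optimal number of predictions per mask (here top_k) empirically and further filters the output, which this sketch does not.

```python
# Minimal sketch of using BERT's masked language model to suggest terminology
# for a masked slot in a requirement. Sentence and top_k are illustrative.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Simulate incompleteness by masking a term withheld from the requirement.
masked = "The system shall encrypt all [MASK] transmitted over the network."
for pred in fill_mask(masked, top_k=5):
    print(f"{pred['token_str']:>12}  score={pred['score']:.3f}")
```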
Knowledge Extraction from Natural Language Requirements into a Semantic Relation Graph
Knowledge extraction and representation aim to identify information and to transform it into a machine-readable format. Knowledge representations support Information Retrieval tasks such as searching for single statements, documents, or metadata.
Requirements specifications of complex systems, such as automotive software systems, are usually divided into different subsystem specifications. Nevertheless, there are semantic relations between individual documents of the separated subsystems which have to be considered in further processes (e.g., dependencies). If requirements engineers or other developers are not aware of these relations, this can lead to inconsistencies or malfunctions of the overall system. Therefore, there is a strong need for tool support that detects semantic relations across a set of large natural language requirements specifications.
In this work, we present a knowledge extraction approach based on an explicit knowledge representation of the content of natural language requirements as a semantic relation graph. Our approach is fully automated and includes an NLP pipeline to transform unrestricted natural language requirements into a graph. We split the natural language into different parts and relate them to each other based on their semantic relations. In addition to semantic relations, other relationships can also be included in the graph. We envision using a semantic search algorithm such as spreading activation to allow users to search for different semantic relations in the graph.
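As a toy illustration of building a relation graph from requirements, the following sketch links each sentence's subject and object through its main verb, using spaCy and networkx. The model name and the single dependency rule are assumptions for illustration; the paper's pipeline handles unrestricted natural language and far richer semantic relations.

```python
# Toy sketch: extract (subject, verb, object) triples from requirement
# sentences and store them as edges of a directed relation graph.
import spacy
import networkx as nx

nlp = spacy.load("en_core_web_sm")  # assumed model; any English model works
graph = nx.DiGraph()

requirements = [
    "The gateway forwards sensor data to the control unit.",
    "The control unit validates sensor data before actuation.",
]

for doc in nlp.pipe(requirements):
    for token in doc:
        if token.pos_ == "VERB":
            subjects = [c for c in token.children if c.dep_ == "nsubj"]
            objects = [c for c in token.children if c.dep_ in ("dobj", "obj")]
            for s in subjects:
                for o in objects:
                    graph.add_edge(s.lemma_, o.lemma_, relation=token.lemma_)

print(list(graph.edges(data=True)))
```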
Automated Handling of Anaphoric Ambiguity in Requirements: A Multi-solution Study
Ambiguity is a pervasive issue in natural-language requirements. A common source of ambiguity in requirements is when a pronoun is anaphoric. In requirements engineering, anaphoric ambiguity occurs when a pronoun can plausibly refer to different entities and thus be interpreted differently by different readers. In this paper, we develop an accurate and practical automated approach for handling anaphoric ambiguity in requirements, addressing both ambiguity detection and anaphora interpretation. In view of the multiple competing natural language processing (NLP) and machine learning (ML) technologies that one can utilize, we simultaneously pursue six alternative solutions, empirically assessing each using a collection of ~1,350 industrial requirements. The alternative solution strategies that we consider are natural choices induced by the existing technologies; these choices frequently arise in other automation tasks involving natural-language requirements. A side-by-side empirical examination of these choices helps develop insights about the usefulness of different state-of-the-art NLP and ML technologies for addressing requirements engineering problems. For the ambiguity detection task, we observe that supervised ML outperforms both a large-scale language model, SpanBERT (a variant of BERT), and a solution assembled from off-the-shelf NLP coreference resolvers. In contrast, for anaphora interpretation, SpanBERT yields the most accurate solution. In our evaluation, (1) the best solution for anaphoric ambiguity detection has an average precision of ~60% and a recall of 100%, and (2) the best solution for anaphora interpretation (resolution) has an average success rate of ~98%.
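A crude heuristic conveys the detection task: flag a pronoun whenever more than one preceding noun phrase could serve as its antecedent. The spaCy sketch below is purely illustrative and stands in for none of the six solutions evaluated in the paper; the example sentence and model name are assumptions.

```python
# Heuristic sketch of anaphoric-ambiguity detection: a pronoun is flagged
# when multiple preceding noun phrases are candidate antecedents.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed model; any English model works

requirement = (
    "The controller sends a signal to the actuator when it detects a fault."
)
doc = nlp(requirement)

for token in doc:
    if token.pos_ == "PRON":
        candidates = [np.text for np in doc.noun_chunks if np.end <= token.i]
        if len(candidates) > 1:
            print(f"Pronoun '{token.text}' is potentially ambiguous; "
                  f"candidate antecedents: {candidates}")
```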
NLG4RE: How NL Generation Can Support Validation in RE
Context and motivation: All too frequently, functional requirements (FRs) for a (software) system are unclear. Written in natural language, FRs are underspecified for software developers; when written in formal language, FRs are insufficiently comprehensible for users. This is a well-known problem in RE. As long as this either/or dichotomy exists, FRs cannot be a "basis for common agreement among all parties involved", as Barry Boehm puts it.
Question/problem: On the one hand, FRs should unambiguously specify the functional behaviour of the system to be written or adapted, and on the other hand be fully understandable by the customer who must agree with them. What is required to achieve this goal?
Principal ideas/results: A specification must describe the Statics as well as the Dynamics. In our approach it consists of a Conceptual Data Model (the data structure, i.e., the Statics) plus a set of System Sequence Descriptions (SSDs) representing the processes (i.e., the Dynamics). SSDs schematically depict the interactions between the primary actor (user), the system (as a black box), and other actors (if any), including the messages between them. We provide a set of rules to generate natural language expressions from both the Conceptual Data Model and the SSDs that are understandable by the user ("informalisation of formal requirements"). Generating understandable representations of a specification is relevant for requirements validation tasks.
Contribution to validation: We introduce a form of Natural Language Generation (the NLG in the title) by defining a grammar and mapping rules to produce precise and unambiguous expressions in natural language, in order to improve the understandability of the FRs and the data model by the user community.
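The "informalisation" step can be pictured as mapping a schematic SSD message to a natural-language sentence through a template rule. The data class and the single rule below are illustrative assumptions; the paper defines a full grammar and a set of mapping rules.

```python
# Hypothetical sketch: turning one SSD message into a user-readable sentence
# via a template. The SSDMessage fields and the rule are invented.
from dataclasses import dataclass

@dataclass
class SSDMessage:
    sender: str       # primary actor or system
    receiver: str
    operation: str
    argument: str

def informalise(msg: SSDMessage) -> str:
    # One illustrative mapping rule for an actor-to-system interaction.
    return (f"The {msg.sender} asks the {msg.receiver} to "
            f"{msg.operation} the {msg.argument}.")

msg = SSDMessage(sender="librarian", receiver="system",
                 operation="register", argument="returned book")
print(informalise(msg))
```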