LELIE - An Intelligent Assistant for Improving Requirement Authoring
When writing or revising a set of requirements, or any technical document, it is particularly challenging to ensure that the text reads easily and is unambiguous for every domain actor. Experience shows that even after several rounds of proofreading and validation, most texts still contain a large number of language errors (lexical, grammatical, stylistic, business-related, or with respect to authoring recommendations) and lack overall cohesion and coherence. LELIE [a] has been designed to track these errors and, whenever possible, to suggest corrections. LELIE clearly influences the technical writer's behavior: it rapidly becomes an essential and user-friendly authoring companion.
Korean Parsing Based on the Applicative Combinatory Categorial Grammar
PACLIC / The University of the Philippines Visayas Cebu College Cebu City, Philippines / November 20-22, 200
Discourse structure analysis for requirement mining
In this work, we first introduce two main approaches to writing requirements and then propose a method based on Natural Language Processing to improve requirement authoring and the overall coherence, cohesion, and organization of requirement documents. We investigate the structure of requirement kernels and the discourse structure associated with those kernels, which enables the system to accurately extract requirements and their related contexts from texts (a task we call requirement mining). Finally, we report a first experiment in requirement mining based on texts from seven companies, and conclude with an evaluation that compares its results against manually annotated corpora of documents.
Identification of fuzzy and generic terms in technical documentation: an experiment with automatic distributional analysis
This study takes place within the development of the linguistic resources used by an automatic verification system for technical documents such as specifications. Our objective is to semi-automatically enlarge the classes of intrinsically fuzzy terms, along with generic terms, in order to improve the system's detection of ambiguous passages recognized as risk factors. We measure and compare the effectiveness of automatic distributional analysis methods by comparing the results obtained from corpora of varying sizes and degrees of specialization, starting from a reduced list of seed terms. We show that while a corpus of too limited a size is unusable, its automatic extension with similar documents yields results that complement those produced by distributional analysis on large generic corpora.
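The seed-based distributional expansion described in this abstract can be sketched roughly as follows. This is an illustrative toy example, not the authors' system: the co-occurrence vectors, term lists, and context words are invented for the demonstration, and a real setup would derive the vectors from a corpus of specifications.

```python
# Toy sketch of distributional expansion of a fuzzy-term class: candidates
# whose co-occurrence profile is closest to the centroid of the seed terms
# are proposed as new fuzzy terms.
import math

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dicts)."""
    shared = set(a) & set(b)
    num = sum(a[w] * b[w] for w in shared)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def centroid(vectors):
    """Average several sparse vectors into one."""
    out = {}
    for vec in vectors:
        for w, v in vec.items():
            out[w] = out.get(w, 0.0) + v / len(vectors)
    return out

# Invented co-occurrence counts (context word -> frequency).
vectors = {
    "adequate":    {"performance": 4, "level": 3, "ensure": 2},
    "sufficient":  {"performance": 3, "level": 4, "ensure": 1},
    "appropriate": {"level": 3, "ensure": 3, "performance": 2},
    "bolt":        {"torque": 5, "steel": 4},
}

seeds = ["adequate", "sufficient"]          # known fuzzy terms
seed_centroid = centroid([vectors[t] for t in seeds])
candidates = [t for t in vectors if t not in seeds]
ranked = sorted(candidates,
                key=lambda t: cosine(vectors[t], seed_centroid),
                reverse=True)
print(ranked)  # "appropriate" ranks above "bolt"
```

With these toy counts, "appropriate" shares its contexts with the seeds while "bolt" shares none, so only the former would be proposed for the fuzzy-term class.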
Large Language Models are Few-shot Testers: Exploring LLM-based General Bug Reproduction
Many automated test generation techniques have been developed to aid
developers with writing tests. To facilitate full automation, most existing
techniques aim to either increase coverage, or generate exploratory inputs.
However, existing test generation techniques largely fall short of achieving
more semantic objectives, such as generating tests to reproduce a given bug
report. Reproducing bugs is nonetheless important, as our empirical study shows
that the number of tests added in open source repositories due to issues was
about 28% of the corresponding project test suite size. Meanwhile, due to the
difficulties of transforming the expected program semantics in bug reports into
test oracles, existing failure reproduction techniques tend to deal exclusively
with program crashes, a small subset of all bug reports. To automate test
generation from general bug reports, we propose LIBRO, a framework that uses
Large Language Models (LLMs), which have been shown to be capable of performing
code-related tasks. Since LLMs themselves cannot execute the target buggy code,
we focus on post-processing steps that help us discern when LLMs are effective,
and rank the produced tests according to their validity. Our evaluation of
LIBRO shows that, on the widely studied Defects4J benchmark, LIBRO can generate
failure reproducing test cases for 33% of all studied cases (251 out of 750),
while suggesting a bug reproducing test in first place for 149 bugs. To
mitigate data contamination, we also evaluate LIBRO against 31 bug reports
submitted after the collection of the LLM training data terminated: LIBRO
produces bug reproducing tests for 32% of the studied bug reports. Overall, our
results show LIBRO has the potential to significantly enhance developer
efficiency by automatically generating tests from bug reports.Comment: Accepted to IEEE/ACM International Conference on Software Engineering
2023 (ICSE 2023
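The post-processing idea the abstract mentions, discarding invalid LLM outputs and ranking the rest, can be sketched as below. This is a hypothetical illustration, not LIBRO's actual pipeline or API: the candidate fields (`compiles`, `fails_on_buggy`, `error_matches_report`) are invented stand-ins for the signals such a framework might collect by executing each generated test.

```python
# Hypothetical sketch of ranking LLM-generated candidate tests: keep only
# tests that compile and fail on the buggy version (a prerequisite for
# reproducing the bug), then rank first those whose failure message
# matches the bug report.

def rank_candidates(candidates):
    valid = [c for c in candidates
             if c["compiles"] and c["fails_on_buggy"]]
    # False sorts before True, so matching candidates come first.
    return sorted(valid, key=lambda c: not c["error_matches_report"])

candidates = [
    {"id": "t1", "compiles": False, "fails_on_buggy": False, "error_matches_report": False},
    {"id": "t2", "compiles": True,  "fails_on_buggy": True,  "error_matches_report": False},
    {"id": "t3", "compiles": True,  "fails_on_buggy": True,  "error_matches_report": True},
]

ranked = rank_candidates(candidates)
print([c["id"] for c in ranked])  # ['t3', 't2'] -- t1 never compiled
```

The key design point mirrored here is that the LLM is never trusted blindly: every candidate is validated by execution before being shown to a developer.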
Towards Autonomous Testing Agents via Conversational Large Language Models
Software testing is an important part of the development cycle, yet it
requires specialized expertise and substantial developer effort to adequately
test software. The recent discoveries of the capabilities of large language
models (LLMs) suggest that they can be used as automated testing assistants,
and thus provide helpful information and even drive the testing process. To
highlight the potential of this technology, we present a taxonomy of LLM-based
testing agents based on their level of autonomy, and describe how a greater
level of autonomy can benefit developers in practice. An example use of LLMs as
a testing assistant is provided to demonstrate how a conversational framework
for testing can help developers. This also highlights how the often criticized
hallucination of LLMs can be beneficial while testing. We identify other
tangible benefits that LLM-driven testing agents can provide, and also discuss
some potential limitations.
The GitHub Recent Bugs Dataset for Evaluating LLM-based Debugging Applications
Large Language Models (LLMs) have demonstrated strong natural language
processing and code synthesis capabilities, which has led to their rapid
adoption in software engineering applications. However, details about LLM
training data are often not made public, which has caused concern as to whether
existing bug benchmarks are included. In lieu of the training data for the
popular GPT models, we examine the training data of the open-source LLM
StarCoder, and find it likely that data from the widely used Defects4J
benchmark was included, raising the possibility of its inclusion in GPT
training data as well. This makes it difficult to tell how well LLM-based
results on Defects4J would generalize, as for any results it would be unclear
whether a technique's performance is due to LLM generalization or memorization.
To remedy this issue and facilitate continued research on LLM-based SE, we
present the GitHub Recent Bugs (GHRB) dataset, which includes 76 real-world
Java bugs that were gathered after the OpenAI data cut-off point.
A clustering approach for detecting defects in technical documents
Requirements are usually written by hand and suffer from several problems, such as redundancy and inconsistency. Redundancy and inconsistency between requirements, or between sets of requirements, negatively impact the success of the final product, and processing these issues manually is time-consuming and costly. The main contribution of this paper is the use of the k-means algorithm for redundancy and inconsistency detection in a new context: Requirements Engineering. We also introduce a pre-processing step based on Natural Language Processing (NLP) techniques to assess its impact on the k-means results. We use Part-Of-Speech (POS) tagging and noun chunking to detect the technical business terms associated with the requirements documents we analyze. We evaluate this approach on real industrial datasets. The results show the effectiveness of the k-means clustering algorithm, especially when combined with the pre-processing step.
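The clustering idea in this abstract can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: it uses plain bag-of-words vectors instead of POS-tagged business terms, a deterministic centroid initialisation, and an invented similarity threshold.

```python
# Toy sketch: cluster requirement sentences with k-means over bag-of-words
# vectors, then flag same-cluster pairs with high cosine similarity as
# redundancy candidates.
from collections import Counter
import math

def vectorize(text):
    return Counter(text.lower().split())

def cosine(a, b):
    shared = set(a) & set(b)
    num = sum(a[t] * b[t] for t in shared)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def kmeans(vecs, k, iters=10):
    # Initialise centroids from the first k vectors (deterministic sketch).
    centroids = [dict(vecs[i]) for i in range(k)]
    assign = [0] * len(vecs)
    for _ in range(iters):
        # Assign each vector to its most similar centroid.
        assign = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vecs]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [vecs[i] for i in range(len(vecs)) if assign[i] == c]
            if members:
                merged = Counter()
                for m in members:
                    merged.update(m)
                centroids[c] = {t: n / len(members) for t, n in merged.items()}
    return assign

reqs = [
    "The system shall log every user login attempt",
    "Every user login attempt shall be logged by the system",
    "The pump shall stop within two seconds of an alarm",
]
vecs = [vectorize(r) for r in reqs]
labels = kmeans(vecs, k=2)

# Flag same-cluster pairs above an (invented) similarity threshold.
redundant = [(i, j)
             for i in range(len(reqs)) for j in range(i + 1, len(reqs))
             if labels[i] == labels[j] and cosine(vecs[i], vecs[j]) > 0.6]
print(redundant)  # [(0, 1)] -- the two near-duplicate login requirements
```

In this toy run the two paraphrased login requirements land in the same cluster and exceed the threshold, while the unrelated pump requirement does not, which is the behaviour the paper exploits for redundancy detection.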