Using Genetic Programming to Build Self-Adaptivity into Software-Defined Networks
Self-adaptation solutions need to periodically monitor, reason about, and
adapt a running system. The adaptation step involves generating an adaptation
strategy and applying it to the running system whenever an anomaly arises. In
this article, we argue that, rather than generating individual adaptation
strategies, the goal should be to adapt the control logic of the running system
in such a way that the system itself would learn how to steer clear of future
anomalies, without triggering self-adaptation too frequently. While the need
for adaptation is never eliminated, especially given the uncertain and
evolving environment of complex systems, reducing the frequency of adaptation
interventions is advantageous for various reasons, e.g., to increase
performance and to make a running system more robust. We instantiate and
empirically examine the above idea for software-defined networking -- a key
enabling technology for modern data centres and Internet of Things
applications. Using genetic programming (GP), we propose a self-adaptation
solution that continuously learns and updates the control constructs in the
data-forwarding logic of a software-defined network. Our evaluation, performed
using open-source synthetic and industrial data, indicates that, compared to a
baseline adaptation technique that attempts to generate individual adaptations,
our GP-based approach is more effective in resolving network congestion, and
further, reduces the frequency of adaptation interventions over time. In
addition, we show that, for networks with the same topology, reusing on larger networks the knowledge learned on smaller networks leads to significant improvements in the performance of our GP-based adaptation
approach. Finally, we compare our approach against a standard data-forwarding
algorithm from the network literature, demonstrating that our approach
significantly reduces packet loss.
Comment: arXiv admin note: text overlap with arXiv:2205.0435
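To make the adaptation idea concrete, below is a minimal, self-contained sketch of an evolutionary loop that tunes a data-forwarding rule from observed link loads. It is purely illustrative: the rule representation, fitness function, and all names are hypothetical, and for brevity it evolves a single numeric threshold (closer to a genetic algorithm) rather than the control-construct expression trees manipulated by GP in the article.

```python
# Illustrative evolutionary loop adapting a data-forwarding rule from observed
# link loads. All names and the fitness function are hypothetical stand-ins.
import random

def random_rule():
    # A "rule" reroutes a flow once link utilisation exceeds a threshold.
    return {"util_threshold": random.uniform(0.1, 0.9)}

def mutate(rule):
    child = dict(rule)
    key = random.choice(list(child))
    child[key] = min(0.95, max(0.05, child[key] + random.gauss(0, 0.1)))
    return child

def fitness(rule, loads):
    # Hypothetical surrogate objective: penalise missed congestion events and
    # unnecessary reroutes of uncongested traffic.
    t = rule["util_threshold"]
    missed = sum(1 for load in loads if load > 0.8 and t > load)
    spurious = sum(1 for load in loads if load < 0.5 and t <= load)
    return -(missed + spurious)

def evolve(loads, pop_size=30, generations=50):
    population = [random_rule() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda r: fitness(r, loads), reverse=True)
        survivors = population[: pop_size // 2]
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(pop_size - len(survivors))]
    return max(population, key=lambda r: fitness(r, loads))

observed_loads = [random.random() for _ in range(200)]  # stand-in telemetry
print("Evolved forwarding rule:", evolve(observed_loads))
```

In this toy setting the evolved threshold settles between the two load regimes; the article's approach instead keeps evolving the forwarding logic as new telemetry arrives, which is what reduces the need for repeated adaptation interventions.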
Improving Requirements Completeness: Automated Assistance through Large Language Models
Natural language (NL) is arguably the most prevalent medium for expressing
systems and software requirements. Detecting incompleteness in NL requirements
is a major challenge. One approach to identify incompleteness is to compare
requirements with external sources. Given the rise of large language models
(LLMs), an interesting question arises: Are LLMs useful external sources of
knowledge for detecting potential incompleteness in NL requirements? This
article explores this question by utilizing BERT. Specifically, we employ
BERT's masked language model (MLM) to generate contextualized predictions for
filling masked slots in requirements. To simulate incompleteness, we withhold
content from the requirements and assess BERT's ability to predict terminology
that is present in the withheld content but absent in the disclosed content.
BERT can produce multiple predictions per mask. Our first contribution is
determining the optimal number of predictions per mask, striking a balance
between effectively identifying omissions in requirements and mitigating noise
present in the predictions. Our second contribution involves designing a
machine learning-based filter to post-process BERT's predictions and further
reduce noise. We conduct an empirical evaluation using 40 requirements
specifications from the PURE dataset. Our findings indicate that: (1) BERT's
predictions effectively highlight terminology that is missing from
requirements, (2) BERT outperforms simpler baselines in identifying relevant
yet missing terminology, and (3) our filter significantly reduces noise in the
predictions, enhancing BERT's effectiveness as a tool for completeness checking
of requirements.
Comment: Submitted to Requirements Engineering Journal (REJ) - REFSQ'23 Special Issue. arXiv admin note: substantial text overlap with arXiv:2302.0479
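As an illustration of the masked-prediction step described above, the following sketch uses the Hugging Face transformers fill-mask pipeline with bert-base-uncased to obtain several predictions per mask. The requirement sentence is a hypothetical example, and the paper's withholding procedure and machine learning-based filter are omitted.

```python
# Illustrative use of BERT's masked language model to suggest terminology that
# may be missing from a requirement; the sentence below is hypothetical.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

requirement = ("The system shall encrypt all [MASK] transmitted between "
               "the client and the server.")

# Several predictions are requested per mask; the article tunes this number to
# balance recall of omissions against the noise in low-ranked predictions.
for prediction in fill_mask(requirement, top_k=5):
    print(f"{prediction['token_str']:>12}  score={prediction['score']:.3f}")
```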
Early Verification of Legal Compliance via Bounded Satisfiability Checking
Legal properties involve reasoning about data values and time. Metric
first-order temporal logic (MFOTL) provides a rich formalism for specifying
legal properties. While MFOTL has been successfully used for verifying legal
properties over operational systems via runtime monitoring, no solution exists
for MFOTL-based verification in early-stage system development captured by
requirements. Given a legal property and system requirements, both formalized
in MFOTL, compliance of the requirements with the property can be verified
via satisfiability checking. In this paper, we propose a practical, sound, and
complete (within a given bound) satisfiability checking approach for MFOTL. The
approach, based on satisfiability modulo theories (SMT), employs a
counterexample-guided strategy to incrementally search for a satisfying
solution. We implemented our approach using the Z3 SMT solver and evaluated it
on five case studies spanning the healthcare, business administration, banking
and aviation domains. Our results indicate that our approach can efficiently
determine whether legal properties of interest are met, or generate
counterexamples that lead to compliance violations.
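As a rough illustration of SMT-based bounded satisfiability checking, the sketch below unrolls a small metric temporal requirement and a negated property over a fixed number of time points and asks the Z3 solver for a satisfying trace. The requirement, property, and bound are hypothetical, and the encoding is far simpler than the paper's MFOTL-to-SMT translation and counterexample-guided search.

```python
# Illustrative bounded encoding: does a trace exist that satisfies the system
# requirements yet violates the legal property? (Hypothetical formulas.)
from z3 import Ints, Bools, Solver, And, Or, Implies, sat

BOUND = 4  # number of time points in the bounded trace

ts = Ints(" ".join(f"t{i}" for i in range(BOUND)))          # timestamps
req = Bools(" ".join(f"request{i}" for i in range(BOUND)))  # request at point i
res = Bools(" ".join(f"response{i}" for i in range(BOUND))) # response at point i

s = Solver()
# Timestamps are non-negative and strictly increasing.
s.add(ts[0] >= 0, *[ts[i] < ts[i + 1] for i in range(BOUND - 1)])

# System requirement: every request is answered within 5 time units.
for i in range(BOUND):
    s.add(Implies(req[i],
                  Or(*[And(res[j], ts[j] - ts[i] <= 5) for j in range(i, BOUND)])))

# Negated legal property: some request stays unanswered for more than 3 units.
s.add(Or(*[And(req[i],
               *[Implies(res[j], ts[j] - ts[i] > 3) for j in range(i, BOUND)])
           for i in range(BOUND)]))

if s.check() == sat:
    print("Counterexample trace:", s.model())  # compliance violation possible
else:
    print("No violation within the bound")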
Learning Non-robustness using Simulation-based Testing: a Network Traffic-shaping Case Study
An input to a system reveals a non-robust behaviour when, by making a small
change in the input, the output of the system changes from acceptable (passing)
to unacceptable (failing) or vice versa. Identifying inputs that lead to
non-robust behaviours is important for many types of systems, e.g.,
cyber-physical and network systems, whose inputs are prone to perturbations. In
this paper, we propose an approach that combines simulation-based testing with
regression tree models to generate value ranges for inputs in response to which
a system is likely to exhibit non-robust behaviours. We apply our approach to a
network traffic-shaping system (NTSS) -- a novel case study from the network
domain. In this case study, developed and conducted in collaboration with a
network solutions provider, RabbitRun Technologies, input ranges that lead to
non-robustness are of interest as a way to identify and mitigate network
quality-of-service issues. We demonstrate that our approach accurately
characterizes non-robust test inputs of NTSS by achieving a precision of 84%
and a recall of 100%, significantly outperforming a standard baseline. In
addition, we show that there is no statistically significant difference between
the results obtained from our simulated testbed and a hardware testbed with
identical configurations. Finally, we describe lessons learned from our
industrial collaboration, offering insights about how simulation helps discover
unknown and undocumented behaviours as well as a new perspective on using
non-robustness as a measure for system re-configuration.
Comment: This paper is accepted at the 16th IEEE International Conference on Software Testing, Verification and Validation (ICST 2023).
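To illustrate how tree models can turn simulation results into input value ranges, the following sketch fits a decision tree to test inputs labelled robust or non-robust and prints the threshold conditions along its branches. The features, labels, and data are synthetic stand-ins, not the NTSS case study.

```python
# Illustrative derivation of non-robust input ranges from labelled test inputs.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

# Simulated test inputs (bandwidth_mbps, latency_ms) with a non-robustness label,
# e.g. obtained by running each input and a slightly perturbed copy and checking
# whether the pass/fail verdict flips. The ground truth here is a toy rule.
X = rng.uniform([1, 5], [100, 200], size=(500, 2))
non_robust = ((X[:, 0] < 20) & (X[:, 1] > 150)).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, non_robust)

# Each root-to-leaf path predicting class 1 is a conjunction of threshold
# conditions, i.e. a value range where non-robust behaviour is likely.
print(export_text(tree, feature_names=["bandwidth_mbps", "latency_ms"]))
```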
AI-enabled Automation for Completeness Checking of Privacy Policies
Technological advances in information sharing have raised concerns about data
protection. Privacy policies contain privacy-related requirements about how the
personal data of individuals will be handled by an organization or a software
system (e.g., a web service or an app). In Europe, privacy policies are subject
to compliance with the General Data Protection Regulation (GDPR). A
prerequisite for GDPR compliance checking is to verify whether the content of a
privacy policy is complete according to the provisions of GDPR. Incomplete
privacy policies might result in large fines for the violating organization, as well as in incomplete privacy-related software specifications. Manual completeness
checking is both time-consuming and error-prone. In this paper, we propose
AI-based automation for the completeness checking of privacy policies. Through
systematic qualitative methods, we first build two artifacts to characterize
the privacy-related provisions of GDPR, namely a conceptual model and a set of
completeness criteria. Then, we develop an automated solution on top of these
artifacts by leveraging a combination of natural language processing and
supervised machine learning. Specifically, we identify the GDPR-relevant
information content in privacy policies and subsequently check it against the
completeness criteria. To evaluate our approach, we collected 234 real privacy
policies from the fund industry. Over a set of 48 unseen privacy policies, our
approach correctly detected 300 out of a total of 334 violations of the completeness criteria, while producing 23 false positives. The approach thus has a
precision of 92.9% and recall of 89.8%. Compared to a baseline that applies
keyword search only, our approach results in an improvement of 24.5% in
precision and 38% in recall.
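The following sketch conveys the classify-then-check idea: policy sentences are classified against a handful of GDPR information types, and criteria with no matching sentence are flagged as potentially missing. The criteria, sentences, and the TF-IDF/logistic-regression classifier are illustrative stand-ins for the paper's conceptual model and NLP/ML pipeline.

```python
# Illustrative completeness check: flag GDPR criteria with no covering sentence.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny hypothetical training set: sentences labelled with the GDPR information
# type they realise.
train_sentences = [
    "We retain your personal data for no longer than two years.",
    "You may request erasure of your personal data at any time.",
    "Our data protection officer can be reached at dpo@example.com.",
    "Personal data may be transferred to processors outside the EU.",
]
train_labels = ["RETENTION_PERIOD", "RIGHT_TO_ERASURE", "DPO_CONTACT", "TRANSFER"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(train_sentences, train_labels)

required_criteria = set(train_labels)
policy = [
    "We keep account information for 24 months after account closure.",
    "Questions can be sent to our data protection officer at privacy@corp.example.",
]
covered = set(clf.predict(policy))
print("Potentially missing content:", sorted(required_criteria - covered))
```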
Synthetic Data Generation for Statistical Testing
Usage-based statistical testing employs knowledge about the actual or anticipated usage profile of the system under test for estimating system reliability. For many systems, usage-based statistical testing involves generating synthetic test data. Such data must possess the same statistical characteristics as the actual data that the system will process during operation. Synthetic test data must further satisfy any logical validity constraints that the actual data is subject to. Targeting data-intensive systems, we propose an approach for generating synthetic test data that is both statistically representative and logically valid. The approach works by first generating a data sample that meets the desired statistical characteristics, without taking into account the logical constraints. Subsequently, the approach tweaks the generated sample to fix any logical constraint violations. The tweaking process is iterative and continuously guided toward achieving the desired statistical characteristics. We report on a realistic evaluation of the approach, where we generate a synthetic population of citizens' records for testing a public administration IT system. Results suggest that our approach is scalable and capable of generating data that is both statistically representative and logically valid.
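A minimal sketch of the generate-then-tweak idea follows, assuming a single numeric attribute (citizen age) with one target statistic and one validity constraint; the actual approach handles full data schemas and rich logical constraints, and its tweaking step is considerably more sophisticated.

```python
# Illustrative generate-then-tweak loop: draw a sample that matches a target
# statistic, then repair constraint violations while nudging the sample back
# toward that statistic. Purely hypothetical attribute and constraint.
import random

TARGET_MEAN_AGE = 42        # desired statistical characteristic
MIN_AGE, MAX_AGE = 18, 90   # logical validity constraint (adult citizens only)

def generate(n):
    # Step 1: match statistics only; constraint violations are allowed here.
    return [random.gauss(TARGET_MEAN_AGE, 20) for _ in range(n)]

def tweak(sample, max_passes=50):
    # Step 2: iteratively repair violations, guided by the target mean.
    for _ in range(max_passes):
        violations = [i for i, age in enumerate(sample)
                      if not MIN_AGE <= age <= MAX_AGE]
        if not violations:
            break
        drift = TARGET_MEAN_AGE - sum(sample) / len(sample)
        for i in violations:
            # Clamp into the valid range, biased toward restoring the target mean.
            sample[i] = min(MAX_AGE, max(MIN_AGE,
                                         TARGET_MEAN_AGE + drift + random.gauss(0, 5)))
    return sample

ages = tweak(generate(1000))
print("mean age:", sum(ages) / len(ages),
      "violations:", sum(1 for a in ages if not MIN_AGE <= a <= MAX_AGE))
```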
WikiDoMiner: Wikipedia Domain-Specific Miner
We introduce WikiDoMiner -- a tool for automatically generating domain-specific corpora by crawling Wikipedia. WikiDoMiner helps requirements engineers create an external knowledge resource that is specific to the underlying domain of a given requirements specification (RS). Being able to build such a resource is important since domain-specific datasets are scarce. WikiDoMiner generates a corpus by first extracting a set of domain-specific keywords from a given RS, and then querying Wikipedia for these keywords. The output of WikiDoMiner is a set of Wikipedia articles relevant to the domain of the input RS. Mining Wikipedia for domain-specific knowledge can be beneficial for multiple requirements engineering tasks, e.g., ambiguity handling, requirements classification, and question answering. WikiDoMiner is publicly available on Zenodo under an open-source license (https://doi.org/10.5281/zenodo.6672682).
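The sketch below illustrates the corpus-mining idea (keyword extraction followed by Wikipedia queries) using the third-party wikipedia package and a naive frequency-based keyword extractor; it is not WikiDoMiner's actual implementation, and the example specification, stopword list, and function names are hypothetical.

```python
# Illustrative domain-corpus mining: extract candidate keywords from a
# requirements specification and fetch related Wikipedia articles.
import re
from collections import Counter

import wikipedia  # pip install wikipedia

STOPWORDS = {"the", "shall", "a", "an", "of", "to", "and", "system", "be", "is"}

def extract_keywords(spec_text, top_n=5):
    words = re.findall(r"[a-z]{4,}", spec_text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(top_n)]

def build_corpus(spec_text, articles_per_keyword=2):
    corpus = {}
    for keyword in extract_keywords(spec_text):
        for title in wikipedia.search(keyword, results=articles_per_keyword):
            try:
                corpus[title] = wikipedia.page(title, auto_suggest=False).content
            except wikipedia.exceptions.WikipediaException:
                continue  # skip disambiguation pages and missing articles
    return corpus

spec = "The gateway shall encrypt telemetry packets before satellite downlink."
corpus = build_corpus(spec)
print(len(corpus), "articles retrieved:", list(corpus))
```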