12,440 research outputs found
Why and How to Extract Conditional Statements From Natural Language Requirements
Functional requirements often describe system behavior by relating events to each other, e.g. "If the system detects an error (e_1), an error message shall be shown (e_2)". Such conditionals consist of two parts: the antecedent (see e_1) and the consequent (e_2), which convey strong, semantic information about the intended behavior of a system. Automatically extracting conditionals from texts enables several analytical disciplines and is already used for information retrieval and question answering. We found that automated conditional extraction can also provide added value to Requirements Engineering (RE) by facilitating the automatic derivation of acceptance tests from requirements. However, the potential of extracting conditionals has not yet been leveraged for RE. We are convinced that this has two principal reasons:
1) The extent, form, and complexity of conditional statements in RE artifacts is not well understood. We do not know how conditionals are formulated and logically interpreted by RE practitioners. This hinders the development of suitable approaches for extracting conditionals from RE artifacts.
2) Existing methods fail to extract conditionals from Unrestricted Natural Language (NL) in fine-grained form. That is, they do not consider the combinatorics between antecedents and consequents. They also do not allow to split them into more fine-granular text fragments (e.g., variable and condition), rendering the extracted conditionals unsuitable for RE downstream tasks such as test case derivation.
This thesis contributes to both areas. In Part I, we present empirical results on the prevalence and logical interpretation of conditionals in RE artifacts. Our case study corroborates that conditionals are widely used in both traditional and agile requirements such as acceptance criteria. We found that conditionals in requirements mainly occur in explicit, marked form and may include up to three antecedents and two consequents. Hence, the extraction approach needs to understand conjunctions, disjunctions, and negations to fully capture the relation between antecedents and consequents. We also found that conditionals are a source of ambiguity and there is not just one way to interpret them formally. This affects any automated analysis that builds upon formalized requirements (e.g., inconsistency checking) and may also influence guidelines for writing requirements.
Part II presents our tool-supported approach CiRA capable of detecting conditionals in NL requirements and extracting them in fine-grained form. For the detection, CiRA uses syntactically enriched BERT embeddings combined with a softmax classifier and outperforms existing methods (macro-F_1: 82%). Our experiments show that a sigmoid classifier built on RoBERTa embeddings is best suited to extract conditionals in fine-grained form (macro-F_1: 86%). We disclose our code, data sets, and trained models to facilitate replication. CiRA is available at http://www.cira.bth.se/demo/.
In Part III, we highlight how the extraction of conditionals from requirements can help to create acceptance tests automatically. First, we motivate this use case in an empirical study and demonstrate that the lack of adequate acceptance tests is one of the major problems in agile testing. Second, we show how extracted conditionals can be mapped to a Cause-Effect-Graph from which test cases can be derived automatically. We demonstrate the feasibility of our approach in a case study with three industry partners. In our study, out of 578 manually created test cases, 71.8% can be generated automatically. Furthermore, our approach discovered 80 relevant test cases that were missed in manual test case design. At the end of this thesis, the reader will have an understanding of (1) the notion of conditionals in RE artifacts, (2) how to extract them in fine-grained form, and (3) the added value that the extraction of conditionals can provide to RE
Semantic Model Alignment for Business Process Integration
Business process models describe an enterprise’s way of conducting business and in this form the basis for shaping the organization and engineering the appropriate supporting or even enabling IT. Thereby, a major task in working with models is their analysis and comparison for the purpose of aligning them. As models can differ semantically not only concerning the modeling languages used, but even more so in the way in which the natural language for labeling the model elements has been applied, the correct identification of the intended meaning of a legacy model is a non-trivial task that thus far has only been solved by humans. In particular at the time of reorganizations, the set-up of B2B-collaborations or mergers and acquisitions the semantic analysis of models of different origin that need to be consolidated is a manual effort that is not only tedious and error-prone but also time consuming and costly and often even repetitive. For facilitating automation of this task by means of IT, in this thesis the new method of Semantic Model Alignment is presented. Its application enables to extract and formalize the semantics of models for relating them based on the modeling language used and determining similarities based on the natural language used in model element labels. The resulting alignment supports model-based semantic business process integration. The research conducted is based on a design-science oriented approach and the method developed has been created together with all its enabling artifacts. These results have been published as the research progressed and are presented here in this thesis based on a selection of peer reviewed publications comprehensively describing the various aspects
Recommended from our members
UK Research Information Shared Service (UKRISS) Final Report, July 2014
The reporting of research information is a complex and expensive activity for research organisations (ROs). There is little alignment between funders of the reporting requests made to institutions and requests made to individual researchers about their research outputs and outcomes. This inevitably results in duplication and increased costs across the sector, whilst limiting the potential sharing and reuse of the information. The UK Research Information Shared Service (UKRISS) project conducted a feasibility and scoping study for the reporting of research information at a national level based on CERIF (Common European Research Information Format), with the objective of increasing efficiency, productivity and quality across the sector. The aim was to define and prototype solutions which are compelling, easy to use, have a low entry barrier, and support innovative information sharing and benchmarking. CERIF has emerged as the preferred format for expressing research information across Europe. To date, CERIF has been piloted for specific applications, but not as a format for reporting requirements across all UK ROs. The final report presents the work carried out by the UKRISS project, including requirements gathering, modelling and prototyping, as well as recommendation for sustainability. UKRISS was divided into two phases. Phase 1, mapping the reporting landscape, ran from March 2012 to December 2012. Phase 2, exploring delivery of potential solutions, began in February 2013 and ended in December 2013
Challenges to knowledge representation in multilingual contexts
To meet the increasing demands of the complex inter-organizational processes and the demand for
continuous innovation and internationalization, it is evident that new forms of organisation are
being adopted, fostering more intensive collaboration processes and sharing of resources, in what
can be called collaborative networks (Camarinha-Matos, 2006:03). Information and knowledge are
crucial resources in collaborative networks, being their management fundamental processes to
optimize.
Knowledge organisation and collaboration systems are thus important instruments for the success of
collaborative networks of organisations having been researched in the last decade in the areas of
computer science, information science, management sciences, terminology and linguistics.
Nevertheless, research in this area didn’t give much attention to multilingual contexts of
collaboration, which pose specific and challenging problems. It is then clear that access to and
representation of knowledge will happen more and more on a multilingual setting which implies the
overcoming of difficulties inherent to the presence of multiple languages, through the use of
processes like localization of ontologies.
Although localization, like other processes that involve multilingualism, is a rather well-developed
practice and its methodologies and tools fruitfully employed by the language industry in the
development and adaptation of multilingual content, it has not yet been sufficiently explored as an
element of support to the development of knowledge representations - in particular ontologies -
expressed in more than one language. Multilingual knowledge representation is then an open
research area calling for cross-contributions from knowledge engineering, terminology, ontology
engineering, cognitive sciences, computational linguistics, natural language processing, and
management sciences.
This workshop joined researchers interested in multilingual knowledge representation, in a
multidisciplinary environment to debate the possibilities of cross-fertilization between knowledge
engineering, terminology, ontology engineering, cognitive sciences, computational linguistics,
natural language processing, and management sciences applied to contexts where multilingualism
continuously creates new and demanding challenges to current knowledge representation methods
and techniques.
In this workshop six papers dealing with different approaches to multilingual knowledge
representation are presented, most of them describing tools, approaches and results obtained in the
development of ongoing projects.
In the first case, Andrés Domínguez Burgos, Koen Kerremansa and Rita Temmerman present a
software module that is part of a workbench for terminological and ontological mining,
Termontospider, a wiki crawler that aims at optimally traverse Wikipedia in search of domainspecific
texts for extracting terminological and ontological information. The crawler is part of a tool
suite for automatically developing multilingual termontological databases, i.e. ontologicallyunderpinned
multilingual terminological databases. In this paper the authors describe the basic principles
behind the crawler and summarized the research setting in which the tool is currently tested.
In the second paper, Fumiko Kano presents a work comparing four feature-based similarity
measures derived from cognitive sciences. The purpose of the comparative analysis presented by the author is to verify the potentially most effective model that can be applied for mapping independent ontologies in a culturally influenced domain. For that, datasets based on standardized
pre-defined feature dimensions and values, which are obtainable from the UNESCO Institute for
Statistics (UIS) have been used for the comparative analysis of the similarity measures. The purpose
of the comparison is to verify the similarity measures based on the objectively developed datasets.
According to the author the results demonstrate that the Bayesian Model of Generalization provides
for the most effective cognitive model for identifying the most similar corresponding concepts
existing for a targeted socio-cultural community.
In another presentation, Thierry Declerck, Hans-Ulrich Krieger and Dagmar Gromann present an
ongoing work and propose an approach to automatic extraction of information from multilingual
financial Web resources, to provide candidate terms for building ontology elements or instances of
ontology concepts. The authors present a complementary approach to the direct
localization/translation of ontology labels, by acquiring terminologies through the access and
harvesting of multilingual Web presences of structured information providers in the field of finance,
leading to both the detection of candidate terms in various multilingual sources in the financial
domain that can be used not only as labels of ontology classes and properties but also for the
possible generation of (multilingual) domain ontologies themselves.
In the next paper, Manuel Silva, António Lucas Soares and Rute Costa claim that despite the
availability of tools, resources and techniques aimed at the construction of ontological artifacts,
developing a shared conceptualization of a given reality still raises questions about the principles
and methods that support the initial phases of conceptualization. These questions become, according
to the authors, more complex when the conceptualization occurs in a multilingual setting. To tackle
these issues the authors present a collaborative platform – conceptME - where terminological and
knowledge representation processes support domain experts throughout a conceptualization
framework, allowing the inclusion of multilingual data as a way to promote knowledge sharing and
enhance conceptualization and support a multilingual ontology specification.
In another presentation Frieda Steurs and Hendrik J. Kockaert present us TermWise, a large project
dealing with legal terminology and phraseology for the Belgian public services, i.e. the translation
office of the ministry of justice, a project which aims at developing an advanced tool including
expert knowledge in the algorithms that extract specialized language from textual data (legal
documents) and whose outcome is a knowledge database including Dutch/French equivalents for
legal concepts, enriched with the phraseology related to the terms under discussion.
Finally, Deborah Grbac, Luca Losito, Andrea Sada and Paolo Sirito report on the preliminary
results of a pilot project currently ongoing at UCSC Central Library, where they propose to adapt to
subject librarians, employed in large and multilingual Academic Institutions, the model used by
translators working within European Union Institutions. The authors are using User Experience
(UX) Analysis in order to provide subject librarians with a visual support, by means of “ontology
tables” depicting conceptual linking and connections of words with concepts presented according to
their semantic and linguistic meaning.
The organizers hope that the selection of papers presented here will be of interest to a broad audience, and will be a starting point for further discussion and cooperation
Business Capability Mining - Opportunities and Challenges
Business capability models are widely used in enterprise architecture management to generate an abstract overview of an organization’s business activities to reach its business objectives. The creation and maintenance of these models are associated with a huge manual workload. Research provides insights into opportunities for automated modeling of enterprise architecture models. However, most models address the application and technology layer and leave the business layer largely unexplored. Particularly, no research has been conducted on the automated generation of business capability models. This research paper uses 19 semi-structured expert interviews to identify possible automated modeling opportunities of business capabilities and related challenges and to jointly develop a business capability mining approach. This research benefit both, practice and research, by describing a situation-based business capability mining approach and identifying appropriate implementation scenarios
Automatic Concept Discovery from Parallel Text and Visual Corpora
Humans connect language and vision to perceive the world. How to build a
similar connection for computers? One possible way is via visual concepts,
which are text terms that relate to visually discriminative entities. We
propose an automatic visual concept discovery algorithm using parallel text and
visual corpora; it filters text terms based on the visual discriminative power
of the associated images, and groups them into concepts using visual and
semantic similarities. We illustrate the applications of the discovered
concepts using bidirectional image and sentence retrieval task and image
tagging task, and show that the discovered concepts not only outperform several
large sets of manually selected concepts significantly, but also achieves the
state-of-the-art performance in the retrieval task.Comment: To appear in ICCV 201
- …