    Software development process mining: discovery, conformance checking and enhancement

    Context. Modern software projects require the proper allocation of human, technical and financial resources. Very often, project managers make decisions supported only by their personal experience, intuition or simply by mirroring activities performed by others in similar contexts. Most attempts to avoid such practices use models based on lines of code, cyclomatic complexity or effort estimators, and are thus commonly supported by software repositories, which are known to contain several flaws. Objective. Demonstrate the usefulness of process data and mining methods for enhancing software development practices, by assessing efficiency and unveiling unknown process insights, thus contributing to the creation of novel models within the software development analytics realm. Method. We mined the development process fragments of multiple developers in three different scenarios by collecting Integrated Development Environment (IDE) events during their development sessions. Furthermore, we used process and text mining to discover developers’ workflows and their fingerprints, respectively. Results. We discovered and modeled, with good quality, developers’ processes during programming sessions based on events extracted from their IDEs. We unveiled insights from coding practices in distinct refactoring tasks, built accurate software complexity forecast models based only on process metrics, and set up a method for coherently characterizing developers’ behaviors. The latter may ultimately lead to the creation of a catalog of software development process smells. Conclusions. Our approach is agnostic to programming languages, geographic location or development practices, making it suitable for challenging contexts such as modern global software development projects using either traditional IDEs or sophisticated low/no code platforms.
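
As a rough illustration of what discovering a workflow from IDE events can involve, the sketch below counts directly-follows relations, a common first step in process discovery. It is not the thesis's pipeline: the event names and the `directly_follows` helper are assumptions made up for this example.

```python
from collections import Counter


def directly_follows(sessions):
    """Count how often event `a` is immediately followed by event `b`."""
    counts = Counter()
    for events in sessions:
        # Pair each event with its successor inside the same session.
        for a, b in zip(events, events[1:]):
            counts[(a, b)] += 1
    return counts


# Hypothetical IDE event names, one list per development session.
sessions = [
    ["open_file", "edit", "save", "run_tests"],
    ["open_file", "edit", "edit", "save"],
]
dfg = directly_follows(sessions)
```

The resulting counts can be read as a weighted graph whose edges show which IDE actions tend to follow one another.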

    Evolution, testing and configuration of variability intensive systems

    Thesis downloaded from ResearchGate. One of the key characteristics of software is its ability to be adapted and configured to different scenarios. Recently, software variability has been studied as a first-class concept in different domains ranging from software product lines to pervasive systems. Variability is the ability of a software product to vary depending on different circumstances. Variability intensive systems are those software products where variability management is a core engineering activity. The varying parts of those systems are commonly modeled using different variability model flavors, feature modeling being one of the most common. Feature models were first introduced by Kang et al. in 1990 and are a compact representation of the set of configurations of a variability intensive system. The large number of configurations that a feature model can encode makes the manual analysis of feature models an error-prone and costly task. Computer-aided mechanisms therefore appeared as a way to extract useful information from feature models. This process of extracting information from feature models is known as "Automated Analysis of Feature Models", and it has been one of the main areas of research in recent years, with more than thirty analysis operations proposed. Premio Extraordinario de Doctorado U
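
To make the idea of a feature model as a compact encoding of configurations concrete, here is a minimal sketch that enumerates the valid products of a toy model by brute force. The feature names, the constraints and the `is_valid` helper are all hypothetical; real automated analysis tools typically rely on solvers rather than exhaustive enumeration.

```python
from itertools import product

# Hypothetical toy model: Phone is the root, Calls and Screen are mandatory,
# GPS is optional, and Screen has two alternative children (Basic, HD).
FEATURES = ["Phone", "Calls", "GPS", "Screen", "Basic", "HD"]


def is_valid(c):
    """Check one candidate configuration (feature name -> bool)."""
    return (
        c["Phone"]                                   # root is always selected
        and c["Calls"] == c["Phone"]                 # mandatory child
        and c["Screen"] == c["Phone"]                # mandatory child
        and (not c["GPS"] or c["Phone"])             # optional child
        and (c["Basic"] + c["HD"] == (1 if c["Screen"] else 0))  # exactly one alternative
        and (not c["GPS"] or c["HD"])                # cross-tree constraint: GPS requires HD
    )


def products():
    """Yield every valid product as a set of selected features."""
    for values in product([False, True], repeat=len(FEATURES)):
        c = dict(zip(FEATURES, values))
        if is_valid(c):
            yield {f for f in FEATURES if c[f]}


all_products = list(products())
```

Counting `all_products` corresponds to one simple analysis operation (number of products); the exhaustive loop over 2^n candidates also shows why manual or naive analysis stops scaling quickly.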

    Modelling, Reverse Engineering, and Learning Software Variability

    Society expects software to deliver the right functionality, in a short amount of time and with fewer resources, in every possible circumstance, whatever the hardware, the operating systems, the compilers, or the data fed as input. To fit such a diversity of needs, software commonly comes in many variants and is highly configurable through configuration options, runtime parameters, conditional compilation directives, menu preferences, configuration files, plugins, etc. As there is no one-size-fits-all solution, software variability ("the ability of a software system or artifact to be efficiently extended, changed, customized or configured for use in a particular context") has been studied for the last two decades and is a discipline of its own. Though highly desirable, software variability also introduces enormous complexity due to the combinatorial explosion of possible variants. For example, the Linux kernel has 15000+ options and most of them can have 3 values: "yes", "no", or "module". Variability is challenging for maintaining, verifying, and configuring software systems (Web applications, Web browsers, video tools, etc.). It is also a source of opportunities to better understand a domain, create reusable artefacts, deploy performance-wise optimal systems, or find specialized solutions to many kinds of problems. In many scenarios, a model of variability is either beneficial or mandatory to explore, observe, and reason about the space of possible variants. For instance, without a variability model, it is impossible to establish a sampling strategy that would satisfy the constraints among options and meet coverage or testing criteria. I address a central question in this HDR manuscript: How to model software variability? I detail several contributions related to modelling, reverse engineering, and learning software variability. 
I first contribute to supporting the persons in charge of manually specifying feature models, the de facto standard for modeling variability. I develop an algebra together with a language for supporting the composition, decomposition, diff, refactoring, and reasoning of feature models. I further establish the syntactic and semantic relationships between feature models and product comparison matrices, a large class of tabular data. I then empirically investigate how these feature models can be used to test configurable systems in the large with different sampling strategies. Along this effort, I report on the attempts and lessons learned when defining the "right" variability language. From a reverse engineering perspective, I contribute to synthesizing variability information into models from various kinds of artefacts. I develop foundations and methods for reverse engineering feature models from satisfiability formulae, product comparison matrices, dependency files and architectural information, and from Web configurators. I also report on the degree of automation and show that the involvement of developers and domain experts is beneficial to obtain high-quality models. Thirdly, I contribute to learning constraints and non-functional properties (performance) of a variability-intensive system. I describe a systematic process "sampling, measuring, learning" that aims to enforce or augment a variability model, capturing variability knowledge that domain experts can hardly express. I show that supervised, statistical machine learning can be used to synthesize rules or build prediction models in an accurate and interpretable way. This process can even be applied to a huge configuration space, such as that of the Linux kernel. Despite wide applicability and observed benefits, I show that each individual line of contributions has limitations. 
I defend the following answer: a supervised, iterative process (1) based on the combination of reverse engineering, modelling, and learning techniques; (2) capable of integrating multiple sources of variability information (e.g. expert knowledge, legacy artefacts, dynamic observations). Finally, this work opens different perspectives related to so-called deep software variability, security, smart build of configurations, and (threats to) science.
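
The Linux kernel figures quoted in this abstract give a sense of the combinatorial explosion. The back-of-the-envelope sketch below treats "15000+ options" as exactly 15000 independent three-valued options, which is an assumption: it ignores the constraints among options and the options that are not three-valued, so it only indicates the order of magnitude of the unconstrained space.

```python
import math

options = 15000          # "15000+ options" in the text, taken here as exactly 15000
values_per_option = 3    # "yes", "no", or "module"

# 3**15000 is unwieldy to print, so compute its number of decimal digits instead.
digits = math.floor(options * math.log10(values_per_option)) + 1
```

Under these assumptions the unconstrained space has a size with more than seven thousand decimal digits, which is why exhaustive exploration is out of reach and sampling guided by a variability model becomes necessary.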

    Community-based co-design of a crowdsourcing task management application for safeguarding indigenous knowledge

    Teaching indigenous knowledge (IK) to African youth has become more complicated for a variety of reasons, such as urban migration, loss of interest in it, the dominance of scientific knowledge and the technological revolution. Therefore, there is a considerable movement towards using technologies to safeguard IK before it becomes obsolete. It is noteworthy that the research conducted and the software development perspectives used are mainly based on Western worldviews that are inappropriate for African socio-cultural contexts. IK holders are often not in charge of the digitisation process and are merely treated as subjects. In this study, we explored a suitable development approach for a crowdsourcing task management application (TMA) as an auxiliary tool for safeguarding IK. Moreover, the study sought to provide an opportunity for the indigenous communities to request three-dimensional (3D) models of their traditional objects independently. The delivered traditional 3D models are imported into the communities' IK visualisation tools used by the IK holders to teach the youth about their cultural heritage. The main objective of this study was to ascertain how the indigenous rural communities could appropriate a foreign technological concept such as crowdsourcing. This brought about our first research theme: investigating the necessary conditions to establish and maintain beneficial embedded community engagement. The second theme was to determine the suitable methods for technology co-design. The third was to discover what the communities' appropriated crowdsourcing concept entails. We applied a consolidated research method based on Community-based Co-Design (CBCD) extended with Afrocentric research insights and operationalised with Action Research cycle principles of planning, action and reflection. CBCD was conducted in three cycles with Otjiherero-speaking indigenous rural communities from Namibia. 
Reflections from the first cycle revealed that the rural communities would require unique features in their crowdsourcing application. During the second cycle of co-designing with the ovaHimba community, we learnt that CBCD matures through mutual trust, reciprocity and skills transfer, and through deconstructing mainstream technologies to spark co-design ideas. Lastly, in our third cycle of CBCD, we showcased that communities of similar cultures and knowledge construction had common ideas of co-designing the TMA. We also simulated that the construction of traditional 3D models requires indigenous communities to provide insightful details of the traditional object to minimise unsatisfactory deliverables. The findings of this study contribute to two areas: (1) research approach and (2) appropriation of technology. We provide a synthesis of Oundu moral values and Afrocentricity as a foundation for conducting Afrocentric research to establish and maintain humanness before CBCD can take place. With those taken as inherent moral values, Afrocentricity should then solely be focused on knowledge construction within an African epistemology. For the appropriation of technology, we share co-design techniques on how the indigenous rural communities appropriated the mainstream crowdsourcing concept through local meaning-making. CBCD researchers should incorporate Afrocentricity for mutual learning, knowledge construction, and sharing for the benefit of all.

    Online experimentation in automotive software engineering

    Context: Online experimentation has long been the gold standard for evaluating software against the actual needs and preferences of customers. In the Software-as-a-Service domain, various online experimentation techniques are applied and have proven successful. As software is becoming the main differentiator for automotive products, the automotive sector has started to express an interest in adopting online experimentation to strengthen its software development process. Objective: In this research, we aim to systematically address the challenges in adopting online experimentation in the automotive domain. Method: We apply a multidisciplinary approach to this research. To understand the state of practice in online experimentation in the industry, we conduct case studies with three manufacturers. We introduce our experimental design and evaluation methods to real vehicles driven by customers at scale. Moreover, we run experiments to quantitatively evaluate experiment design and causal inference models. Results: Four main research outcomes are presented in this thesis. First, we propose an architecture for continuous online experimentation given the limitations experienced in the automotive domain. Second, after identifying an inherent limitation of sample sizes in the automotive domain, we apply and evaluate an experimentation design method. The method allows us to utilise pre-experimental data for generating balanced groups even when sample sizes are limited. Third, we present an alternative approach to randomised experiments and demonstrate the application of Bayesian causal inference in online software evaluation. With the models, we enable online software evaluation without the need for a fully randomised experiment. Finally, we relate the formal assumptions in the Bayesian causal models to their implications in practice, and we demonstrate the inference models with cases from the automotive domain. 
Outlook: In our future work, we plan to explore causal structural and graphical models applied in software engineering, and demonstrate the application of causal discovery in machine learning-based autonomous drive software.
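
The abstract does not name the design method used to generate balanced groups from pre-experimental data. As one generic illustration of the idea, the sketch below uses matched-pair randomisation, a technique swapped in for this example: units are ranked by a pre-experiment metric, adjacent units are paired, and a coin flip decides which member of each pair is treated. The vehicle ids, the metric values and the `matched_pair_assignment` helper are hypothetical.

```python
import random
from statistics import mean


def matched_pair_assignment(pre_metric, seed=0):
    """Split units into two groups with similar pre-experiment metric values."""
    rng = random.Random(seed)
    ranked = sorted(pre_metric, key=pre_metric.get)   # unit ids ordered by metric
    control, treatment = [], []
    # Walk over adjacent pairs; with an odd number of units the last one is left out.
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)                             # coin flip within the pair
        control.append(pair[0])
        treatment.append(pair[1])
    return control, treatment


# Hypothetical pre-experiment measurements, one value per vehicle.
pre_metric = {"v1": 10, "v2": 12, "v3": 30, "v4": 31,
              "v5": 50, "v6": 55, "v7": 70, "v8": 72}
control, treatment = matched_pair_assignment(pre_metric)
gap = abs(mean(pre_metric[v] for v in control)
          - mean(pre_metric[v] for v in treatment))
```

Because each pair contributes one unit to each group, the difference in group means (`gap`) is bounded by the average within-pair difference, whatever the coin flips turn out to be. That property is what makes this kind of design attractive when only a few vehicles are available.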

    Improving Software Model Inference by Combining State Merging and Markov Models

    Labelled-transition systems (LTS) are widely used by developers and testers to model software systems in terms of their sequential behaviour. They provide an overview of the behaviour of the system and its reaction to different inputs. LTS models are the foundation for various automated verification techniques such as model-checking and model-based testing. These techniques require up-to-date models to be meaningful. Unfortunately, software models are rare in practice. Due to the effort and time required to build these models manually, a software engineer would want to infer them automatically from traces (sequences of events or function calls). Many techniques have focused on inferring LTS models from given traces of system execution, where these traces are produced by running a system on a series of tests. State-merging is the foundation of some of the most successful LTS inference techniques. Passive inference approaches such as k-tail and Evidence-Driven State Merging (EDSM) can infer LTS models from these traces. However, the best-performing methods of inferring LTS models rely on the availability of negatives, i.e. traces that are not permitted from specific states, and such information is not usually available. The long-standing challenge for such inference approaches is constructing models well from very few traces and without negatives. Active inference techniques such as Query-driven State Merging (QSM) can learn LTSs from traces by asking queries as tests to the system being learnt. This may lead to inaccurate LTSs, since the performance of QSM relies on the availability of traces. The challenge for such inference approaches is inferring LTSs well from very few traces and with fewer queries asked. In this thesis, existing techniques are investigated with respect to the challenge of inferring LTS models from few positive traces. These techniques fail to find correct LTS models in cases of insufficient training data. 
This thesis focuses on finding better solutions to this problem by using evidence obtained from Markov models to bias the EDSM learner towards merging states that are more likely to correspond to the same state in a model. Markov models are used to capture the dependencies between event sequences in the collected traces. Those dependencies rely on whether events are permitted or prohibited to follow short sequences that appear in the traces. This thesis proposes EDSM-Markov, a passive inference technique that aims to improve on existing ones in the absence of negative traces and to prevent the over-generalization problem. Improvements obtained by the proposed learners are demonstrated by a series of experiments using randomly-generated labelled-transition systems and case studies. The results of these experiments show that EDSM-Markov can infer better LTSs than other techniques. This thesis also proposes modifications to the QSM learner to improve the accuracy of the inferred LTSs. This results in a new learner, named ModifiedQSM. The modifications include considering more tests of the system being inferred in order to avoid the over-generalization problem. The thesis also investigates the use of Markov models to reduce the number of queries consumed by the ModifiedQSM learner. Hence, this thesis introduces a new LTS inference technique, called MarkovQSM. Moreover, enhancements of the LTSs inferred by the ModifiedQSM and MarkovQSM learners are demonstrated by a series of experiments. The results show that ModifiedQSM can infer better LTSs than other techniques. Moreover, MarkovQSM is shown to significantly reduce the number of membership queries consumed compared to ModifiedQSM, with a very small loss of accuracy.
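
The kind of evidence a Markov model extracts from traces can be sketched as a table that records, for each short sequence of events, which events were observed to follow it. This is a simplified illustration under assumed names (`follower_table`, the event labels), not the EDSM-Markov learner itself, which additionally uses such evidence to score state merges.

```python
from collections import defaultdict


def follower_table(traces, k=1):
    """Map each length-k event sequence to the set of events seen right after it."""
    table = defaultdict(set)
    for trace in traces:
        for i in range(len(trace) - k):
            prefix = tuple(trace[i:i + k])
            table[prefix].add(trace[i + k])
    return dict(table)


# Hypothetical positive traces of a file-like component.
traces = [
    ["open", "read", "close"],
    ["open", "write", "write", "close"],
]
table = follower_table(traces, k=1)
```

An event that never appears after a given prefix, such as `read` after `write` in these traces, is a candidate for being prohibited there. A learner can use that hint to avoid merging states whose outgoing events would contradict the table, which is one way to limit over-generalization when no negative traces are available.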

    Interaction and communication among autonomous agents in multiagent systems

    The main goal of this doctoral thesis is to investigate a fundamental topic of research within the Multiagent Systems paradigm: the problem of defining open, heterogeneous, and dynamic interaction frameworks. That is, to realize interaction systems where multiple agents can enter and leave dynamically and where no assumptions are made on the internal structure of the interacting agents. This topic of research has received much attention in the past few years. In particular, the need to realize applications where artificial agents can interact, negotiate, and exchange information, resources, and services has become more and more important thanks to the advent of the Internet. I started my studies by developing a trading agent that took part in an international on-line trading game: the First Trading Agent Competition (TAC). During the design and development phase of the trading agent some critical problems emerged: the problem of accurately understanding the rules that govern the different auctions; and the problem of understanding the meaning of the numerous messages. Another general problem is that the internal structure of the developed trading agent has been strongly determined by the peculiar interface of the interaction system; consequently, without changes to its code, it would not be able to take part in any other competition on the Web. Furthermore, the trading agent would not have been able to exploit opportunities, to handle unexpected situations, or to reason about the rules of the various auctions, since it is not able to understand the meaning of the exchanged messages. The presence of all those problems bears out the need to find a standard, commonly accepted way to define open interaction systems. The most important component of every interaction framework, as is also remarked by philosophical studies on human communication, is the institution of language. 
Therefore I started to investigate the problem of defining a standard and commonly accepted semantics for Agent Communication Languages (ACL). The solutions proposed so far are at best partial, and are considered unsatisfactory by a large number of specialists. In particular, they are unable to support verifiable compliance to standards and to make agents responsible for their communicative actions. Furthermore, such proposals make the strong assumption that every interacting agent may be modeled as a Belief-Desire-Intention agent. What is required is an approach focused on externally observable events as opposed to the unobservable internal states of agents. Following Speech Act Theory, which views language use as a form of action, I propose an operational specification for the definition of a standard ACL based on the notion of social commitment. In this proposal the meaning of basic communicative acts is defined as the effect that they have on the social relationship between the sender and the receiver, described through operations on an unambiguous, objective, and public "object": the commitment. The adoption of the notion of commitment is crucial to stabilize the interaction among agents, to create an expectation of other agents' behavior, and to enable agents to reason about their own and other agents' actions. The proposed ACL is verifiable, that is, it is possible to determine whether an agent is behaving in accordance with its communicative actions; the semantics is objective, independent of the agent's internal structure, flexible and extensible, and simple yet expressive enough. A complete operational specification of an interaction framework using the proposed commitment-based ACL is presented. In particular, some sample applications of how to use the proposed framework to formalize interaction protocols are reported. A list of soundness conditions to test whether a protocol is sound is proposed.
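
To illustrate how a commitment can serve as a public, externally checkable object, the sketch below models a promise as an operation that creates a commitment in a shared store and lets any observer list violations. The class names, the state labels ("pending", "fulfilled", "violated") and the `promise`/`observe` operations are assumptions chosen for this example; they are not the operational specification proposed in the thesis.

```python
from dataclasses import dataclass, field


@dataclass
class Commitment:
    debtor: str      # agent bound by the commitment
    creditor: str    # agent the commitment is owed to
    content: str     # what the debtor has committed to bring about
    state: str = "pending"


@dataclass
class CommitmentStore:
    """Public record of commitments, independent of any agent's internals."""
    commitments: list = field(default_factory=list)

    def promise(self, sender, receiver, content):
        # A promise is modelled purely by its public effect: a new commitment.
        c = Commitment(sender, receiver, content)
        self.commitments.append(c)
        return c

    def observe(self, commitment, achieved):
        # Update the state from an externally observable outcome.
        commitment.state = "fulfilled" if achieved else "violated"

    def violations(self, agent):
        return [c for c in self.commitments
                if c.debtor == agent and c.state == "violated"]


store = CommitmentStore()
delivery = store.promise("seller", "buyer", "deliver item")
payment = store.promise("buyer", "seller", "pay 10")
store.observe(delivery, achieved=True)
store.observe(payment, achieved=False)
```

Nothing in the store refers to beliefs or intentions: compliance is judged only from the recorded commitments and observed outcomes, which is the sense in which such a semantics can be verified from outside the agents.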