    Software development process mining: discovery, conformance checking and enhancement

    Context. Modern software projects require the proper allocation of human, technical and financial resources. Very often, project managers make decisions based only on their personal experience and intuition, or simply by mirroring activities performed by others in similar contexts. Most attempts to avoid such practices use models based on lines of code, cyclomatic complexity or effort estimators, which are commonly supported by software repositories known to contain several flaws. Objective. To demonstrate the usefulness of process data and mining methods for enhancing software development practices, by assessing efficiency and unveiling previously unknown process insights, thus contributing to the creation of novel models within the software development analytics realm. Method. We mined the development process fragments of multiple developers in three different scenarios by collecting Integrated Development Environment (IDE) events during their development sessions. Furthermore, we used process mining and text mining to discover developers’ workflows and their fingerprints, respectively. Results. We discovered and modeled, with good quality, developers’ processes during programming sessions based on events extracted from their IDEs. We unveiled insights into coding practices in distinct refactoring tasks, built accurate software complexity forecast models based only on process metrics, and set up a method for coherently characterizing developers’ behaviors. The latter may ultimately lead to the creation of a catalog of software development process smells. Conclusions. Our approach is agnostic to programming languages, geographic location and development practices, making it suitable for challenging contexts such as modern global software development projects that use either traditional IDEs or sophisticated low/no-code platforms.
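Process discovery from IDE events typically starts from an event log in which each event belongs to a case (here, a development session) and carries an activity name and a timestamp. As a minimal, hedged sketch of that first step, the Python code below builds the directly-follows relation (how often one IDE activity immediately follows another within the same session), which many discovery algorithms take as input. The log schema, the activity names (`OpenFile`, `Edit`, `Build`) and the function `directly_follows` are hypothetical illustrations, not the tooling or data used in the study.

```python
from collections import Counter, defaultdict

# Hypothetical IDE event log: (session_id, timestamp, activity).
# The schema and activity labels are illustrative assumptions,
# not the format used in the study.
EventLog = list[tuple[str, int, str]]


def directly_follows(log: EventLog) -> Counter:
    """Count how often activity a is directly followed by activity b
    within the same session (the directly-follows relation)."""
    traces: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for session, ts, activity in log:
        traces[session].append((ts, activity))

    dfg: Counter = Counter()
    for events in traces.values():
        events.sort()  # IDE events may arrive out of order: sort by timestamp
        activities = [a for _, a in events]
        for a, b in zip(activities, activities[1:]):
            dfg[(a, b)] += 1
    return dfg


if __name__ == "__main__":
    log = [
        ("s1", 1, "OpenFile"), ("s1", 2, "Edit"), ("s1", 3, "Build"),
        ("s1", 4, "Edit"), ("s1", 5, "Build"),
        ("s2", 1, "OpenFile"), ("s2", 3, "Build"), ("s2", 2, "Edit"),
    ]
    for (a, b), n in sorted(directly_follows(log).items()):
        print(f"{a} -> {b}: {n}")
```

Running it prints `Build -> Edit: 1`, `Edit -> Build: 3` and `OpenFile -> Edit: 2`. Sorting each session by timestamp first matters because events can be logged out of order, as in session `s2`; a full discovery algorithm would then derive a process model from counts like these.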

    Book of abstracts: ISTAR-IUL Winter School 2018 Applied Transdisciplinary Research

    A Framework for Personalized Content Recommendations to Support Informal Learning in Massively Diverse Information WIKIS

    Personalization has been shown to achieve better learning outcomes by adapting to specific learners’ needs, interests, and/or preferences. Traditionally, most personalized learning software systems have focused on formal learning. However, learning personalization is not only desirable for formal learning; it is also required for informal learning, which is self-directed, does not follow a specified curriculum, and does not lead to formal qualifications. Among informal learning platforms, wikis, and especially Wikipedia, attract increasing attention. The nature of wikis enables learners to navigate the learning environment freely and to construct knowledge independently, without being forced to follow a predefined learning path, in accordance with constructivist learning theory. Nevertheless, navigation on information wikis suffers from several limitations. To support informal learning on Wikipedia and similar environments, it is important to provide easy and fast access to relevant content. Recommender systems (RSs) have long been used to provide useful recommendations effectively in different technology-enhanced learning (TEL) contexts. However, the massive diversity of both the unstructured content and the user base of such information-oriented websites poses major challenges when designing recommendation models for these environments. In addition, evaluating TEL recommender systems for informal learning is challenging because of the inherent difficulty of measuring the impact of recommendations on informal learning in the absence of formal assessment and commonly used learning analytics. This research proposes a personalized content recommendation framework (PCRF) for information wikis, together with an evaluation framework that can be used to evaluate the impact of personalized content recommendations on informal learning from wikis.
The presented recommendation framework models learners’ interests by continuously extrapolating topical navigation graphs from learners’ free navigation and applying graph structural analysis algorithms to extract topics of interest for individual users. It then integrates learners’ interest models with fuzzy thesauri to produce personalized content recommendations. Our evaluation approach encompasses two main activities. First, the impact of personalized recommendations on informal learning is evaluated by assessing conceptual knowledge in users’ feedback. Second, web analytics data are analyzed to gain insight into users’ progress and focus throughout the test session. Our evaluation revealed that PCRF generates highly relevant recommendations that adapt to changes in users’ interests, with rank-based mean average precision (MAP@k) scores using the HARD model ranging between 86.4% and 100%. In addition, the evaluation of informal learning revealed that users who used Wikipedia with personalized support achieved higher scores on the conceptual knowledge assessment, with an average score of 14.9 compared to 10.0 for the students who used the encyclopedia without any recommendations. The analysis of web analytics data shows that users who used Wikipedia with personalized recommendations visited a larger number of relevant pages than the control group (644 vs. 226, respectively). They were also able to use a larger number of concepts, make comparisons, and state relations between concepts.
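MAP@k, the metric reported above, averages over users the average precision of each user's top-k recommendations. The abstract does not state which AP@k variant or which values of k the PCRF evaluation used, so the Python sketch below assumes one common convention: precision is accumulated at each rank that holds a relevant item and normalized by min(k, number of relevant items). The function names and toy data are hypothetical, and the recommendation lists are assumed to contain no duplicates.

```python
def average_precision_at_k(recommended: list[str], relevant: set[str], k: int) -> float:
    """AP@k under one common convention: sum precision@i over the ranks
    i <= k that hold a relevant item, divided by min(k, |relevant|)."""
    if not relevant or k <= 0:
        return 0.0
    hits = 0
    score = 0.0
    for i, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / i  # precision at rank i
    return score / min(k, len(relevant))


def mean_average_precision_at_k(
    runs: list[tuple[list[str], set[str]]], k: int
) -> float:
    """MAP@k: mean of AP@k over (recommended, relevant) pairs, one per user."""
    if not runs:
        return 0.0
    return sum(average_precision_at_k(rec, rel, k) for rec, rel in runs) / len(runs)


if __name__ == "__main__":
    runs = [
        (["a", "b", "c"], {"a", "c"}),       # hypothetical user A
        (["x", "y", "z"], {"x", "y", "z"}),  # hypothetical user B
    ]
    print(round(mean_average_precision_at_k(runs, 3), 4))
```

For the toy data, user A scores (1/1 + 2/3) / 2 ≈ 0.833 and user B scores 1.0, so MAP@3 is about 0.917. Because every relevant item is rewarded more the higher it appears, this metric captures ranking quality rather than just hit counts.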

    Automatic Sensor-free Affect Detection: A Systematic Literature Review

    Emotions and other affective states play a pivotal role in cognition and, consequently, in the learning process. It is well established that computer-based learning environments (CBLEs) that can detect and adapt to students' affective states can enhance learning outcomes. However, practical constraints often hinder the deployment of sensor-based affect detection in CBLEs, particularly for large-scale or long-term applications. As a result, sensor-free affect detection, which relies exclusively on logs of students' interactions with CBLEs, emerges as a compelling alternative. This paper provides a comprehensive literature review on sensor-free affect detection. It examines the most frequently identified affective states, the methodologies and techniques employed to develop the detectors, the defining attributes of CBLEs and data samples, and key research trends. Despite the field's evident maturity, demonstrated by the consistent performance of the models and the application of advanced machine learning techniques, there is ample scope for future research. Potential areas for further exploration include enhancing the performance of sensor-free detection models, collecting more samples of underrepresented emotions, and identifying additional emotions. There is also a need to refine model development practices and methods. This could involve comparing the accuracy of various data collection techniques, determining the optimal time granularity, establishing a shared database of action logs and emotion labels, and making the source code of these models publicly accessible. Future research should also prioritize the integration of models into CBLEs for real-time detection, the provision of meaningful interventions based on detected emotions, and a deeper understanding of the impact of emotions on learning.
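Sensor-free detectors work only from interaction logs, which are commonly segmented into fixed-length windows before features are computed; the window length is the kind of time granularity whose optimum the review calls for investigating. The Python sketch below is a hypothetical illustration of that segmentation step, not a method taken from the reviewed studies: the `Action` record, the action kinds (`attempt`, `hint_request`) and the three features are all assumptions. A real detector would typically join such feature rows with affect labels and train a classifier on them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Action:
    t: float                        # seconds since session start
    kind: str                       # hypothetical labels: "attempt", "hint_request"
    correct: Optional[bool] = None  # only meaningful for attempts


def clip_features(actions: list[Action], clip_seconds: float) -> list[dict]:
    """Split an interaction log into fixed-length clips and compute
    simple per-clip features. Clips with no actions are skipped."""
    clips: dict[int, list[Action]] = {}
    for a in actions:
        clips.setdefault(int(a.t // clip_seconds), []).append(a)

    rows = []
    for idx in sorted(clips):
        acts = clips[idx]
        attempts = [a for a in acts if a.kind == "attempt"]
        rows.append({
            "clip": idx,
            "n_actions": len(acts),
            "n_hints": sum(a.kind == "hint_request" for a in acts),
            "pct_correct": (
                sum(bool(a.correct) for a in attempts) / len(attempts)
                if attempts else 0.0
            ),
        })
    return rows


if __name__ == "__main__":
    log = [
        Action(5, "attempt", True),
        Action(12, "hint_request"),
        Action(25, "attempt", False),
        Action(70, "attempt", True),
    ]
    for row in clip_features(log, clip_seconds=20):
        print(row)
```

Changing `clip_seconds` changes how many feature rows a session yields and how much behavior each row summarizes, which is exactly the trade-off a granularity study would compare.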

    Process Mining Concepts for Discovering User Behavioral Patterns in Instrumented Software

    Process Mining is a technique for discovering “in-use” processes from traces emitted to event logs. Researchers have recently explored applying this technique to document the processes discovered in software applications. However, the requirements for emitting events to support Process Mining of software applications have not been well documented. Furthermore, the link between end-user intentional behavior and software quality, as evidenced in the discovered processes, has not been well articulated. After evaluating the literature, this thesis suggests focusing on user goals and actual, in-use processes as an input to an Agile software development life cycle in order to improve software quality. It also provides suggestions for instrumenting software applications to support Process Mining techniques.
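At a minimum, process mining needs each event to carry a case identifier, an activity name and a timestamp, so instrumentation has to record those three attributes consistently. The sketch below shows one hedged way to do this in Python, writing CSV rows of the kind process mining tools commonly import; the `EventEmitter` class, the extra `user` column and the choice of a user goal (here, a hypothetical order) as the case are illustrative assumptions rather than the thesis's own recommendations.

```python
import csv
import io
from datetime import datetime, timezone
from typing import Optional, TextIO


class EventEmitter:
    """Hypothetical instrumentation helper: writes one CSV row per user
    action with case id, activity and timestamp (plus an optional user)."""

    FIELDS = ["case_id", "activity", "timestamp", "user"]

    def __init__(self, stream: TextIO) -> None:
        self._writer = csv.DictWriter(stream, fieldnames=self.FIELDS)
        self._writer.writeheader()

    def emit(self, case_id: str, activity: str, user: str,
             when: Optional[datetime] = None) -> None:
        if when is None:
            when = datetime.now(timezone.utc)  # record time in UTC
        self._writer.writerow({
            "case_id": case_id,    # groups events into one trace (user goal)
            "activity": activity,  # what the user did
            "timestamp": when.isoformat(),
            "user": user,
        })


if __name__ == "__main__":
    buf = io.StringIO()
    emitter = EventEmitter(buf)
    emitter.emit("order-42", "OpenForm", "alice",
                 datetime(2020, 1, 1, tzinfo=timezone.utc))
    emitter.emit("order-42", "SubmitForm", "alice",
                 datetime(2020, 1, 1, 0, 5, tzinfo=timezone.utc))
    print(buf.getvalue())
```

Keying the case on the user's goal rather than on a technical session is one way to make the discovered traces reflect intentional behavior; the `order-42` identifier is made up for the example.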

    Holistic recommender systems for software engineering

    The knowledge possessed by developers is often not sufficient to overcome a programming problem. Apart from talking to teammates, when available, developers often gather additional knowledge from development artifacts (e.g., project documentation) and from online resources. The web has become an essential component of the modern developer’s daily life, providing a plethora of information from sources such as forums, tutorials, Q&A websites, API documentation, and even video tutorials. Recommender Systems for Software Engineering (RSSE) help developers navigate the information space, automatically suggest useful items, and reduce the time required to locate the needed information. Current RSSEs consider development artifacts as containers of homogeneous information in the form of pure text. However, text is a means to represent heterogeneous information, such as natural language, source code, interchange formats (e.g., XML, JSON), and stack traces. Interpreting this information from a purely textual point of view misses the intrinsic heterogeneity of the artifacts and leads to a reductionist approach. We propose the concept of Holistic Recommender Systems for Software Engineering (H-RSSE), i.e., RSSEs that go beyond the textual interpretation of the information contained in development artifacts. Our thesis is that modeling and aggregating information in a holistic fashion enables novel and advanced analyses of development artifacts. To validate our thesis, we developed a framework to extract, model and analyze the information contained in development artifacts in a reusable meta-information model. We show how RSSEs benefit from a meta-information model, since it enables customized and novel analyses built on top of our framework.
The information can thus be reinterpreted from a holistic point of view, preserving its multi-dimensionality and opening the path towards the concept of holistic recommender systems for software engineering.
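To make the heterogeneity argument concrete, the Python sketch below labels fragments of a development artifact (for example, a Q&A post split into blocks) as natural language, source code, JSON, XML or a stack trace instead of treating them all as plain text. It is a deliberately simple, hypothetical heuristic built on regular expressions and a JSON parse attempt; it is not the extraction framework or meta-information model described in the thesis, and a robust tool would need real parsers rather than these rules.

```python
import json
import re

# Hypothetical heuristics, for illustration only.
STACK_FRAME = re.compile(r"^\s*at [\w$.<>]+\(.*\)\s*$")  # Java-style frame line
CODE_HINT = re.compile(
    r"[;{}]\s*$|^\s*(public|private|def|class|import|return)\b"
)


def classify_fragment(fragment: str) -> str:
    """Guess the kind of information a text fragment carries."""
    text = fragment.strip()
    lines = text.splitlines()
    threshold = max(1, len(lines) // 2)  # at least half the lines (rounded down)

    if sum(bool(STACK_FRAME.match(ln)) for ln in lines) >= threshold:
        return "stack_trace"
    if text and text[0] in "{[":
        try:
            json.loads(text)
            return "json"
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            pass
    if text.startswith("<") and text.endswith(">"):
        return "xml"
    if sum(bool(CODE_HINT.search(ln)) for ln in lines) >= threshold:
        return "code"
    return "natural_language"


if __name__ == "__main__":
    post = [
        "How do I fix this null pointer error?",
        "public int f() {\n  return 1;\n}",
        "java.lang.NullPointerException\n\tat com.example.Foo.bar(Foo.java:10)",
        '{"retries": 3}',
    ]
    for block in post:
        print(classify_fragment(block))
```

Once each fragment carries a type, downstream analyses can treat code, traces and configuration with type-specific logic instead of bag-of-words matching, which is the gap the abstract attributes to purely textual RSSEs.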

    Exploring utilization of visualization for computer and network security

    The role of the network security administrator is continually morphing to keep pace with the ever-changing area of computer and network security. These changes are due in part both to the continual development of new security exploits by attackers and to improvements in the network security products available. One area that has garnered much research in the past decade is the use of visualization to ease the strain on network security administrators. Visualization mechanisms exploit the parallel processing power of the human visual system to allow the identification of possibly nefarious network activity. This research details the development and use of a visualization system for network security. The manuscript is composed of four papers that provide a progression of research pertaining to the system. The first paper draws on research in information visualization to develop a new framework for designing visualization systems for network security. The second paper develops a visualization system that has been used during multiple cyber defense competitions to aid competition performance. The last two papers deal with evaluating the developed system. First, an exploratory analysis provides an initial assessment using participant interviews during one cyber defense competition. Second, a quasi-field experiment explores subjects’ intention to use the system based on the type of visualization being viewed.

    Open Data

    Open data is freely usable, reusable, or redistributable by anybody, provided there are safeguards in place that protect the data’s integrity and transparency. This book describes how data retrieved from public open data repositories can improve the learning qualities of digital networking, particularly performance and reliability. Chapters address topics such as knowledge extraction, Open Government Data (OGD), public dashboards, intrusion detection, and artificial intelligence in healthcare.
