3,105 research outputs found

    Preserving the Quality of Architectural Tactics in Source Code

    Get PDF
    In any complex software system, strong interdependencies exist between requirements and software architecture. Requirements drive architectural choices while also being constrained by the existing architecture and by what is economically feasible. This makes it advisable to concurrently specify the requirements, to devise and compare alternative architectural design solutions, and ultimately to make a series of design decisions in order to satisfy each of the quality concerns. Unfortunately, anecdotal evidence has shown that architectural knowledge tends to be tacit in nature, stored in the heads of people, and lost over time. Therefore, developers often lack comprehensive knowledge of underlying architectural design decisions and inadvertently degrade the quality of the architecture while performing maintenance activities. In practice, this problem can be addressed through preserving the relationships between the requirements, architectural design decisions and their implementations in the source code, and then using this information to keep developers aware of critical architectural aspects of the code. This dissertation presents a novel approach that utilizes machine learning techniques to recover and preserve the relationships between architecturally significant requirements, architectural decisions and their realizations in the implemented code. Our approach for recovering architectural decisions includes the two primary stages of training and classification. In the first stage, the classifier is trained using code snippets of different architectural decisions collected from various software systems. During this phase, the classifier learns the terms that developers typically use to implement each architectural decision. These ``indicator terms\u27\u27 represent method names, variable names, comments, or the development APIs that developers inevitably use to implement various architectural decisions. A probabilistic weight is then computed for each potential indicator term with respect to each type of architectural decision. The weight estimates how strongly an indicator term represents a specific architectural tactics/decisions. For example, a term such as \emph{pulse} is highly representative of the heartbeat tactic but occurs infrequently in the authentication. After learning the indicator terms, the classifier can compute the likelihood that any given source file implements a specific architectural decision. The classifier was evaluated through several different experiments including classical cross-validation over code snippets of 50 open source projects and on the entire source code of a large scale software system. Results showed that classifier can reliably recognize a wide range of architectural decisions. The technique introduced in this dissertation is used to develop the Archie tool suite. Archie is a plug-in for Eclipse and is designed to detect wide range of architectural design decisions in the code and to protect them from potential degradation during maintenance activities. It has several features for performing change impact analysis of architectural concerns at both the code and design level and proactively keep developers informed of underlying architectural decisions during maintenance activities. Archie is at the stage of technology transfer at the US Department of Homeland Security where it is purely used to detect and monitor security choices. Furthermore, this outcome is integrated into the Department of Homeland Security\u27s Software Assurance Market Place (SWAMP) to advance research and development of secure software systems

    User Review-Based Change File Localization for Mobile Applications

    Get PDF
    In the current mobile app development, novel and emerging DevOps practices (e.g., Continuous Delivery, Integration, and user feedback analysis) and tools are becoming more widespread. For instance, the integration of user feedback (provided in the form of user reviews) in the software release cycle represents a valuable asset for the maintenance and evolution of mobile apps. To fully make use of these assets, it is highly desirable for developers to establish semantic links between the user reviews and the software artefacts to be changed (e.g., source code and documentation), and thus to localize the potential files to change for addressing the user feedback. In this paper, we propose RISING (Review Integration via claSsification, clusterIng, and linkiNG), an automated approach to support the continuous integration of user feedback via classification, clustering, and linking of user reviews. RISING leverages domain-specific constraint information and semi-supervised learning to group user reviews into multiple fine-grained clusters concerning similar users' requests. Then, by combining the textual information from both commit messages and source code, it automatically localizes potential change files to accommodate the users' requests. Our empirical studies demonstrate that the proposed approach outperforms the state-of-the-art baseline work in terms of clustering and localization accuracy, and thus produces more reliable results.Comment: 15 pages, 3 figures, 8 table

    Design of a Machine Learning-based Approach for Fragment Retrieval on Models

    Full text link
    [ES] El aprendizaje automático (ML por sus siglas en inglés) es conocido como la rama de la inteligencia artificial que reúne algoritmos estadísticos, probabilísticos y de optimización, que aprenden empíricamente. ML puede aprovechar el conocimiento y la experiencia que se han generado durante años en las empresas para realizar automáticamente diferentes procesos. Por lo tanto, ML se ha aplicado a diversas áreas de investigación, que estudian desde la medicina hasta la ingeniería del software. De hecho, en el campo de la ingeniería del software, el mantenimiento y la evolución de un sistema abarca hasta un 80% de la vida útil del sistema. Las empresas, que se han dedicado al desarrollo de sistemas software durante muchos años, han acumulado grandes cantidades de conocimiento y experiencia. Por lo tanto, ML resulta una solución atractiva para reducir sus costos de mantenimiento aprovechando los recursos acumulados. Específicamente, la Recuperación de Enlaces de Trazabilidad, la Localización de Errores y la Ubicación de Características se encuentran entre las tareas más comunes y relevantes para realizar el mantenimiento de productos software. Para abordar estas tareas, los investigadores han propuesto diferentes enfoques. Sin embargo, la mayoría de las investigaciones se centran en métodos tradicionales, como la indexación semántica latente, que no explota los recursos recopilados. Además, la mayoría de las investigaciones se enfocan en el código, descuidando otros artefactos de software como son los modelos. En esta tesis, presentamos un enfoque basado en ML para la recuperación de fragmentos en modelos (FRAME). El objetivo de este enfoque es recuperar el fragmento del modelo que realiza mejor una consulta específica. Esto permite a los ingenieros recuperar el fragmento que necesita ser trazado, reparado o ubicado para el mantenimiento del software. Específicamente, FRAME combina la computación evolutiva y las técnicas ML. En FRAME, un algoritmo evolutivo es guiado por ML para extraer de manera eficaz distintos fragmentos de un modelo. Estos fragmentos son posteriormente evaluados mediante técnicas ML. Para aprender a evaluarlos, las técnicas ML aprovechan el conocimiento (fragmentos recuperados de modelos) y la experiencia que las empresas han generado durante años. Basándose en lo aprendido, las técnicas ML determinan qué fragmento del modelo realiza mejor una consulta. Sin embargo, la mayoría de las técnicas ML no pueden entender los fragmentos de los modelos. Por lo tanto, antes de aplicar las técnicas ML, el enfoque propuesto codifica los fragmentos a través de una codificación ontológica y evolutiva. En resumen, FRAME está diseñado para extraer fragmentos de un modelo, codificarlos y evaluar cuál realiza mejor una consulta específica. El enfoque ha sido evaluado a partir de un caso real proporcionado por nuestro socio industrial (CAF, un proveedor internacional de soluciones ferroviarias). Además, sus resultados han sido comparados con los resultados de los enfoques más comunes y recientes. Los resultados muestran que FRAME obtuvo los mejores resultados para la mayoría de los indicadores de rendimiento, proporcionando un valor medio de precisión igual a 59.91%, un valor medio de exhaustividad igual a 78.95%, una valor-F medio igual a 62.50% y un MCC (Coeficiente de Correlación Matthews) medio igual a 0.64. Aprovechando los fragmentos recuperados de los modelos, FRAME es menos sensible al conocimiento tácito y al desajuste de vocabulario que los enfoques basados en información semántica. Sin embargo, FRAME está limitado por la disponibilidad de fragmentos recuperados para llevar a cabo el aprendizaje automático. Esta tesis presenta una discusión más amplia de estos aspectos así como el análisis estadístico de los resultados, que evalúa la magnitud de la mejora en comparación con los otros enfoques.[CAT] L'aprenentatge automàtic (ML per les seues sigles en anglés) és conegut com la branca de la intel·ligència artificial que reuneix algorismes estadístics, probabilístics i d'optimització, que aprenen empíricament. ML pot aprofitar el coneixement i l'experiència que s'han generat durant anys en les empreses per a realitzar automàticament diferents processos. Per tant, ML s'ha aplicat a diverses àrees d'investigació, que estudien des de la medicina fins a l'enginyeria del programari. De fet, en el camp de l'enginyeria del programari, el manteniment i l'evolució d'un sistema abasta fins a un 80% de la vida útil del sistema. Les empreses, que s'han dedicat al desenvolupament de sistemes programari durant molts anys, han acumulat grans quantitats de coneixement i experiència. Per tant, ML resulta una solució atractiva per a reduir els seus costos de manteniment aprofitant els recursos acumulats. Específicament, la Recuperació d'Enllaços de Traçabilitat, la Localització d'Errors i la Ubicació de Característiques es troben entre les tasques més comunes i rellevants per a realitzar el manteniment de productes programari. Per a abordar aquestes tasques, els investigadors han proposat diferents enfocaments. No obstant això, la majoria de les investigacions se centren en mètodes tradicionals, com la indexació semàntica latent, que no explota els recursos recopilats. A més, la majoria de les investigacions s'enfoquen en el codi, descurant altres artefactes de programari com són els models. En aquesta tesi, presentem un enfocament basat en ML per a la recuperació de fragments en models (FRAME). L'objectiu d'aquest enfocament és recuperar el fragment del model que realitza millor una consulta específica. Això permet als enginyers recuperar el fragment que necessita ser traçat, reparat o situat per al manteniment del programari. Específicament, FRAME combina la computació evolutiva i les tècniques ML. En FRAME, un algorisme evolutiu és guiat per ML per a extraure de manera eficaç diferents fragments d'un model. Aquests fragments són posteriorment avaluats mitjançant tècniques ML. Per a aprendre a avaluar-los, les tècniques ML aprofiten el coneixement (fragments recuperats de models) i l'experiència que les empreses han generat durant anys. Basant-se en l'aprés, les tècniques ML determinen quin fragment del model realitza millor una consulta. No obstant això, la majoria de les tècniques ML no poden entendre els fragments dels models. Per tant, abans d'aplicar les tècniques ML, l'enfocament proposat codifica els fragments a través d'una codificació ontològica i evolutiva. En resum, FRAME està dissenyat per a extraure fragments d'un model, codificar-los i avaluar quin realitza millor una consulta específica. L'enfocament ha sigut avaluat a partir d'un cas real proporcionat pel nostre soci industrial (CAF, un proveïdor internacional de solucions ferroviàries). A més, els seus resultats han sigut comparats amb els resultats dels enfocaments més comuns i recents. Els resultats mostren que FRAME va obtindre els millors resultats per a la majoria dels indicadors de rendiment, proporcionant un valor mitjà de precisió igual a 59.91%, un valor mitjà d'exhaustivitat igual a 78.95%, una valor-F mig igual a 62.50% i un MCC (Coeficient de Correlació Matthews) mig igual a 0.64. Aprofitant els fragments recuperats dels models, FRAME és menys sensible al coneixement tàcit i al desajustament de vocabulari que els enfocaments basats en informació semàntica. No obstant això, FRAME està limitat per la disponibilitat de fragments recuperats per a dur a terme l'aprenentatge automàtic. Aquesta tesi presenta una discussió més àmplia d'aquests aspectes així com l'anàlisi estadística dels resultats, que avalua la magnitud de la millora en comparació amb els altres enfocaments.[EN] Machine Learning (ML) is known as the branch of artificial intelligence that gathers statistical, probabilistic, and optimization algorithms, which learn empirically. ML can exploit the knowledge and the experience that have been generated for years to automatically perform different processes. Therefore, ML has been applied to a wide range of research areas, from medicine to software engineering. In fact, in software engineering field, up to an 80% of a system's lifetime is spent on the maintenance and evolution of the system. The companies, that have been developing these software systems for a long time, have gathered a huge amount of knowledge and experience. Therefore, ML is an attractive solution to reduce their maintenance costs exploiting the gathered resources. Specifically, Traceability Link Recovery, Bug Localization, and Feature Location are amongst the most common and relevant tasks when maintaining software products. To tackle these tasks, researchers have proposed a number of approaches. However, most research focus on traditional methods, such as Latent Semantic Indexing, which does not exploit the gathered resources. Moreover, most research targets code, neglecting other software artifacts such as models. In this dissertation, we present an ML-based approach for fragment retrieval on models (FRAME). The goal of this approach is to retrieve the model fragment which better realizes a specific query in a model. This allows engineers to retrieve the model fragment, which must be traced, fixed, or located for software maintenance. Specifically, the FRAME approach combines evolutionary computation and ML techniques. In the FRAME approach, an evolutionary algorithm is guided by ML to effectively extract model fragments from a model. These model fragments are then assessed through ML techniques. To learn how to assess them, ML techniques takes advantage of the companies' knowledge (retrieved model fragments) and experience. Then, based on what was learned, ML techniques determine which model fragment better realizes a query. However, model fragments are not understandable for most ML techniques. Therefore, the proposed approach encodes the model fragments through an ontological evolutionary encoding. In short, the FRAME approach is designed to extract model fragments, encode them, and assess which one better realizes a specific query. The approach has been evaluated in our industrial partner (CAF, an international provider of railway solutions) and compared to the most common and recent approaches. The results show that the FRAME approach achieved the best results for most performance indicators, providing a mean precision value of 59.91%, a recall value of 78.95%, a combined F-measure of 62.50%, and a MCC (Matthews correlation coefficient) value of 0.64. Leveraging retrieved model fragments, the FRAME approach is less sensitive to tacit knowledge and vocabulary mismatch than the approaches based on semantic information. However, the approach is limited by the availability of the retrieved model fragments to perform the learning. These aspects are further discussed, after the statistical analysis of the results, which assesses the magnitude of the improvement in comparison to the other approaches.Marcén Terraza, AC. (2020). Design of a Machine Learning-based Approach for Fragment Retrieval on Models [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/158617TESI

    Toward an Effective Automated Tracing Process

    Get PDF
    Traceability is defined as the ability to establish, record, and maintain dependency relations among various software artifacts in a software system, in both a forwards and backwards direction, throughout the multiple phases of the project’s life cycle. The availability of traceability information has been proven vital to several software engineering activities such as program comprehension, impact analysis, feature location, software reuse, and verification and validation (V&V). The research on automated software traceability has noticeably advanced in the past few years. Various methodologies and tools have been proposed in the literature to provide automatic support for establishing and maintaining traceability information in software systems. This movement is motivated by the increasing attention traceability has been receiving as a critical element of any rigorous software development process. However, despite these major advances, traceability implementation and use is still not pervasive in industry. In particular, traceability tools are still far from achieving performance levels that are adequate for practical applications. Such low levels of accuracy require software engineers working with traceability tools to spend a considerable amount of their time verifying the generated traceability information, a process that is often described as tedious, exhaustive, and error-prone. Motivated by these observations, and building upon a growing body of work in this area, in this dissertation we explore several research directions related to enhancing the performance of automated tracing tools and techniques. In particular, our work addresses several issues related to the various aspects of the IR-based automated tracing process, including trace link retrieval, performance enhancement, and the role of the human in the process. Our main objective is to achieve performance levels, in terms of accuracy, efficiency, and usability, that are adequate for practical applications, and ultimately to accomplish a successful technology transfer from research to industry

    From Bugs to Decision Support – Leveraging Historical Issue Reports in Software Evolution

    Get PDF
    Software developers in large projects work in complex information landscapes and staying on top of all relevant software artifacts is an acknowledged challenge. As software systems often evolve over many years, a large number of issue reports is typically managed during the lifetime of a system, representing the units of work needed for its improvement, e.g., defects to fix, requested features, or missing documentation. Efficient management of incoming issue reports requires the successful navigation of the information landscape of a project. In this thesis, we address two tasks involved in issue management: Issue Assignment (IA) and Change Impact Analysis (CIA). IA is the early task of allocating an issue report to a development team, and CIA is the subsequent activity of identifying how source code changes affect the existing software artifacts. While IA is fundamental in all large software projects, CIA is particularly important to safety-critical development. Our solution approach, grounded on surveys of industry practice as well as scientific literature, is to support navigation by combining information retrieval and machine learning into Recommendation Systems for Software Engineering (RSSE). While the sheer number of incoming issue reports might challenge the overview of a human developer, our techniques instead benefit from the availability of ever-growing training data. We leverage the volume of issue reports to develop accurate decision support for software evolution. We evaluate our proposals both by deploying an RSSE in two development teams, and by simulation scenarios, i.e., we assess the correctness of the RSSEs' output when replaying the historical inflow of issue reports. In total, more than 60,000 historical issue reports are involved in our studies, originating from the evolution of five proprietary systems for two companies. Our results show that RSSEs for both IA and CIA can help developers navigate large software projects, in terms of locating development teams and software artifacts. Finally, we discuss how to support the transfer of our results to industry, focusing on addressing the context dependency of our tool support by systematically tuning parameters to a specific operational setting

    Clustering and its Application in Requirements Engineering

    Get PDF
    Large scale software systems challenge almost every activity in the software development life-cycle, including tasks related to eliciting, analyzing, and specifying requirements. Fortunately many of these complexities can be addressed through clustering the requirements in order to create abstractions that are meaningful to human stakeholders. For example, the requirements elicitation process can be supported through dynamically clustering incoming stakeholders’ requests into themes. Cross-cutting concerns, which have a significant impact on the architectural design, can be identified through the use of fuzzy clustering techniques and metrics designed to detect when a theme cross-cuts the dominant decomposition of the system. Finally, traceability techniques, required in critical software projects by many regulatory bodies, can be automated and enhanced by the use of cluster-based information retrieval methods. Unfortunately, despite a significant body of work describing document clustering techniques, there is almost no prior work which directly addresses the challenges, constraints, and nuances of requirements clustering. As a result, the effectiveness of software engineering tools and processes that depend on requirements clustering is severely limited. This report directly addresses the problem of clustering requirements through surveying standard clustering techniques and discussing their application to the requirements clustering process

    Configuring and Assembling Information Retrieval based Solutions for Software Engineering Tasks.

    Get PDF
    Information Retrieval (IR) approaches are used to leverage textual or unstructured data generated during the software development process to support various software engineering (SE) tasks (e.g., concept location, traceability link recovery, change impact analysis, etc.). Two of the most important steps for applying IR techniques to support SE tasks are preprocessing the corpus and configuring the IR technique, and these steps can significantly influence the outcome and the amount of effort developers have to spend for these maintenance tasks. We present the use of Genetic Algorithms (GAs) to automatically configure and assemble an IR process to support SE tasks. The approach named IR-GA determines the (near) optimal solution to be used for each step of the IR process without requiring any training. We applied IR-GA on three different SE tasks and the results of the study indicate that IR-GA outperforms approaches previously used in the literature, and that it does not significantly differ from an ideal upper bound that could be achieved by a supervised approach and a combinatorial approach
    corecore