5 research outputs found

    Leveraging Identifier Naming Structures in Source Code and Bug Reports to Localize Relevant Bugs

    When bugs are found in source code, bug reports are created that contain information relevant to developers for locating and fixing the bug. In large source code repositories, it can be difficult and time-consuming for developers to manually analyze bug reports to locate a bug. The discovery of patterns between bug reports and source files has led to the creation of automated tools using various techniques. Automated bug localization techniques can reduce the manual effort required of developers by ranking the most probable locations of the bug using textual information from bug reports and source code. Although these approaches offer some assistance, the lexical mismatch between bug reports and source code makes it difficult to accurately locate the buggy source code file(s) using Information Retrieval (IR) techniques. Our research proposes a technique that takes advantage of the lexical and structural patterns observed in source code identifier names to help offset the mismatch between bug reports and their related source code files. Our observations reveal that there are lexical and structural identifier naming trends for different identifier types in source code. Using two open-source projects, we collected frequencies for observed identifier patterns across each project and applied those frequencies to matched word occurrences in bug reports across our evaluation data set to modify the significance of each matched word. Based on observations from our empirical analysis of the open-source repositories ElasticSearch and RxJava, we developed a method that modifies the significance of a word by altering its weight in the Term Frequency-Inverse Document Frequency (TF-IDF) vectorization of the bug report. The idea behind this approach is that if we come across a word perceived to be significant based on our observed identifier pattern frequency data, we can apply a weight to that word in the bug report vectorization to increase the cosine similarity score between the bug report and source file vectors. This work expands and improves upon previous work by Gharibi et al. [1], who propose a multicomponent approach that uses token matching, stack traces, semantic similarity, and a revised vector space model (rVSM). Specifically, our approach modifies the rVSM component, and our work is evaluated on the same three open-source software projects: AspectJ, SWT, and ZXing. Our results are comparable to those of Gharibi et al., with an improvement in some cases, and our approach outperforms many existing bug localization approaches. Top@N, Mean Reciprocal Rank (MRR), and Mean Average Precision (MAP) are the metrics used to evaluate and rank our work against other approaches, revealing some improvement in bug localization across the three open-source projects.
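
    A minimal sketch of the re-weighting idea described above, assuming invented pattern weights and toy documents (not the authors' implementation or data): terms that match observed identifier patterns get boosted in the bug report's TF-IDF vector before cosine similarity is computed against the source file vectors.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

bug_report = "NullPointerException when parsing empty index settings"
# Source files reduced to bags of terms from their split identifier names.
source_docs = {
    "IndexSettingsParser.java":  "index settings parser parse settings empty value",
    "SearchRequestBuilder.java": "search request builder build query index",
}

# Hypothetical identifier-pattern weights mined from the project: words that
# frequently occur in identifier names of a given type get boosted.
pattern_weight = {"index": 1.5, "settings": 1.5, "parsing": 1.3}

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform([bug_report] + list(source_docs.values())).toarray()

# Boost the bug-report vector at columns whose term matched a pattern.
vocab = vectorizer.vocabulary_
for term, boost in pattern_weight.items():
    if term in vocab:
        matrix[0, vocab[term]] *= boost

# Rank source files by cosine similarity to the re-weighted bug report vector.
scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
for path, score in sorted(zip(source_docs, scores), key=lambda p: -p[1]):
    print(f"{score:.3f}  {path}")
```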

    Testing of Neural Networks

    Research in Neural Networks is becoming more popular each year. Research has introduced different ways to utilize Neural Networks, but an important aspect is missing: Testing. There are only 16 papers that strictly address Testing Neural Networks, with the majority focusing on Deep Neural Networks and a small portion on Recurrent Neural Networks. Testing Recurrent Neural Networks is just as important as testing Deep Neural Networks, as they are used in products like Autonomous Vehicles, so there is a need to ensure that Recurrent Neural Networks are of high quality, reliable, and behave correctly. The few existing research papers on testing Recurrent Neural Networks focus only on the LSTM and GRU architectures, but more Recurrent Neural Network architectures exist, such as MGU, UGRNN, and Delta-RNN. This means we need to determine whether existing test metrics work for these architectures or whether new testing metrics must be introduced. This paper has two objectives. First, we conduct a comparative analysis of the 16 papers on Testing Neural Networks: we define the testing metrics and analyze features such as code availability, programming languages, and related software testing concepts. We then perform a case study with the Neuron Coverage test metric, conducting an experiment with unoptimized RNN models trained by a tool within EXAMM, an RNN framework, and optimized RNN models trained and optimized using ANTS. We compared the Neuron Coverage outputs under the assumption that the optimized models would perform better.
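
    As a rough illustration of the Neuron Coverage metric used in the case study (the fraction of neurons whose activation magnitude exceeds a threshold on at least one test input), here is a minimal sketch over a toy tanh RNN with random weights; the EXAMM and ANTS models themselves are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, input_size = 8, 3
W_xh = rng.normal(size=(hidden_size, input_size))   # input-to-hidden weights
W_hh = rng.normal(size=(hidden_size, hidden_size))  # recurrent weights

def rnn_activations(sequence):
    """Run a tanh RNN over one sequence, returning per-step hidden states."""
    h = np.zeros(hidden_size)
    states = []
    for x in sequence:
        h = np.tanh(W_xh @ x + W_hh @ h)
        states.append(h)
    return np.array(states)

def neuron_coverage(test_sequences, threshold=0.5):
    """Fraction of hidden neurons activated above threshold by any test input."""
    covered = np.zeros(hidden_size, dtype=bool)
    for seq in test_sequences:
        acts = np.abs(rnn_activations(seq))          # activation magnitudes
        covered |= (acts > threshold).any(axis=0)    # any time step counts
    return covered.sum() / hidden_size

tests = [rng.normal(size=(5, input_size)) for _ in range(10)]
print(f"Neuron coverage: {neuron_coverage(tests):.2f}")
```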

    Why did you clone these identifiers? Using Grounded Theory to understand Identifier Clones

    Developers spend most of their time comprehending source code, with some studies estimating this activity takes between 58% and 70% of a developer's time. To improve the readability of source code, and therefore the productivity of developers, it is important to understand what aspects of static code analysis and syntactic code structure hinder the understandability of code. Identifiers are a primary source of code comprehension due to their large volume and their role as implicit documentation of a developer's intent when writing code. Despite the critical role that identifiers play during program comprehension, there are no regulated naming standards for developers to follow when picking identifier names. Our research supports previous work aimed at understanding what makes a good identifier name, and practices to follow when picking names, by exploring a phenomenon that occurs during identifier naming: identifier clones. Identifier clones are two or more identifiers that are declared using the same name. This is an important yet unexplored phenomenon in identifier naming, in which developers intentionally give the same name to two or more identifiers in separate parts of a system. We must study identifier clones to understand their impact on program comprehension and to better understand the nature of identifier naming. To accomplish this, we conducted an empirical study on identifier clones detected in open-source engineered software systems and propose a taxonomy of identifier clones containing categories that can explain why they are introduced into systems and whether they represent naming antipatterns.
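
    To make the phenomenon concrete, the sketch below detects identifier clones (the same declared name appearing in separate parts of a system) over a toy Python corpus using the standard ast module; the study itself analyzes real open-source systems, and this detector is only illustrative.

```python
import ast
from collections import defaultdict

# Two hypothetical files that independently declare 'tokenize' and 'buffer'.
sources = {
    "parser.py": "def tokenize(text):\n    buffer = []\n    return buffer",
    "lexer.py":  "def tokenize(stream):\n    buffer = {}\n    return buffer",
}

declarations = defaultdict(set)   # identifier name -> files declaring it
for path, code in sources.items():
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            declarations[node.name].add(path)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            declarations[node.id].add(path)

for name, files in declarations.items():
    if len(files) > 1:            # same name declared in 2+ separate places
        print(f"identifier clone: {name!r} in {sorted(files)}")
```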

    An intelligent model for specifying the granularity of microservices-based applications

    Microservices are an architectural and organizational approach to software development in which applications are composed of small, independent services that communicate through a well-defined application programming interface (API). Many companies use microservices to structure their systems, and the microservices architecture has also been applied in other areas such as the Internet of Things (IoT), edge computing, cloud computing, autonomous vehicle development, telecommunications, e-health, and e-learning. A major challenge when designing this type of application is finding a suitable partition or granularity for the microservices, a process that to date is carried out intuitively, based on the experience of the architect or the development team. Defining the size or granularity of microservices is an open research topic of wide interest; no patterns, methods, or models have been standardized for determining how small a microservice should be. The most widely used strategies for estimating microservice granularity are machine learning, semantic similarity, genetic programming, and domain engineering. This doctoral research proposes an intelligent model for specifying and evaluating the granularity of the microservices that make up an application, taking into account characteristics such as cognitive complexity, development time, coupling, cohesion, and communication. Chapter one presents the theoretical framework, states the research problem together with the research questions that address it, and presents the objectives and the research methodology through which a new practice is proposed: an intelligent model for specifying microservice granularity called the "Microservices Backlog". It also presents the research phases and methods used to answer the research questions. Chapter two presents the state of the art and the work related to this doctoral research, and identifies the metrics that have been used to define and evaluate microservice granularity. Chapter three characterizes the development process of microservices-based applications, illustrating its use in a case study called "Sinplafut". Chapter four describes the Microservices Backlog and defines each of its components: the parameterizer, the grouping component (a genetic algorithm and a semantic clustering algorithm based on unsupervised machine learning), the metrics evaluator, and the component that compares decompositions and candidate microservices. It also presents the mathematical formulation of the granularity of microservices-based applications. Chapter five presents the evaluation of the proposed practice, carried out iteratively using four case studies: two examples from the state of the art (Cargo Tracking and JPet-Store) and two real projects (Foristom Conferences and Sinplafut). The Microservices Backlog was used to obtain and evaluate the candidate microservices of the four applications. A comparative analysis was performed against methods proposed in the state of the art and against Domain-Driven Design (DDD), the most widely used method for defining the microservices that will form part of an application. The Microservices Backlog achieved low coupling, high cohesion, low complexity, and reduced communication between microservices compared with the state-of-the-art proposals and with DDD. Finally, chapter six presents the conclusions, contributions, limitations, and products obtained as a result of this thesis.
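
    As a rough illustration of how a candidate decomposition can be scored on coupling and cohesion, the sketch below evaluates an invented partition of operations over an invented call graph; it is not the Microservices Backlog's actual metric formulation.

```python
calls = {                     # directed calls between operations
    ("createOrder", "reserveStock"), ("createOrder", "chargeCard"),
    ("reserveStock", "updateInventory"), ("chargeCard", "sendReceipt"),
}
candidate = {                 # a candidate partition into microservices
    "orders":    {"createOrder", "sendReceipt"},
    "inventory": {"reserveStock", "updateInventory"},
    "payments":  {"chargeCard"},
}

def owner(op):
    """Return the candidate microservice that owns an operation."""
    return next(s for s, ops in candidate.items() if op in ops)

# Coupling: calls that cross a service boundary (lower is better).
coupling = sum(1 for a, b in calls if owner(a) != owner(b))

# Cohesion: internal calls relative to possible internal pairs (higher is better).
internal = sum(1 for a, b in calls if owner(a) == owner(b))
pairs = sum(len(ops) * (len(ops) - 1) // 2 for ops in candidate.values())
cohesion = internal / pairs if pairs else 0.0

print(f"coupling (cross-service calls): {coupling}")
print(f"cohesion (internal calls / internal pairs): {cohesion:.2f}")
```

    A grouping component such as a genetic algorithm can then search over partitions like `candidate`, using scores of this kind in its fitness function.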

    Text Similarity Between Concepts Extracted from Source Code and Documentation

    Context: Constant evolution in software systems often results in their documentation losing sync with the content of the source code. The traceability research field has often helped in the past with the aim of recovering links between code and documentation when the two fall out of sync. Objective: The aim of this paper is to compare the concepts contained within the source code of a system with those extracted from its documentation, in order to detect how similar these two sets are. If vastly different, the difference between the two sets might indicate considerable ageing of the documentation and a need to update it. Methods: In this paper we reduce the source code of 50 software systems to a set of key terms, each containing the concepts of one of the sampled systems. At the same time, we reduce the documentation of each system to another set of key terms. We then use four different approaches for set comparison to detect how similar the sets are. Results: Using the well-known Jaccard index as the benchmark for the comparisons, we discovered that the cosine distance has excellent comparative power, depending on the pre-training of the machine learning model. In particular, the SpaCy and FastText embeddings offer up to 80% and 90% similarity scores, respectively. Conclusion: For most of the sampled systems, the source code and the documentation tend to contain very similar concepts. Given the accuracy of one pre-trained model (e.g., FastText), it also becomes evident that a few systems show a measurable drift between the concepts contained in the documentation and in the source code.
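
    A minimal sketch of the two flavors of set comparison discussed here: the Jaccard index over the raw key-term sets versus cosine similarity over averaged term embeddings. The tiny embedding table is invented for illustration; the paper uses pre-trained SpaCy and FastText vectors.

```python
import numpy as np

code_terms = {"parser", "token", "buffer"}   # key terms from source code
doc_terms  = {"parser", "lexeme", "buffer"}  # key terms from documentation

# Jaccard index: overlap of the raw term sets.
jaccard = len(code_terms & doc_terms) / len(code_terms | doc_terms)

embedding = {            # hypothetical 3-d word vectors
    "parser": [0.9, 0.1, 0.0], "token":  [0.7, 0.3, 0.1],
    "lexeme": [0.6, 0.4, 0.1], "buffer": [0.1, 0.8, 0.2],
}

def centroid(terms):
    """Average the embeddings of a term set into one vector."""
    return np.mean([embedding[t] for t in terms], axis=0)

# Cosine similarity between the two centroid vectors: unlike Jaccard, it
# rewards semantically close terms ('token' vs 'lexeme') that do not match.
a, b = centroid(code_terms), centroid(doc_terms)
cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(f"Jaccard: {jaccard:.2f}   cosine: {cosine:.2f}")
```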