
    Deep Learning In Software Engineering

    Software evolves and therefore requires an evolving field of Software Engineering. The evolution of software can be seen on an individual project level through the software life cycle, as well as on a collective level, as we study the trends and uses of software in the real world. As the needs and requirements of users change, so must software evolve to reflect those changes. This cycle is never ending and has led to continuous and rapid development of software projects. More importantly, it has put a great responsibility on software engineers, causing them to adopt practices and tools that allow them to increase their efficiency. However, these tools suffer the same fate as software designed for the general population; they need to change in order to reflect the user’s needs. Fortunately, the demand for this evolving software has given software engineers a plethora of data and artifacts to analyze. The challenge arises when attempting to identify and apply patterns learned from the vast amount of data. In this dissertation, we explore and develop techniques to take advantage of the vast amount of software data and to aid developers in software development tasks. Specifically, we exploit the tool of deep learning to automatically learn patterns discovered within previous software data and automatically apply those patterns to present day software development. We first set out to investigate the current impact of deep learning in software engineering by performing a systematic literature review of top tier conferences and journals. This review provides guidelines and common pitfalls for researchers to consider when implementing DL (Deep Learning) approaches in SE (Software Engineering). In addition, the review provides a research road map for areas within SE where DL could be applicable. Our next piece of work developed an approach that simultaneously learned different representations of source code for the task of clone detection. 
We found that combining multiple representations, such as identifiers, ASTs, CFGs and bytecode, can lead to the identification of similar code fragments. Through the use of deep learning strategies, we automatically learned these different representations without the requirement of hand-crafted features. Lastly, we designed a novel approach for automating the generation of assert statements through seq2seq learning, with the goal of increasing the efficiency of software testing. Given the test method and the context of the associated focal method, we automatically generated semantically and syntactically correct assert statements for a given, unseen test method. We demonstrate that the techniques presented in this dissertation provide a meaningful advancement to the field of software engineering and the automation of software development tasks. We provide analytical evaluations and empirical evidence that substantiate the impact of our findings and the usefulness of our approaches to the software engineering community.

    Diagnóstico de las condiciones de trabajo de los desarrolladores de software (Diagnosis of the Working Conditions of Software Developers)

    Although the topic of peopleware has been part of computer science since the field's beginnings, it is still being consolidated, and studies have focused mainly on technological development. Research on the topic has continued, but mostly in developed countries; in emerging countries such as Colombia, both its study and its application have been limited, owing to the scarcity of relevant information and the informality of human resources management processes in IT. Workers developing a technology project face heavy demands on productivity and product quality, on the mistaken assumption that these matter most, while their satisfaction and well-being are disregarded. This has negative effects on the worker, such as work-related stress. When a company seeks to increase its productivity, its starting point should be its human assets: identifying psychosocial factors among its personnel allows project managers to keep a work group satisfied and, therefore, productive. This investigation diagnoses the working conditions of software development staff, drawing on existing models for measuring workers' satisfaction with their current positions. The study aims to produce a characterization of the area that serves as a starting point for proposing future research and for improving existing development models and methodologies, so that the peopleware element is considered as important as, or more important than, the software or hardware in achieving a successful project.

    Would wider adoption of reproducible research be beneficial for empirical software engineering research?

    Researchers have identified problems with the validity of software engineering research findings. In particular, it is often impossible to reproduce data analyses because raw data are missing, summary statistics are insufficient, or analysis procedures are undefined. The aim of this paper is to raise awareness of the problems caused by unreproducible research in software engineering and to discuss the concept of reproducible research (RR) as a mechanism to address these problems. RR is the idea that the outcome of research is both a paper and its computational environment. We report some recent studies that have cast doubt on the reliability of research outcomes in software engineering. We then discuss the use of RR in software engineering research as a means of addressing these problems and present the methodology we have used to adopt RR principles. We report a small working example of how to create reproducible research. We summarise advantages of and problems with adopting RR methods. We conclude that RR supports good scientific practice and would help to address some of the problems found in empirical software engineering research.

    Configuring and Assembling Information Retrieval based Solutions for Software Engineering Tasks.

    Information Retrieval (IR) approaches are used to leverage textual or unstructured data generated during the software development process to support various software engineering (SE) tasks (e.g., concept location, traceability link recovery, change impact analysis). Two of the most important steps in applying IR techniques to SE tasks are preprocessing the corpus and configuring the IR technique; these steps can significantly influence the outcome and the amount of effort developers have to spend on these maintenance tasks. We present the use of Genetic Algorithms (GAs) to automatically configure and assemble an IR process to support SE tasks. The approach, named IR-GA, determines the (near) optimal solution to be used for each step of the IR process without requiring any training. We applied IR-GA to three different SE tasks, and the results of the study indicate that IR-GA outperforms approaches previously used in the literature, and that it does not significantly differ from an ideal upper bound that could be achieved by a supervised approach and a combinatorial approach.
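
    The idea of letting a genetic algorithm pick one option per step of an IR process can be sketched roughly as follows. This is a minimal illustration and not the IR-GA implementation: the search space, the option names, the GA settings, and above all `toy_fitness` are hypothetical placeholders. An approach that needs no training would instead score each candidate configuration from the output of the IR process itself, without labelled data.

```python
import random

# Hypothetical search space: one option is chosen per step of the IR process.
SEARCH_SPACE = {
    "stop_words": [False, True],
    "stemmer": ["none", "porter", "snowball"],
    "weighting": ["boolean", "tf", "tf-idf"],
    "model": ["vsm", "lsi"],
}
STEPS = list(SEARCH_SPACE)


def random_config(rng):
    """Draw one option per step uniformly at random."""
    return {step: rng.choice(SEARCH_SPACE[step]) for step in STEPS}


def crossover(a, b, rng):
    """Uniform crossover: each step is inherited from one of the two parents."""
    return {step: (a if rng.random() < 0.5 else b)[step] for step in STEPS}


def mutate(config, rng, rate=0.2):
    """With probability `rate`, re-draw the option chosen for a step."""
    return {
        step: rng.choice(SEARCH_SPACE[step]) if rng.random() < rate else value
        for step, value in config.items()
    }


def evolve(fitness, generations=20, pop_size=12, seed=0):
    """Keep the better half of each generation and refill it with offspring."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        population = elite + children
    return max(population, key=fitness)


def toy_fitness(config):
    """Placeholder score with made-up weights; NOT the fitness used by IR-GA."""
    bonus = {
        "stop_words": {False: 0.0, True: 1.0},
        "stemmer": {"none": 0.0, "porter": 0.6, "snowball": 0.5},
        "weighting": {"boolean": 0.0, "tf": 0.4, "tf-idf": 1.0},
        "model": {"vsm": 0.5, "lsi": 0.7},
    }
    return sum(bonus[step][config[step]] for step in STEPS)


best = evolve(toy_fitness)
print(best, toy_fitness(best))
```

    Because the better half of every generation is carried over unchanged, the best score never decreases from one generation to the next; swapping `toy_fitness` for a label-free quality measure is the part that would make such a search usable without training data.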

    A Fault-Based Model of Fault Localization Techniques

    Every day, ordinary people depend on software working properly. We take it for granted; from banking software, to railroad switching software, to flight control software, to software that controls medical devices such as pacemakers or even gas pumps, our lives are touched by software that we expect to work. It is well known that the main technique/activity used to ensure the quality of software is testing. Often it is the only quality assurance activity undertaken, making it that much more important. In a typical experiment studying these techniques, a researcher will intentionally seed a fault (intentionally breaking the functionality of some source code) with the hopes that the automated techniques under study will be able to identify the fault's location in the source code. These faults are picked arbitrarily; there is potential for bias in the selection of the faults. Previous researchers have established an ontology for understanding or expressing this bias called fault size. This research captures the fault size ontology in the form of a probabilistic model. The results of applying this model to measure fault size suggest that many faults generated through program mutation (the systematic replacement of source code operators to create faults) are very large and easily found. Secondary measures generated in the assessment of the model suggest a new static analysis method, called testability, for predicting the likelihood that code will contain a fault in the future. While software testing researchers are not statisticians, they nonetheless make extensive use of statistics in their experiments to assess fault localization techniques. Researchers often select their statistical techniques without justification. This is a very worrisome situation because it can lead to incorrect conclusions about the significance of research. 
This research introduces an algorithm, MeansTest, which helps automate some aspects of the selection of appropriate statistical techniques. The results of an evaluation of MeansTest suggest that MeansTest performs well relative to its peers. This research then surveys recent work in software testing using MeansTest to evaluate the significance of researchers' work. The results of the survey indicate that software testing researchers are underreporting the significance of their work.
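
    Program mutation, described above as the systematic replacement of source code operators to create faults, can be illustrated with a short sketch. It does not reproduce the fault size model or MeansTest; the function `total_price`, the operator swaps, and the two tiny test suites are all hypothetical and chosen only to show how a seeded fault may or may not be detected.

```python
import ast

# Hypothetical program under test.
SOURCE = """
def total_price(unit_price, quantity, shipping):
    return unit_price * quantity + shipping
"""

# Assumed operator replacements; real mutation tools use larger catalogues.
SWAPS = {ast.Add: ast.Sub, ast.Sub: ast.Add, ast.Mult: ast.Add}


class SwapOperator(ast.NodeTransformer):
    """Replace only the `target`-th swappable binary operator (post-order)."""

    def __init__(self, target):
        self.target = target
        self.seen = 0

    def visit_BinOp(self, node):
        self.generic_visit(node)  # visit nested expressions first
        if type(node.op) in SWAPS:
            if self.seen == self.target:
                node.op = SWAPS[type(node.op)]()
            self.seen += 1
        return node


def count_sites(tree):
    """Number of binary operators that have a replacement in SWAPS."""
    return sum(
        isinstance(n, ast.BinOp) and type(n.op) in SWAPS for n in ast.walk(tree)
    )


def load(tree):
    """Compile a (possibly mutated) module and return its total_price."""
    namespace = {}
    exec(compile(ast.fix_missing_locations(tree), "<mutant>", "exec"), namespace)
    return namespace["total_price"]


def make_mutants(source):
    """Create one mutant per swappable operator, each with a single fault."""
    sites = count_sites(ast.parse(source))
    return [load(SwapOperator(i).visit(ast.parse(source))) for i in range(sites)]


def full_suite(fn):
    return fn(2, 3, 1) == 7 and fn(5, 1, 0) == 5


def weak_suite(fn):
    return fn(5, 1, 0) == 5


mutants = make_mutants(SOURCE)
killed_by_full = [not full_suite(m) for m in mutants]
killed_by_weak = [not weak_suite(m) for m in mutants]
print(killed_by_full, killed_by_weak)
```

    In this toy case both mutants fail the fuller suite, while the mutant that turns `+ shipping` into `- shipping` survives the weaker suite because its only test uses a shipping cost of zero. How easily a seeded fault is detected therefore depends on both the fault and the tests, which is the kind of selection bias the abstract is concerned with.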

    Supporting and Accelerating Reproducible Research in Software Maintenance using TraceLab Component Library

    Research studies in software maintenance are notoriously hard to reproduce due to lack of datasets, tools, implementation details (e.g., parameter values, environmental settings) and other factors. The progress in the field is hindered by the challenge of comparing new techniques against existing ones, as researchers have to devote a lot of their resources to the tedious and error-prone process of reproducing previously introduced approaches. In this paper, we address the problem of experiment reproducibility in software maintenance and provide a long term solution towards ensuring that future experiments will be reproducible and extensible. We conducted a mapping study of a number of representative maintenance techniques and approaches and implemented them as a library of experiments and components that we make publicly available with TraceLab, called the Component Library. The goal of these experiments and components is to create a body of actionable knowledge that would (i) facilitate future research and would (ii) allow the research community to contribute to it as well. In addition, to illustrate the process of using and adapting these techniques, we present an example of creating new techniques based on existing ones, which produce improved results.
    Keywords: software maintenance, reproducible, experiments, case studies, TraceLab