16 research outputs found

    Data-Driven Decisions and Actions in Today’s Software Development

    Today’s software development is all about data: data about the software product itself, about the process and its different stages, about the customers and markets, and about development, testing, integration, deployment, and runtime aspects in the cloud. We use static and dynamic data of various kinds and quantities to analyze market feedback, feature impact, code quality, architectural design alternatives, or the effects of performance optimizations. Development environments are no longer limited to IDEs in a desktop application or the like but span the Internet, using live programming environments such as Cloud9 or large-volume repositories such as BitBucket, GitHub, GitLab, or StackOverflow. Software development has become “live” in the cloud, be it the coding, the testing, or the experimentation with different product options on the Internet. The inherent complexity puts a further burden on developers, since they need to stay alert when constantly switching between tasks in different phases. Research has been analyzing the development process, its data, and its stakeholders for decades and is working on various tools that can help developers in their daily tasks to improve the quality of their work and their productivity. In this chapter, we critically reflect on the challenges faced by developers in a typical release cycle, identify inherent problems of the individual phases, and present the current state of the research that can help overcome these issues.

    Prototype of a tool for automatic generation of commit messages for Java applications

    Although version control systems allow developers to describe and explain the rationale behind code changes in commit messages, the state of practice indicates that most of the time such commit messages are either very short or even empty. In fact, a recent study of 23K+ Java projects found that only 10% of the messages are descriptive and that over 66% of those messages contain fewer words than a typical English sentence. However, accurate and complete commit messages summarizing software changes are important to support a number of development and maintenance tasks. This thesis presents an approach, named ChangeScribe, that is designed to generate commit messages automatically from change sets. ChangeScribe generates natural language commit messages by taking into account the commit stereotype, the type of changes (e.g., file renames, changes made only to property files), and the impact set of the underlying changes. ChangeScribe was evaluated in a survey involving 23 developers, in which the participants analyzed automatically generated commit messages from real changes and compared them with commit messages written by the original developers of six open source systems. The results demonstrate that messages generated automatically by ChangeScribe are preferred in about 62% of the cases for large commits and in about 54% of the cases for small commits (few modifications).

    Challenges and Prospects

    Thesis (Master's) -- Seoul National University Graduate School: Graduate School of Public Administration, Global Public Administration Major, 2023. 2. Taehyon Choi, Ph.D. in Policy, Planning, and Development (Public Management), University of Southern California.

    Artificial intelligence, loosely defined as the incorporation of human intellect into machines, is a branch of computer science that develops ways for machines to perform tasks that would normally be impossible for humans. In particular, this research uses the definition of Haenlein & Kaplan: AI is a system's ability to process data correctly, to learn from such data, and to use those learnings to achieve specific goals and tasks through flexible adaptation. Applying artificial intelligence in Egypt is an important matter because AI, particularly data-driven AI such as machine learning, is expected to make a significant impact on economic and social systems. As stated in Egypt's national strategy for artificial intelligence, AI is expected to add 15 trillion dollars to the global economy by 2030 and to bring nearly 25% GDP growth to countries that fully integrate AI into their economies. This research investigates in depth the challenges that Egypt would face in applying AI in the government, suitable solutions to these challenges, and the priority of these solutions based on Egypt's resources. The research is qualitative and descriptive. Data were collected with the Delphi method, based on a questionnaire whose participants come from the public and private sectors and work on and/or teach artificial intelligence. After the literature review and data collection, the factors identified as potential challenges for the Egyptian government are data availability, technological infrastructure support for AI, public awareness and knowledge of AI, the education system, and qualified human resources. Egypt is working to overcome some of these challenges, but they remain problems for which no solutions have yet been found. This study provides some recommendations that may assist in dealing with these challenges.

    Keywords: artificial intelligence, AI challenges, government, public sector, Egypt

    Contents:
    Chapter (1): Introduction. 1.1. Study background; 1.2. Purpose of Study; 1.3. Research Questions
    Chapter (2): Status of AI in Egypt. 2.1. Egypt in Numbers; 2.2. SWOT Analysis of AI in Egypt; 2.3. Pave the Path for the Applying of AI
    Chapter (3): Literature Reviews. 3.1. Challenges of Applying AI; 3.2. Opportunities of AI for the public sector; 3.3. The importance of AI in the government; 3.4. Factors affecting the implementation of AI; 3.5. The Distinction Between the Public and Private Sectors in Terms of AI Application; 3.6. AI Techniques and Government Functions; 3.7. Theoretical Model for Challenges to AI in Public Sector; 3.8. Institutional & Regulatory Theories for AI
    Chapter (4): Research Methodology. 4.1. Research Framework; 4.2. Definitions of Variables; 4.3. Research Design; 4.3.1. Research Method; 4.3.2. Process of Research Method; 4.3.2.1. Pilot Survey; 4.3.2.1.1. Pilot Survey Results; 4.3.2.2. Questionnaire; 4.4. Data Collection
    Chapter (5): Data analysis. 5.2. The analysis of participant's responses; 5.2.1. Challenges of applying AI; 5.2.2. Benefits of AI; 5.2.3. Drawbacks of AI; 5.2.4. Suggested Solutions; 5.2.5. Egypt's resources vis AI solutions; 5.2.6. Strategy and Sectors; 5.2.7. Differences Between the Private Sector and Public Sector in Applying AI; 5.2.8. Proper Countries Experiments; 5.3. Findings
    Chapter (6): Conclusion and Recommendation. 6.1. Conclusion; 6.2. Recommendations
    References
    Appendix (1): List of Figures
    Appendix (2): List of Tables
    Appendix (3): List of questionnaire participants
    Abstract in Korean

    Towards automatic context-aware summarization of code entities

    Software developers work with many different methods and classes, and in order to understand those that perplex them and/or that are part of their tasks, they need to deal with a huge amount of information. Providing developers with high-quality summaries of code entities can therefore help them during their maintenance and evolution tasks. Informal documentation (e.g., Stack Overflow) has been shown to be an important source of information about the purpose of code entities. In this study, we investigate bug reports as a type of informal documentation, and we apply machine learning to produce summaries of code entities (methods and classes) mentioned in bug reports. In the proposed approach, code entities are extracted with an island parser that we implemented to identify code in bug reports. We then applied machine learning to select a set of useful sentences that form the code entities' summaries, using logistic regression to rank sentences by their importance. To this aim, a corpus of sentences is built based on the occurrence of code entities in the sentences of the bug reports that contain the code entities in question. In the last step, the summaries were evaluated with surveys to estimate their quality. The results show that the automatically produced summaries can reduce the time and effort needed to understand the usage of code entities. Specifically, the largest share of participants found the summaries extremely helpful for decreasing the time (43.5%) and the effort (39.1%) needed to understand the code entities. In the future, summaries can be produced from other informal documentation such as mailing lists or Stack Overflow. Additionally, the approach can be applied in practical settings: it can be used within an IDE such as Eclipse to assist developers during their software maintenance and evolution tasks.
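
    The sentence-ranking step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two features (whether a sentence mentions the code entity, and whether it appears in the first comment of the bug report), the training data, and all function names are hypothetical, chosen only to show how a logistic regression score can order candidate sentences.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, bias, x):
    """Estimated probability that a sentence with features x is useful."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))

def train_logistic(features, labels, lr=0.5, epochs=200):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            err = score(weights, bias, x) - y  # gradient of the log loss
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def rank_sentences(sentences, features, weights, bias):
    """Return (probability, sentence) pairs, most useful first."""
    scored = [(score(weights, bias, x), s) for s, x in zip(sentences, features)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Hypothetical features: [mentions the code entity, is in the first comment]
train_x = [[1, 1], [1, 0], [0, 1], [0, 0]]
train_y = [1, 1, 0, 0]  # hypothetical labels: 1 = useful for a summary
weights, bias = train_logistic(train_x, train_y)

candidates = [
    "Thanks for the report.",
    "Calling parse() with a null input throws an exception.",
]
candidate_x = [[0, 1], [1, 0]]
for prob, sentence in rank_sentences(candidates, candidate_x, weights, bias):
    print(f"{prob:.2f}  {sentence}")
```

    In this toy setting the top-ranked sentences would be kept as the summary of the code entity; a real system would need a labeled corpus and richer features.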

    An empirical investigation of relevant changes and automation needs in modern code review

    Recent research has shown that available tools for Modern Code Review (MCR) are still far from meeting the current expectations of developers. The objective of this paper is to investigate the approaches and tools that, from a developer's point of view, are still needed to facilitate MCR activities. To that end, we first empirically elicited a taxonomy of recurrent review change types that characterize MCR. The taxonomy was designed in three steps: (i) we generated an initial version of the taxonomy by qualitatively and quantitatively analyzing 211 review changes/commits and 648 review comments of ten open-source projects; then (ii) we integrated into this initial taxonomy the topics and MCR change types of an existing taxonomy available from the literature; finally, (iii) we surveyed 52 developers to add any change types still missing from the taxonomy. The results of our study highlight that the availability of new emerging development technologies (e.g., cloud-based technologies) and practices (e.g., continuous delivery) has pushed developers to perform additional activities during MCR and that additional types of feedback are expected by reviewers. Our participants provided recommendations, specified techniques to employ, and highlighted the data to analyze for building recommender systems able to automate the code review activities composing our taxonomy. We surveyed 14 additional participants (12 developers and 2 researchers), not involved in the previous survey, to qualitatively assess the relevance and completeness of the identified MCR change types and to assess how critical and how feasible to implement some of the identified techniques for supporting MCR activities are. Thus, with a study involving 21 additional developers, we qualitatively assess the feasibility and usefulness of leveraging natural language feedback (automation considered critical/feasible to implement) in supporting developers during MCR activities. In summary, this study sheds some more light on the approaches and tools that are still needed to facilitate MCR activities, confirming the feasibility and usefulness of using summarization techniques during MCR activities. We believe that the results of our work represent an essential step for meeting the expectations of developers and supporting the vision of full or partial automation in MCR.

    REPLICATION PACKAGE: https://zenodo.org/record/3679402#.XxgSgy17Hxg
    PREPRINT: https://spanichella.github.io/img/EMSE-MCR-2020.pdf

    Autofolding for Source Code Summarization

    Developers spend much of their time reading and browsing source code, raising new opportunities for summarization methods. Indeed, modern code editors provide code folding, which allows one to selectively hide blocks of code. However, this is impractical to use, as folding decisions must be made manually or based on simple rules. We introduce the autofolding problem, which is to automatically create a code summary by folding less informative code regions. We present a novel solution by formulating the problem as a sequence of AST folding decisions, leveraging a scoped topic model for code tokens. On an annotated set of popular open source projects, we show that our summarizer outperforms simpler baselines, yielding a 28% error reduction. Furthermore, we find through a case study that our summarizer is strongly preferred by experienced developers. More broadly, we hope this work will aid program comprehension by turning code folding into a usable and valuable tool.
    Comment: IEEE Transactions on Software Engineering 201

    “Won’t we fix this issue?” : qualitative characterization and automated identification of wontfix issues on GitHub

    Context: Addressing user requests in the form of bug reports and GitHub issues is a crucial task of any successful software project. However, user-submitted issue reports differ widely in quality, and developers spend a considerable amount of time handling them. Objective: By collecting a dataset of around 6,000 issues from 279 GitHub projects, we observe that developers take significant time (about five months, on average) before labeling an issue as a wontfix. For this reason, in this paper we empirically investigate the nature of wontfix issues and methods to facilitate the issue management process. Method: We first manually analyze a sample of 667 wontfix issues, extracted from heterogeneous projects, investigating the common reasons behind a “wontfix decision”, the main characteristics of wontfix issues, and the potential factors that could be connected with the time to close them. Furthermore, we experiment with approaches for predicting wontfix issues by analyzing the titles and descriptions of reported issues at submission time. Results and conclusion: Our investigation sheds some light on the characteristics of wontfix issues, as well as on the potential factors that may affect the time required to make a “wontfix decision”. Our results also demonstrate that it is possible to predict wontfix issues with high average values of precision, recall, and F-measure (90%-93%).
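
    For readers unfamiliar with the metrics reported in this abstract, the following minimal sketch shows how precision, recall, and F-measure are computed for a binary wontfix classifier. The function name, the label strings, and the example predictions are hypothetical and are not taken from the paper's dataset.

```python
def precision_recall_f1(true_labels, predicted_labels, positive="wontfix"):
    """Compute precision, recall, and F1 for the given positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical ground truth and predictions for five issues.
truth = ["wontfix", "wontfix", "wontfix", "other", "other"]
predicted = ["wontfix", "wontfix", "other", "wontfix", "other"]
p, r, f = precision_recall_f1(truth, predicted)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

    Precision is the share of issues predicted as wontfix that really are wontfix, recall is the share of true wontfix issues that the classifier finds, and the F-measure is their harmonic mean.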