
5 research outputs found

Predicting Good Configurations for GitHub and Stack Overflow Topic Models

Author: Treude Christoph
Wagner Markus
Publication date: 10/03/2019

Software repositories contain large amounts of textual data, ranging from source code comments and issue descriptions to questions, answers, and comments on Stack Overflow. To make sense of this textual data, topic modelling is frequently used as a text-mining tool for the discovery of hidden semantic structures in text bodies. Latent Dirichlet allocation (LDA) is a commonly used topic model that aims to explain the structure of a corpus by grouping texts. LDA requires multiple parameters to work well, and there are only rough and sometimes conflicting guidelines available on how these parameters should be set. In this paper, we contribute (i) a broad study of parameters to arrive at good local optima for GitHub and Stack Overflow text corpora, (ii) an a-posteriori characterisation of text corpora related to eight programming languages, and (iii) an analysis of corpus feature importance via per-corpus LDA configuration. We find that (1) popular rules of thumb for topic modelling parameter configuration are not applicable to the corpora used in our experiments, (2) corpora sampled from GitHub and Stack Overflow have different characteristics and require different configurations to achieve good model fit, and (3) we can predict good configurations for unseen corpora reliably. These findings support researchers and practitioners in efficiently determining suitable configurations for topic modelling when analysing textual data contained in software repositories. Comment: to appear as full paper at MSR 2019, the 16th International Conference on Mining Software Repositories.

arXiv.org e-Print Archive

Crossref

Institutional Knowledge at Singapore Management University

A Q-learning-based memetic algorithm for multi-objective dynamic software project scheduling

Author: Leandro L. Minku
Naresh Marturi
Xiao-Ning Shen
Yi-Nan Guo
Ying Han
Publication date: 24/10/2017

Software project scheduling is the problem of allocating employees to tasks in a software project. Due to the large scale of current software projects, many studies have investigated the use of optimization algorithms to find good software project schedules. However, despite the importance of human factors to the success of software projects, existing work has considered only a limited number of human properties when formulating software project scheduling as an optimization problem. Moreover, the changing environments of software companies mean that software project scheduling is a dynamic optimization problem. However, there is a lack of effective dynamic scheduling approaches to solve this problem. This work proposes a more realistic mathematical model for the dynamic software project scheduling problem. This model considers that skill proficiency can improve over time and, different from previous work, it considers that such improvement is affected by the employees’ properties of motivation and learning ability, and by the skill difficulty. It also defines the objective of employees’ satisfaction with the allocation. It is considered together with the objectives of project duration, cost, robustness and stability under a variety of practical constraints. To adapt schedules to the dynamically changing software project environments, a multi-objective two-archive memetic algorithm based on Q-learning (MOTAMAQ) is proposed to solve the problem in a proactive-rescheduling way. Different from previous work, MOTAMAQ learns the most appropriate global and local search methods to be used for different software project environment states by using Q-learning. Extensive experiments on 18 dynamic benchmark instances and 3 instances derived from real-world software projects were performed. 
A comparison with seven other meta-heuristic algorithms shows that the strategies used by our novel approach are very effective in improving its convergence performance in dynamic environments, while maintaining a good distribution and spread of solutions. The Q-learning-based learning mechanism can choose appropriate search operators for the different scheduling environments. We also show how different trade-offs among the five objectives can provide software managers with a deeper insight into various compromises among many objectives, enabling them to make informed decisions.
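The idea of learning which search operator to apply in each environment state can be illustrated with a minimal tabular Q-learning sketch. This is not MOTAMAQ itself: the epsilon-greedy policy, the two state names, the operator names and the toy reward table are all hypothetical and chosen only to show the update rule.

```python
import random
from collections import defaultdict


class OperatorSelector:
    """Generic tabular Q-learning over (state, operator) pairs (illustrative only)."""

    def __init__(self, operators, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.operators = list(operators)
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability (assumed policy)
        self.q = defaultdict(float)  # (state, operator) -> estimated value
        self.rng = random.Random(seed)

    def choose(self, state):
        # Epsilon-greedy: explore occasionally, otherwise pick the best-known operator.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.operators)
        return max(self.operators, key=lambda op: self.q[(state, op)])

    def update(self, state, op, reward, next_state):
        # Standard Q-learning update: move Q(s, a) towards r + gamma * max_a' Q(s', a').
        best_next = max(self.q[(next_state, o)] for o in self.operators)
        td_target = reward + self.gamma * best_next
        self.q[(state, op)] += self.alpha * (td_target - self.q[(state, op)])


# Hypothetical toy environment: global search pays off right after a change,
# local search pays off while the environment is stable.
toy_reward = {
    ("changed", "global_search"): 1.0,
    ("changed", "local_search"): 0.0,
    ("stable", "global_search"): 0.0,
    ("stable", "local_search"): 1.0,
}

selector = OperatorSelector(["global_search", "local_search"])
state = "changed"
for step in range(2000):
    op = selector.choose(state)
    next_state = "stable" if step % 2 == 0 else "changed"
    selector.update(state, op, toy_reward[(state, op)], next_state)
    state = next_state
```

In this toy setting the learned values for the "changed" state end up favouring `global_search`, mirroring the abstract's claim that the mechanism picks operators suited to each scheduling environment.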

Crossref

University of Birmingham Research Portal

Leicester Research Archive

How to Evaluate Solutions in Pareto-based Search-Based Software Engineering? A Critical Review and Methodological Guidance

Author: Chen Tao
Li Miqing
Yao Xin
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/11/2020

With modern requirements, there is an increasing tendency of considering multiple objectives/criteria simultaneously in many Software Engineering (SE) scenarios. Such a multi-objective optimization scenario comes with an important issue -- how to evaluate the outcome of optimization algorithms, which typically is a set of incomparable solutions (i.e., being Pareto non-dominated to each other). This issue can be challenging for the SE community, particularly for practitioners of Search-Based SE (SBSE). On one hand, multi-objective optimization could still be relatively new to SE/SBSE researchers, who may not be able to identify the right evaluation methods for their problems. On the other hand, simply following the evaluation methods for general multi-objective optimization problems may not be appropriate for specific SE problems, especially when the problem nature or decision maker's preferences are explicitly/implicitly available. This has been well echoed in the literature by various inappropriate/inadequate selection and inaccurate/misleading use of evaluation methods. In this paper, we first carry out a systematic and critical review of quality evaluation for multi-objective optimization in SBSE. We survey 717 papers published between 2009 and 2019 from 36 venues in seven repositories, and select 95 prominent studies, through which we identify five important but overlooked issues in the area. We then conduct an in-depth analysis of quality evaluation indicators/methods and general situations in SBSE, which, together with the identified issues, enables us to codify a methodological guidance for selecting and using evaluation methods in different SBSE scenarios. Comment: This paper has been accepted by IEEE Transactions on Software Engineering, available as full OA: https://ieeexplore.ieee.org/document/925218
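The notion of a set of mutually Pareto non-dominated solutions, mentioned in the abstract above, can be illustrated with a minimal sketch. It assumes every objective is to be minimised; the function names and the sample objective vectors are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b, assuming all objectives are minimised."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (cost, duration) pairs for four candidate solutions.
solutions = [(10.0, 5.0), (8.0, 7.0), (12.0, 6.0), (9.0, 9.0)]
front = non_dominated(solutions)
```

Here `(10.0, 5.0)` and `(8.0, 7.0)` are incomparable, since each is better on one objective, which is why comparing whole solution sets needs dedicated quality indicators rather than a single ranking.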

arXiv.org e-Print Archive

Loughborough University Institutional Repository

University of Birmingham Research Portal

The determinants of value addition: a critical analysis of global software engineering industry in Sri Lanka

Author: Manamendra M A Saman Chathuranga
Publication date: 16/03/2022

It was evident through the literature that the perceived value delivery of the global software engineering industry is low due to various factors. Therefore, this research concerns global software product companies in Sri Lanka and explores the software engineering methods and practices that increase value addition. The overall aim of the study is to identify the key determinants of value addition in the global software engineering industry and critically evaluate their impact on software product companies, to help maximise value addition and ultimately assure the sustainability of the industry. An exploratory research approach was used initially since findings would emerge as the study unfolded. A mixed-methods approach was employed because the literature alone was inadequate to investigate the problem effectively and formulate the research framework. Twenty-three face-to-face online interviews were conducted with subject matter experts covering all the disciplines from the targeted organisations, and the results were combined with the literature findings as well as the outcomes of market research conducted by both government and non-government institutes. Data from the interviews were analysed using NVivo 12. The findings of the existing literature were verified through the exploratory study and the outcomes were used to formulate the questionnaire for the public survey. After cleansing the total responses received, 371 responses were considered for the data analysis through SPSS 21 with alpha level 0.05. An internal consistency test was performed before the descriptive analysis. After assuring the reliability of the dataset, the correlation test, multiple regression test and analysis of variance (ANOVA) test were carried out to meet the research objectives. Five determinants for value addition were identified along with the key themes for each area. 
They are staffing, delivery process, use of tools, governance, and technology infrastructure. Cross-functional and self-organised teams built around the value streams, a properly interconnected software delivery process with the right governance in the delivery pipelines, appropriate selection of tools and provision of the right infrastructure increase value delivery. Moreover, the constraints for value addition are poor interconnection in the internal processes, rigid functional hierarchies, inaccurate selection and use of tools, inflexible team arrangements and inadequate focus on the technology infrastructure. The findings add to the existing body of knowledge on increasing value addition by employing effective processes, practices and tools, and on the impacts of applying them inaccurately in the global software engineering industry.

University of Wales Trinity Saint David