42 research outputs found

    Industry-academia collaborations in software engineering: An empirical analysis of challenges, patterns and anti-patterns in research projects

    Research collaboration between industry and academia supports improvement and innovation in industry and helps to ensure industrial relevance in academic research. However, many researchers and practitioners believe that the level of joint industry-academia collaboration (IAC) in software engineering (SE) research is still relatively low, compared to the amount of activity in each of the two communities. The goal of the empirical study reported in this paper is to provide an exploratory characterization of the state of IAC with respect to a set of challenges, patterns, and anti-patterns identified by a recent Systematic Literature Review study. To address this goal, we gathered the opinions of researchers and practitioners about their experiences in IAC projects. Our dataset includes 47 opinion data points related to a large set of projects conducted in 10 different countries. We aim to contribute to the body of evidence in the area of IAC, for the benefit of researchers and practitioners in conducting future successful IAC projects in SE. As an output, the study presents a set of empirical findings and evidence-based recommendations to increase the success of IAC projects. Supported by the National Research Fund, Luxembourg FNR/P10/03. Supported by FCT (Fundação para a Ciência e Tecnologia) within the Project Scope UID/CEC/00319/2013.

    Characterizing industry-academia collaborations in software engineering: evidence from 101 projects

    Research collaboration between industry and academia supports improvement and innovation in industry and helps ensure the industrial relevance of academic research. However, many researchers and practitioners in the community believe that the level of joint industry-academia collaboration (IAC) projects in Software Engineering (SE) research is relatively low, creating a barrier between research and practice. The goal of the empirical study reported in this paper is to explore and characterize the state of IAC with respect to industrial needs, developed solutions, impacts of the projects, and a set of challenges, patterns, and anti-patterns identified by a recent Systematic Literature Review (SLR) study. To address this goal, we conducted an opinion survey among researchers and practitioners with respect to their experience in IAC. Our dataset includes 101 data points from IAC projects conducted in 21 different countries. Our findings include: (1) the most popular topics of the IAC projects in the dataset are software testing, quality, process, and project management; (2) over 90% of IAC projects result in at least one publication; (3) almost 50% of IACs are initiated by industry, busting the myth that industry tends to avoid IACs; and (4) 61% of the IAC projects report having a positive impact on their industrial context, while 31% report no noticeable impact or were "not sure". To improve this situation, we present evidence-based recommendations to increase the success of IAC projects, such as the importance of testing pilot solutions before using them in industry. This study aims to contribute to the body of evidence in the area of IAC, and to benefit researchers and practitioners: using the data and evidence presented in this paper, they can conduct more successful IAC projects in SE by being aware of the challenges and how to overcome them, by applying best practices (patterns), and by preventing anti-patterns. The authors would like to thank the researchers and practitioners who participated in this survey. João M. Fernandes was supported by FCT (Fundação para a Ciência e Tecnologia) within the Project Scope UID/CEC/00319/2013. Dietmar Pfahl was supported by the institutional research grant IUT20-55 of the Estonian Research Council. Andrea Arcuri was supported by the Research Council of Norway (grant agreement No 274385). Mika Mäntylä was partially supported by an Academy of Finland grant and an ITEA3 / TEKES grant.

    SiaLog: detecting anomalies in software execution logs using the Siamese network

    Abstract Detecting anomalies in software logs has become a notable concern for software engineers and maintainers, as such anomalies reflect anomalous software execution paths and states. This paper proposes a novel anomaly detection approach based on a Siamese network on top of recurrent neural networks (RNN). Accordingly, we introduce a novel training pair generation algorithm to train the Siamese network, which significantly reduces the number of generated training pairs while maintaining the F₁ score. Additionally, we propose a hybrid model by combining the Siamese network with a traditional feedforward neural network to make end-to-end training possible, reducing the engineering effort in setting up a deep-learning-based log anomaly detector. Furthermore, we validate the approach on the Hadoop Distributed File System (HDFS), Blue Gene/L (BGL), and Hadoop map-reduce task log datasets. To the best of our knowledge, the proposed approach outperforms other methods on the same datasets, achieving F₁ scores of 0.99, 0.99, and 0.94 on the HDFS, BGL, and Hadoop datasets respectively, resulting in a new state-of-the-art performance. To further evaluate the proposed method, we examine its robustness to log evolution by evaluating the model on synthetically evolved log sequences; we obtained an F₁ score of 0.95 on the HDFS dataset at a noise ratio of 20%. Finally, we dive deep into some of the side benefits of the Siamese network. Accordingly, we introduce an unsupervised log evolution monitoring method alongside a visualization technique that facilitates model interpretability.
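    The core idea can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: an RNN encoder maps a sequence of log-event IDs to an embedding, and a contrastive Siamese objective pulls embeddings of same-class sequence pairs together while pushing different-class pairs apart. The GRU choice, dimensions, and toy data are assumptions made for the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LogEncoder(nn.Module):
    """Encode a sequence of log-event IDs into a fixed-size embedding."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)

    def forward(self, seq):          # seq: (batch, seq_len) event IDs
        _, h = self.rnn(self.embed(seq))
        return h[-1]                 # (batch, hidden_dim)

def contrastive_loss(z1, z2, same_label, margin=1.0):
    """Classic Siamese objective: pull same-class pairs together,
    push different-class pairs at least `margin` apart."""
    dist = F.pairwise_distance(z1, z2)
    loss_same = same_label * dist.pow(2)
    loss_diff = (1 - same_label) * F.relu(margin - dist).pow(2)
    return (loss_same + loss_diff).mean()

# Toy usage: two batches of event-ID sequences and pair labels (1 = same class).
encoder = LogEncoder(vocab_size=50)
a = torch.randint(0, 50, (8, 20))
b = torch.randint(0, 50, (8, 20))
labels = torch.randint(0, 2, (8,)).float()
loss = contrastive_loss(encoder(a), encoder(b), labels)
loss.backward()
```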

    Predicting technical debt from commit contents: reproduction and extension with automated feature selection

    Abstract Self-admitted technical debt (SATD) refers to sub-optimal development solutions that are expressed in written code comments or commits. We reproduce and improve on prior work by Yan et al. (2018) on detecting commits that introduce self-admitted technical debt. We use multiple natural language processing methods: bag-of-words, topic modeling, and word embedding vectors. We study 5 open-source projects. Our NLP approach uses logistic Lasso regression from Glmnet to automatically select the best predictor words. A manually labeled dataset from prior work that identified self-admitted technical debt from code-level commits serves as ground truth. Our approach achieves +0.15 better area under the ROC curve than the prior work when comparing only commit message features, and a +0.03 better result overall when replacing manually selected features with automatically selected words. In both cases, the improvement was statistically significant (p < 0.0001). Our work makes four main contributions: comparing different NLP techniques for SATD detection, improving results over previous work, showing how to generate generalizable predictor words when using multiple repositories, and producing a list of words correlating with SATD. As a concrete result, we release the list of predictor words that correlate positively with SATD, as well as the datasets and scripts we used, to enable replication studies and to aid in the creation of future classifiers.
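    As a rough Python analogue of the paper's pipeline (the study itself uses R's Glmnet; the toy data and parameter values below are illustrative assumptions), an L1-penalized logistic regression over bag-of-words features drives most word weights to zero, and the surviving positive-weight words serve as SATD predictor words:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: commit messages and SATD-introducing labels.
messages = ["fix ugly hack in parser", "add unit tests",
            "temporary workaround for race", "update docs"]
labels   = [1, 0, 1, 0]

# The L1 (Lasso) penalty zeroes out most word weights, leaving the
# predictor words; sklearn stands in here for the paper's Glmnet.
model = make_pipeline(
    CountVectorizer(),  # bag-of-words features
    LogisticRegression(penalty="l1", solver="liblinear", C=1.0),
)
model.fit(messages, labels)

# Words with positive coefficients correlate with SATD-introducing commits.
vec = model.named_steps["countvectorizer"]
clf = model.named_steps["logisticregression"]
for word, coef in zip(vec.get_feature_names_out(), clf.coef_[0]):
    if coef > 0:
        print(word, round(coef, 3))
```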

    20-MAD: 20 years of issues and commits of Mozilla and Apache development

    Abstract Data from long-lived, high-profile projects are valuable for research on successful software engineering in the wild. Having a dataset that links different software repositories of such projects enables deeper investigations. This paper presents 20-MAD, a dataset linking the commit and issue data of Mozilla and Apache projects. It includes over 20 years of information about 765 projects, 3.4M commits, 2.3M issues, and 17.3M issue comments, and its compressed size is over 6 GB. The data contains all the typical information about source code commits (e.g., lines added and removed, message, and commit time) and issues (status, severity, votes, and summary). The issue comments have been pre-processed for natural language processing and sentiment analysis; this includes emoticons as well as valence and arousal scores. Linking code repository and issue tracker information allows studying individuals across the two types of repositories and also provides more accurate time zone information for issue trackers. To our knowledge, this is the largest linked dataset, in size and in project lifetime, that is not based on GitHub.
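    The kind of analysis the linking enables can be sketched with pandas. Note that the file names and column names below (issue_id, author_commit, etc.) are hypothetical placeholders, not the actual 20-MAD schema:

```python
import pandas as pd

# Hypothetical file and column names; consult the 20-MAD docs for the real schema.
commits = pd.read_parquet("mozilla_commits.parquet")   # one row per commit
issues  = pd.read_parquet("mozilla_issues.parquet")    # one row per issue

# Link commits to the issues they reference (e.g. via an issue key
# extracted from the commit message) and compare activity per author.
linked = commits.merge(issues, on="issue_id", suffixes=("_commit", "_issue"))
per_author = linked.groupby("author_commit").agg(
    commits=("commit_hash", "nunique"),
    issues=("issue_id", "nunique"),
)
print(per_author.sort_values("commits", ascending=False).head())
```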

    PENTACET data: 23 million contextual code comments and 250,000 SATD comments

    Abstract Most Self-Admitted Technical Debt (SATD) research utilizes explicit SATD features such as 'TODO' and 'FIXME' for SATD detection. A closer look reveals that several SATD studies use simple ('Easy to Find') SATD code comments without contextual data (the preceding and succeeding source code context). This work addresses this gap through the PENTACET (or 5C) dataset. PENTACET is a large dataset of Curated Contextual Code Comments per Contributor and the most extensive SATD data. We mine 9,096 open-source Java projects totaling over 400 million LOC. The outcome is a dataset with 23 million code comments, the preceding and succeeding source code context for each comment, and more than 250,000 SATD comments, including both 'Easy to Find' and 'Hard to Find' SATD. We believe the PENTACET data will further SATD research using Artificial Intelligence techniques.
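    To make the 'Easy to Find' versus 'Hard to Find' distinction concrete, here is a minimal keyword filter of the kind most prior SATD work relies on. The marker list is illustrative, not the paper's exact feature set; the comments such a filter misses are precisely the 'Hard to Find' SATD that contextual data helps detect:

```python
import re

# Common explicit SATD markers ('Easy to Find' SATD); list is illustrative.
SATD_MARKERS = re.compile(r"\b(TODO|FIXME|HACK|XXX|WORKAROUND)\b", re.IGNORECASE)

def is_easy_to_find_satd(comment: str) -> bool:
    """Flag comments containing explicit SATD keywords."""
    return bool(SATD_MARKERS.search(comment))

comments = [
    "// TODO: handle null case",
    "// this parser is fragile, revisit after refactor",  # 'Hard to Find' SATD
    "// compute checksum",
]
print([c for c in comments if is_easy_to_find_satd(c)])  # only the TODO comment
```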

    Win GUI Crawler: a tool prototype for desktop GUI image and metadata collection

    Abstract Despite the widespread adoption of test automation, automatic testing of graphical user interfaces (GUI) remains a challenge. This is partly due to the difficulty of reliably identifying GUI elements across different versions of a given software system. Machine vision techniques could be a potential way of addressing this issue by automatically identifying GUI elements with the help of machine learning. However, developing a GUI testing tool relying on automatic identification of graphical elements first requires acquiring a large amount of labeled data. In this paper, we present Win GUI Crawler, a tool for automatically gathering such data from Microsoft Windows GUI applications. The tool is based on Microsoft Windows Application Driver and performs actions on the GUI using a depth-first traversal of the GUI element tree. For each action performed by the crawler, screenshots are taken and metadata is extracted for each of the different screens. Bounding boxes of GUI elements are then filtered in order to identify which GUI elements are actually visible on the screen. Win GUI Crawler is evaluated on several popular Windows applications, and its current limitations are discussed.
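    The traversal and visibility-filtering steps can be illustrated with a self-contained sketch. The Element structure and screen geometry below are simplified stand-ins for what a driver such as Windows Application Driver exposes, not the tool's actual code:

```python
from dataclasses import dataclass, field

# Simplified stand-in for the GUI element tree a UI automation driver
# exposes; field names here are illustrative assumptions.
@dataclass
class Element:
    name: str
    bbox: tuple                    # (left, top, right, bottom) screen coords
    children: list = field(default_factory=list)

def visible(bbox, screen=(0, 0, 1920, 1080)):
    """Keep only elements whose bounding box intersects the screen area."""
    l, t, r, b = bbox
    sl, st, sr, sb = screen
    return l < sr and r > sl and t < sb and b > st

def crawl(root):
    """Depth-first traversal collecting (name, bbox) of visible elements."""
    stack, found = [root], []
    while stack:
        el = stack.pop()
        if visible(el.bbox):
            found.append((el.name, el.bbox))
        stack.extend(reversed(el.children))  # preserve left-to-right order
    return found

ui = Element("window", (0, 0, 800, 600), [
    Element("ok_button", (700, 550, 780, 590)),
    Element("offscreen_panel", (2000, 0, 2200, 100)),  # filtered out
])
print(crawl(ui))
```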

    How to configure masked event anomaly detection on software logs?

    Abstract Software log anomaly detection with masked event prediction admits many technical approaches with countless configurations and parameters. Our objective is to provide a baseline of settings for similar studies in the future. The models we use are the N-Gram model, a classic approach in the field of natural language processing (NLP), and two deep learning (DL) models: long short-term memory (LSTM) and convolutional neural network (CNN). We use four datasets: Profilence, BlueGene/L (BGL), Hadoop Distributed File System (HDFS), and Hadoop. Other settings are the size of the sliding window, which determines how many surrounding events we use to predict a given event; the mask position (the position within the window we are predicting); the use of only unique sequences; and the portion of data that is used for training. The results show clear indications of settings that can be generalized across datasets. The performance of the DL models does not deteriorate as the window size increases, while the N-Gram model performs worse with large window sizes on the BGL and Profilence datasets. Despite the popularity of next event prediction, the results show that in this context it is better not to predict events at the edges of the subsequence, i.e., the first or last event; the best result comes from predicting the fourth event when the window size is five. Regarding the amount of data used for training, the results show differences across datasets and models: for example, the N-Gram model appears to be more sensitive to a lack of data than the DL models. Overall, for similar experimental setups we suggest the following general baseline: window size 10, mask position second to last, do not filter out non-unique sequences, and use half of the total data for training.
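    The suggested baseline translates directly into how training samples are cut from a log: slide a fixed-size window over the event sequence and mask one position as the prediction target. A minimal sketch, using the paper's suggested window size 10 and second-to-last mask position as defaults (the toy sequence is invented):

```python
def masked_windows(events, window=10, mask_pos=8):
    """Slide a window over a log-event sequence; for each window, the event
    at `mask_pos` is the prediction target and the rest form the context.
    Defaults mirror the suggested baseline: window 10, mask second to last."""
    samples = []
    for i in range(len(events) - window + 1):
        w = list(events[i:i + window])
        target = w[mask_pos]
        w[mask_pos] = "<MASK>"
        samples.append((w, target))
    return samples

# Toy event sequence (event template IDs).
seq = ["E1", "E2", "E3", "E1", "E4", "E2", "E5", "E1", "E2", "E3", "E4", "E5"]
context, target = masked_windows(seq)[0]
print(context, "->", target)
```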

    Pinpointing anomaly events in logs from stability testing: N-Grams vs. deep learning

    Abstract As stability testing execution logs can be very long, software engineers need help in locating anomalous events. We develop and evaluate two models for scoring individual log events for anomalousness, namely an N-Gram model and a deep learning model with LSTM (long short-term memory). Both are trained on normal log sequences only. We evaluate the models with long log sequences of Android stability testing in our company case and with short log sequences from the public HDFS (Hadoop Distributed File System) dataset. We evaluate next event prediction accuracy and computational efficiency. The LSTM model is more accurate on stability testing logs (N-Gram 0.848 vs LSTM 0.865), whereas on HDFS logs the N-Gram model is slightly more accurate (N-Gram 0.904 vs LSTM 0.900). The N-Gram model has far superior computational efficiency compared to the deep model (4 to 13 seconds vs 16 minutes to nearly 4 hours), making it the preferred choice for our case company. Scoring individual log events for anomalousness seems like a good aid for root cause analysis of failing test cases, and our case company plans to add it to its online services. Despite the recent surge in using deep learning for software system anomaly detection, we found limited benefits in doing so. However, future work should consider whether our finding holds with different LSTM model hyperparameters, other datasets, and other deep-learning approaches that promise better accuracy and computational efficiency than LSTM-based models.
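    The N-Gram side of the comparison is simple enough to sketch end-to-end: count next-event frequencies over normal sequences, then score each event in a new sequence as one minus the probability of that event given its preceding context. This toy version (trigram context, invented data) illustrates the general technique, not the paper's implementation:

```python
from collections import Counter, defaultdict

def train_ngram(sequences, n=3):
    """Count how often each event follows each (n-1)-event context
    in normal log sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            context, nxt = tuple(seq[i:i + n - 1]), seq[i + n - 1]
            counts[context][nxt] += 1
    return counts

def score_events(seq, counts, n=3):
    """Anomalousness of each event = 1 - P(event | preceding context);
    unseen contexts score 1.0 (maximally anomalous)."""
    scores = []
    for i in range(n - 1, len(seq)):
        context, ev = tuple(seq[i - n + 1:i]), seq[i]
        total = sum(counts[context].values())
        p = counts[context][ev] / total if total else 0.0
        scores.append((ev, 1.0 - p))
    return scores

normal = [["A", "B", "C", "D"], ["A", "B", "C", "E"], ["A", "B", "C", "D"]]
model = train_ngram(normal)
print(score_events(["A", "B", "C", "Z"], model))  # never-seen "Z" scores 1.0
```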