
    Are They All Good? Studying Practitioners' Expectations on the Readability of Log Messages

    Developers write logging statements to generate logs that provide run-time information for various tasks. The readability of the log messages in logging statements (i.e., the descriptive text) is crucial to the value of the generated logs. Immature log messages may slow down or even obstruct the process of log analysis. Despite the importance of log messages, there is still a lack of standards on what constitutes good readability in log messages and how to write them. In this paper, we conduct a series of interviews with 17 industrial practitioners to investigate their expectations on the readability of log messages. Through the interviews, we derive three aspects related to the readability of log messages, namely Structure, Information, and Wording, along with several specific practices to improve each aspect. We validate our findings through a series of online questionnaire surveys and receive positive feedback from the participants. We then manually investigate the readability of log messages in large-scale open-source systems and find that a large portion (38.1%) of the log messages have inadequate readability. Motivated by this observation, we further explore the potential of automatically classifying the readability of log messages using deep learning and machine learning models. We find that both deep learning and machine learning models can effectively classify the readability of log messages, with a balanced accuracy above 80.0% on average. Our study provides comprehensive guidelines for composing log messages to further improve practitioners' logging practices.
    Comment: Accepted as a research paper at the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE 2023).
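    The abstract does not describe the classifiers in detail; as a rough, hypothetical illustration of the kind of machine-learning readability classifier it mentions (not the authors' actual pipeline), a minimal sketch using TF-IDF features and logistic regression in scikit-learn could look like the following, where the example messages and labels are invented placeholders.

# Minimal sketch (not the paper's implementation): binary classification of
# log message readability with TF-IDF features and logistic regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: log messages labeled 1 (adequate readability)
# or 0 (inadequate readability).
messages = [
    "Failed to connect to database 'orders': timeout after 30s",
    "err!!",
    "User session expired; redirecting to login page",
    "something happened",
]
labels = [1, 0, 1, 0]

# Pipeline: turn each message into word/bigram TF-IDF features, then classify.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(messages, labels)

# Predict the readability label of an unseen log message.
print(clf.predict(["Unable to parse config file: missing field 'timeout'"]))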

    AI-assisted Software Development Effort Estimation

    Effort estimation is a critical aspect of software project management. Without accurate estimates of the developer effort a project will require, its timeline and resourcing cannot be planned efficiently, which greatly increases the likelihood of the project failing to meet at least some of its goals. The goal of this thesis is to apply machine learning methods to the work hour data logged by individual employees in order to provide project management with useful estimates of how much more effort it will take to finish a given project, and how long that will take. The work is conducted for ATR Soft Oy, using data from their internal work hour logging tool. First, a literature review is conducted to determine what kinds of estimation methods and tools are currently used in the software industry, and what objectives and requirements organizations commonly set for their estimation processes. The basics of machine learning are explained, and a brief look is taken at how machine learning is currently used to support software engineering and project management. The literature review revealed that while machine learning methods have been applied to software project estimation for decades, such data-driven methods generally suffer from a lack of relevant historical project data and are therefore not commonly used in the industry. Initial insights were gathered from the work hour data and the analysis goals were refined accordingly. The data was pre-processed into a form suitable for training machine learning models. Two modeling scenarios were tested: creating a single general model from all available data, and creating multiple project-specific models of more limited scope. The modeling performance data indicates that machine learning models based on work hour data can, in some situations, achieve better results than traditional expert estimation. The models developed here are not reliable enough to be used as the sole estimation method, but they can provide useful additional information to support decision making.
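    The abstract does not name the specific models or features used; as a hedged sketch of how remaining effort might be estimated from logged work hour data, the following trains a gradient-boosting regressor on invented per-project snapshot features. All column names and values are hypothetical and are not drawn from ATR Soft Oy's data.

# Illustrative sketch only (not the thesis's actual features or models):
# predicting remaining project effort from aggregated work-hour-log features.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Hypothetical per-project snapshots derived from a work hour logging tool.
data = pd.DataFrame({
    "hours_logged_so_far": [120, 340, 80, 510, 260, 95, 430, 180],
    "active_developers":   [2, 4, 1, 5, 3, 2, 4, 2],
    "weeks_elapsed":       [6, 12, 4, 20, 10, 5, 16, 8],
    "open_tasks":          [14, 30, 9, 45, 22, 11, 38, 17],
    "remaining_hours":     [200, 480, 90, 700, 350, 110, 600, 240],  # target
})

X = data.drop(columns="remaining_hours")
y = data["remaining_hours"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the regressor and report mean absolute error on the held-out snapshots.
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
print("MAE (hours):", mean_absolute_error(y_test, model.predict(X_test)))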

    Learning Fast and Slow: PROPEDEUTICA for Real-time Malware Detection

    In this paper, we introduce and evaluate PROPEDEUTICA, a novel methodology and framework for efficient and effective real-time malware detection that leverages the best of conventional machine learning (ML) and deep learning (DL) algorithms. In PROPEDEUTICA, every software process in the system begins execution under a conventional ML detector for fast classification. If a piece of software receives a borderline classification, it is subjected to further analysis via more computationally expensive and more accurate DL methods, using our newly proposed DL algorithm DEEPMALWARE. Further, we introduce delays to the execution of software under deep learning analysis as a way to "buy time" for the DL analysis and to rate-limit the impact of possible malware on the system. We evaluated PROPEDEUTICA with a set of 9,115 malware samples and 877 commonly used benign software samples from various categories for the Windows OS. Our results show that the false positive rate for conventional ML methods can reach 20%, while for modern DL methods it is usually below 6%. However, the classification time for DL can be 100X longer than for conventional ML methods. PROPEDEUTICA improved the detection F1-score from 77.54% (conventional ML method) to 90.25% and reduced the detection time by 54.86%. The percentage of software subjected to DL analysis was approximately 40% on average, and the application of delays to software subjected to ML reduced the detection time by approximately 10%. Finally, we found and discuss a discrepancy between detection accuracy offline (analysis after all traces are collected) and on-the-fly (analysis in tandem with trace collection). Our insights show that conventional ML and modern DL-based malware detectors in isolation cannot meet the needs of efficient and effective malware detection: high accuracy, a low false positive rate, and short classification time.
    Comment: 17 pages, 7 figures.
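    The abstract describes a two-stage triage: a fast conventional ML detector scores every process, borderline cases are escalated to a slower DL model, and the affected processes are delayed to buy time for that analysis. The sketch below illustrates only this control flow with placeholder scoring functions; it is not the PROPEDEUTICA framework or the DEEPMALWARE model, and the thresholds and delay are assumptions.

# Conceptual sketch of the two-stage triage described in the abstract
# (not the actual PROPEDEUTICA framework or the DEEPMALWARE detector).
import random
import time

BORDERLINE_LOW, BORDERLINE_HIGH = 0.4, 0.6  # hypothetical borderline band

def fast_ml_score(features):
    # Placeholder for a cheap conventional ML detector (e.g., a classifier
    # over system-call features); returns a malware probability.
    return random.random()

def deep_ml_score(features):
    # Placeholder for the expensive, more accurate deep learning detector.
    return random.random()

def classify_process(features, delay_seconds=0.1):
    score = fast_ml_score(features)
    if score < BORDERLINE_LOW:
        return "benign"
    if score > BORDERLINE_HIGH:
        return "malware"
    # Borderline case: briefly delay the process to "buy time" for the
    # slower deep learning analysis, then use its verdict.
    time.sleep(delay_seconds)
    return "malware" if deep_ml_score(features) > 0.5 else "benign"

print(classify_process({"syscall_trace": []}))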