770 research outputs found
Towards understanding the challenges faced by machine learning software developers and enabling automated solutions
Modern software systems are increasingly including machine learning (ML) as an integral component. However, we do not yet understand the difficulties faced by software developers when learning about ML libraries and using them within their systems. To fill that gap, this thesis reports on a detailed (manual) examination of 3,243 highly-rated Q&A posts related to ten ML libraries, namely TensorFlow, Keras, scikit-learn, Weka, Caffe, Theano, MLlib, Torch, Mahout, and H2O, on Stack Overflow, a popular online technical Q&A forum. Our findings reveal the urgent need for software engineering (SE) research in this area. The second part of the thesis focuses on understanding the characteristics of Deep Neural Network (DNN) bugs. We study 2,716 high-quality posts from Stack Overflow and 500 bug fix commits from GitHub about five popular deep learning libraries (Caffe, Keras, TensorFlow, Theano, and Torch) to understand the types of bugs, their root causes and impacts, the bug-prone stages of the deep learning pipeline, and whether there are common antipatterns in this buggy software. Our findings on bug characteristics imply that repairing software that uses DNNs is one such unmistakable SE need where automated tools could be beneficial; however, we do not fully understand the challenges of repairing DNNs or the patterns that are used when repairing them manually. So, the third part of this thesis presents a comprehensive study of bug fix patterns to address these questions. We have studied 415 repairs from Stack Overflow and 555 repairs from GitHub for the same five deep learning libraries to understand challenges in repairs and bug repair patterns. Our key findings reveal that DNN bug fix patterns are distinctive compared to traditional bug fix patterns and that the most common bug fix patterns are fixing data dimension and neural network connectivity.
Finally, we propose an automatic technique to detect ML Application Programming Interface (API) misuses. We started with an empirical study to understand ML API misuses. Our study shows that ML API misuse is prevalent and distinct compared to non-ML API misuses. Inspired by these findings, we contributed Amimla (Api Misuse In Machine Learning Apis), an approach and a tool for ML API misuse detection. Amimla relies on several technical innovations. First, we proposed an abstract representation of ML pipelines to use in misuse detection. Second, we proposed an abstract representation of neural networks for deep learning related APIs. Third, we developed a representation strategy for constraints on ML APIs. Finally, we developed a misuse detection strategy for both single and multi-APIs. Our experimental evaluation shows that Amimla achieves a high average accuracy of ∼80% on two benchmarks of misuses from Stack Overflow and GitHub.
A Comprehensive Empirical Study of Bugs in Open-Source Federated Learning Frameworks
Federated learning (FL) is a distributed machine learning (ML) paradigm,
allowing multiple clients to collaboratively train shared machine learning (ML)
models without exposing clients' data privacy. It has gained substantial
popularity in recent years, especially since the enforcement of data protection
laws and regulations in many countries. To foster the application of FL, a
variety of FL frameworks have been proposed, allowing non-experts to easily
train ML models. As a result, understanding bugs in FL frameworks is critical
for facilitating the development of better FL frameworks and potentially
encouraging the development of bug detection, localization and repair tools.
Thus, we conduct the first empirical study to comprehensively collect,
taxonomize, and characterize bugs in FL frameworks. Specifically, we manually
collect and classify 1,119 bugs from all the 676 closed issues and 514 merged
pull requests in 17 popular and representative open-source FL frameworks on
GitHub. We propose a classification of those bugs into 12 bug symptoms, 12 root
causes, and 18 fix patterns. We also study their correlations and distributions
on 23 functionalities. We identify nine major findings from our study, discuss
their implications and future research directions based on our findings.
An experimental study on network intrusion detection systems
A signature database is the key component of an elaborate intrusion detection system. The efficiency of signature generation for an intrusion detection system is a crucial requirement because of the rapid appearance of new attacks on the World Wide Web. However, in commercial applications, signature generation is still a manual process, which requires professional skills and heavy human effort. Knowledge Discovery and Data Mining methods may be a solution to this problem. Data Mining and Machine Learning algorithms can be applied to network traffic databases in order to generate signatures automatically.
The purpose of this thesis and the work related to it is to construct a feasible architecture for building a database of network traffic data. This database can then be used to generate signatures automatically. This goal is achieved using network traffic data captured on the data communication network at the New Jersey Institute of Technology (NJIT).
AI Providers as Criminal Essay Mills? Large Language Models meet Contract Cheating Law
Many jurisdictions have passed very broadly drafted laws to tackle academic integrity issues, criminalising the provision or advertising of contract cheating or essay mills, such as the Skills and Post-16 Education Act 2022 in England and Wales. Recently, AI models such as ChatGPT have amplified academic concerns. Here, we look at the intersection between these phenomena. We review academic cheating laws, showing that several may apply even to general purpose AI services like ChatGPT, without knowledge and intent. We identify a range of illegal adverts for AI-enhanced essay mills, and illustrate how difficult it is to draw the line between writing an essay and supporting it, such as by generating bona fide references. We also outline the consequences for intermediaries hosting these ads or providing these services, which may be significantly affected by these primarily symbolic laws. We conclude with a series of recommendations for policymakers, legislators, and education providers.
Denial of Service in Voice Over IP Networks
In this paper, we investigate denial of service (DoS) vulnerabilities in Voice over IP (VoIP) systems, focusing on the ITU-T H.323 family of protocols. We provide a simple characterisation of DoS attacks that allows us to readily identify DoS issues in H.323 protocols. We also discuss network layer DoS vulnerabilities that affect VoIP systems. A number of improvements and further research directions are proposed.