15 research outputs found
Learning based Methods for Code Runtime Complexity Prediction
Predicting the runtime complexity of a programming code is an arduous task.
In fact, even for humans, it requires a subtle analysis and comprehensive
knowledge of algorithms to predict time complexity with high fidelity, given
any code. By Turing's proof that the halting problem is undecidable, computing
the exact complexity of arbitrary code is impossible in general. Nevertheless, an approximate solution to such a task
can give developers real-time feedback on the efficiency of their code.
In this work, we model this problem as a machine learning task and check its
feasibility with thorough analysis. Due to the lack of any open source dataset
for this task, we propose our own annotated dataset CoRCoD: Code Runtime
Complexity Dataset, extracted from online judges. We establish baselines using
two different approaches, feature engineering and code embeddings, to achieve
state-of-the-art results, and compare their performance. Such solutions can be
widely useful in applications such as automatic grading of coding
assignments, IDE-integrated tools for static code analysis, and others.
Comment: 14 pages, 2 figures, 8 tables
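To make the feature-engineering approach mentioned in this abstract concrete, the following is a minimal sketch of extracting hand-crafted features from source code. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not list the features used, so the three features here (loop count, maximum loop nesting depth, number of `if` statements) and the names `LoopFeatureVisitor` and `extract_features` are hypothetical. Python's standard `ast` module is used to keep the example self-contained, whatever language the dataset's programs are written in.

```python
import ast


class LoopFeatureVisitor(ast.NodeVisitor):
    """Counts loops and tracks the deepest loop nesting in a syntax tree."""

    def __init__(self):
        self.loop_count = 0
        self.max_depth = 0
        self._depth = 0  # nesting depth of the loop currently being visited

    def _visit_loop(self, node):
        self.loop_count += 1
        self._depth += 1
        self.max_depth = max(self.max_depth, self._depth)
        self.generic_visit(node)  # descend into the loop body
        self._depth -= 1

    # Treat `for` and `while` loops identically.
    visit_For = _visit_loop
    visit_While = _visit_loop


def extract_features(source: str) -> dict:
    """Return a small, hypothetical feature vector for a piece of Python code."""
    tree = ast.parse(source)
    visitor = LoopFeatureVisitor()
    visitor.visit(tree)
    if_count = sum(isinstance(n, ast.If) for n in ast.walk(tree))
    return {
        "loop_count": visitor.loop_count,
        "max_loop_depth": visitor.max_depth,
        "if_count": if_count,
    }


if __name__ == "__main__":
    sample = (
        "for i in range(3):\n"
        "    for j in range(3):\n"
        "        if i < j:\n"
        "            pass\n"
        "while False:\n"
        "    pass\n"
    )
    print(extract_features(sample))
```

A feature dictionary like this could then be fed to any standard classifier that predicts a complexity class; which classifier and which features work best is an empirical question that the sketch does not address.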
PERFOGRAPH: A Numerical Aware Program Graph Representation for Performance Optimization and Program Analysis
The remarkable growth and significant success of machine learning have
expanded its applications into programming languages and program analysis.
However, a key challenge in adopting the latest machine learning methods is the
representation of programming languages, which directly impacts the ability of
machine learning methods to reason about programs. The absence of numerical
awareness and composite data structure information, together with an improper way of representing
variables, has limited the performance of previous representations. To
overcome the limitations and challenges of current program representations, we
propose a novel graph-based program representation called PERFOGRAPH.
PERFOGRAPH can capture numerical information and the composite data structure
by introducing new nodes and edges. Furthermore, we propose an adapted
embedding method to incorporate numerical awareness. These enhancements make
PERFOGRAPH a highly flexible and scalable representation that can effectively
capture intricate program dependencies and semantics. Consequently, it serves
as a powerful tool for various applications such as program analysis,
performance optimization, and parallelism discovery. Our experimental results
demonstrate that PERFOGRAPH outperforms existing representations and sets new
state-of-the-art results by reducing the error rate by 7.4% (AMD dataset) and
10% (NVIDIA dataset) in the well-known Device Mapping challenge. It also sets
new state-of-the-art results in various performance optimization tasks like
Parallelism Discovery and NUMA and Prefetchers Configuration prediction.
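As a rough illustration of what a graph-based program representation with explicit nodes for numeric constants and composite data structures might look like, here is a minimal sketch. It is not PERFOGRAPH's actual schema or embedding method, which the abstract does not specify: the node kinds, edge kinds, the `ProgramGraph` class, and the `digit_tokens` helper (which splits a numeric literal into per-character tokens with positions, one plausible way to expose numeric values to an embedding layer) are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ProgramGraph:
    """Tiny typed multigraph: nodes carry attributes, edges carry a kind."""
    nodes: dict = field(default_factory=dict)  # node id -> attribute dict
    edges: list = field(default_factory=list)  # (source id, target id, edge kind)

    def add_node(self, node_id, kind, **attrs):
        self.nodes[node_id] = {"kind": kind, **attrs}

    def add_edge(self, src, dst, kind):
        self.edges.append((src, dst, kind))


def digit_tokens(literal: str) -> list:
    """Split a numeric literal into (character, position) pairs.

    Hypothetical encoding: each pair could be embedded separately so that
    10 and 100 share structure instead of being unrelated opaque tokens.
    """
    return [(ch, pos) for pos, ch in enumerate(literal)]


def build_example() -> ProgramGraph:
    """Graph for the hypothetical snippet: int a[4]; a[0] = x + 10;"""
    g = ProgramGraph()
    g.add_node("add", "instruction", text="add")
    g.add_node("store", "instruction", text="store")
    g.add_node("x", "variable", type="int")
    g.add_node("a", "variable", type="array")
    # Composite data structure described by its own node (assumed design).
    g.add_node("a_type", "composite", element_type="int", length=4)
    # Numeric constant kept as a node that carries its digit tokens.
    g.add_node("c10", "constant", tokens=digit_tokens("10"))

    g.add_edge("x", "add", "data")        # x is an operand of add
    g.add_edge("c10", "add", "data")      # 10 is an operand of add
    g.add_edge("add", "store", "data")    # the sum is stored
    g.add_edge("store", "a", "data")      # ...into the array a
    g.add_edge("a", "a_type", "type")     # a has the composite type
    g.add_edge("add", "store", "control") # add executes before store
    return g


if __name__ == "__main__":
    graph = build_example()
    print(len(graph.nodes), "nodes,", len(graph.edges), "edges")
    print(graph.nodes["c10"])
```

The point of the sketch is only that constants and composite types become first-class graph elements rather than being folded into instruction text; a graph neural network would then consume the nodes, edge kinds, and token attributes.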
A survey on software defect prediction using deep learning
Defect prediction is one of the key challenges in software development and programming language research for improving software quality and reliability. The problem in this area is to properly identify the defective source code with high accuracy. Developing a fault prediction model is a challenging problem, and many approaches have been proposed throughout history. The recent breakthrough in machine learning technologies, especially the development of deep learning techniques, has led to many problems being solved by these methods. Our survey focuses on the deep learning techniques for defect prediction. We analyse the recent works on the topic, study the methods for automatic learning of the semantic and structural features from the code, discuss the open problems and present the recent trends in the field. © 2021 by the authors. Licensee MDPI, Basel, Switzerland