515 research outputs found
Improving Software Project Health Using Machine Learning
In recent years, systems that would previously live on different platforms have been integrated under a single umbrella. The increased use of GitHub, which offers pull-requests, issue trackingand version history, and its integration with other solutions such as Gerrit, or Travis, as well as theresponse from competitors, created development environments that favour agile methodologiesby increasingly automating non-coding tasks: automated build systems, automated issue triagingetc. In essence, source-code hosting platforms shifted to continuous integration/continuousdelivery (CI/CD) as a service. This facilitated a shift in development paradigms, adherents ofagile methodology can now adopt a CI/CD infrastructure more easily. This has also created large,publicly accessible sources of source-code together with related project artefacts: GHTorrent andsimilar datasets now offer programmatic access to the whole of GitHub. Project health encompasses traceability, documentation, adherence to coding conventions,tasks that reduce maintenance costs and increase accountability, but may not directly impactfeatures. Overfocus on health can slow velocity (new feature delivery) so the Agile Manifestosuggests developers should travel light — forgo tasks focused on a project health in favourof higher feature velocity. Obviously, injudiciously following this suggestion can undermine aproject’s chances for success. Simultaneously, this shift to CI/CD has allowed the proliferation of Natural Language orNatural Language and Formal Language textual artefacts that are programmatically accessible:GitHub and their competitors allow API access to their infrastructure to enable the creation ofCI/CD bots. This suggests that approaches from Natural Language Processing and MachineLearning are now feasible and indeed desirable. This thesis aims to (semi-)automate tasks forthis new paradigm and its attendant infrastructure by bringing to the foreground the relevant NLPand ML techniques. Under this umbrella, I focus on three synergistic tasks from this domain: (1) improving theissue-pull-request traceability, which can aid existing systems to automatically curate the issuebacklog as pull-requests are merged; (2) untangling commits in a version history, which canaid the beforementioned traceability task as well as improve the usability of determining a faultintroducing commit, or cherry-picking via tools such as git bisect; (3) mixed-text parsing, to allowbetter API mining and open new avenues for project-specific code-recommendation tools
Large Language Models for Software Engineering: A Systematic Literature Review
Large Language Models (LLMs) have significantly impacted numerous domains,
notably including Software Engineering (SE). Nevertheless, a well-rounded
understanding of the application, effects, and possible limitations of LLMs
within SE is still in its early stages. To bridge this gap, our systematic
literature review takes a deep dive into the intersection of LLMs and SE, with
a particular focus on understanding how LLMs can be exploited in SE to optimize
processes and outcomes. Through a comprehensive review approach, we collect and
analyze a total of 229 research papers from 2017 to 2023 to answer four key
research questions (RQs). In RQ1, we categorize and provide a comparative
analysis of different LLMs that have been employed in SE tasks, laying out
their distinctive features and uses. For RQ2, we detail the methods involved in
data collection, preprocessing, and application in this realm, shedding light
on the critical role of robust, well-curated datasets for successful LLM
implementation. RQ3 allows us to examine the specific SE tasks where LLMs have
shown remarkable success, illuminating their practical contributions to the
field. Finally, RQ4 investigates the strategies employed to optimize and
evaluate the performance of LLMs in SE, as well as the common techniques
related to prompt optimization. Armed with insights drawn from addressing the
aforementioned RQs, we sketch a picture of the current state-of-the-art,
pinpointing trends, identifying gaps in existing research, and flagging
promising areas for future study
Spatial networks with wireless applications
Many networks have nodes located in physical space, with links more common
between closely spaced pairs of nodes. For example, the nodes could be wireless
devices and links communication channels in a wireless mesh network. We
describe recent work involving such networks, considering effects due to the
geometry (convex,non-convex, and fractal), node distribution,
distance-dependent link probability, mobility, directivity and interference.Comment: Review article- an amended version with a new title from the origina
AI for IT Operations (AIOps) on Cloud Platforms: Reviews, Opportunities and Challenges
Artificial Intelligence for IT operations (AIOps) aims to combine the power
of AI with the big data generated by IT Operations processes, particularly in
cloud infrastructures, to provide actionable insights with the primary goal of
maximizing availability. There are a wide variety of problems to address, and
multiple use-cases, where AI capabilities can be leveraged to enhance
operational efficiency. Here we provide a review of the AIOps vision, trends
challenges and opportunities, specifically focusing on the underlying AI
techniques. We discuss in depth the key types of data emitted by IT Operations
activities, the scale and challenges in analyzing them, and where they can be
helpful. We categorize the key AIOps tasks as - incident detection, failure
prediction, root cause analysis and automated actions. We discuss the problem
formulation for each task, and then present a taxonomy of techniques to solve
these problems. We also identify relatively under explored topics, especially
those that could significantly benefit from advances in AI literature. We also
provide insights into the trends in this field, and what are the key investment
opportunities
Computer Aided Verification
The open access two-volume set LNCS 12224 and 12225 constitutes the refereed proceedings of the 32st International Conference on Computer Aided Verification, CAV 2020, held in Los Angeles, CA, USA, in July 2020.* The 43 full papers presented together with 18 tool papers and 4 case studies, were carefully reviewed and selected from 240 submissions. The papers were organized in the following topical sections: Part I: AI verification; blockchain and Security; Concurrency; hardware verification and decision procedures; and hybrid and dynamic systems. Part II: model checking; software verification; stochastic systems; and synthesis. *The conference was held virtually due to the COVID-19 pandemic
Multi-Granularity Detector for Vulnerability Fixes
With the increasing reliance on Open Source Software, users are exposed to
third-party library vulnerabilities. Software Composition Analysis (SCA) tools
have been created to alert users of such vulnerabilities. SCA requires the
identification of vulnerability-fixing commits. Prior works have proposed
methods that can automatically identify such vulnerability-fixing commits.
However, identifying such commits is highly challenging, as only a very small
minority of commits are vulnerability fixing. Moreover, code changes can be
noisy and difficult to analyze. We observe that noise can occur at different
levels of detail, making it challenging to detect vulnerability fixes
accurately.
To address these challenges and boost the effectiveness of prior works, we
propose MiDas (Multi-Granularity Detector for Vulnerability Fixes). Unique from
prior works, Midas constructs different neural networks for each level of code
change granularity, corresponding to commit-level, file-level, hunk-level, and
line-level, following their natural organization. It then utilizes an ensemble
model that combines all base models to generate the final prediction. This
design allows MiDas to better handle the noisy and highly imbalanced nature of
vulnerability-fixing commit data. Additionally, to reduce the human effort
required to inspect code changes, we have designed an effort-aware adjustment
for Midas's outputs based on commit length. The evaluation results demonstrate
that MiDas outperforms the current state-of-the-art baseline in terms of AUC by
4.9% and 13.7% on Java and Python-based datasets, respectively. Furthermore, in
terms of two effort-aware metrics, EffortCost@L and Popt@L, MiDas also
outperforms the state-of-the-art baseline, achieving improvements of up to
28.2% and 15.9% on Java, and 60% and 51.4% on Python, respectively
Proceedings of the 21st Conference on Formal Methods in Computer-Aided Design – FMCAD 2021
The Conference on Formal Methods in Computer-Aided Design (FMCAD) is an annual conference on the theory and applications of formal methods in hardware and system verification. FMCAD provides a leading forum to researchers in academia and industry for presenting and discussing groundbreaking methods, technologies, theoretical results, and tools for reasoning formally about computing systems. FMCAD covers formal aspects of computer-aided system design including verification, specification, synthesis, and testing
- …