1,118 research outputs found

    The 11th Conference of PhD Students in Computer Science

    Get PDF

    Feature Set Selection for Improved Classification of Static Analysis Alerts

    Get PDF
    With the extreme growth in third party cloud applications, increased exposure of applications to the internet, and the impact of successful breaches, improving the security of software being produced is imperative. Static analysis tools can alert to quality and security vulnerabilities of an application; however, they present developers and analysts with a high rate of false positives and unactionable alerts. This problem may lead to the loss of confidence in the scanning tools, possibly resulting in the tools not being used. The discontinued use of these tools may increase the likelihood of insecure software being released into production. Insecure software can be successfully attacked resulting in the compromise of one or several information security principles such as confidentiality, availability, and integrity. Feature selection methods have the potential to improve the classification of static analysis alerts and thereby reduce the false positive rates. Thus, the goal of this research effort was to improve the classification of static analysis alerts by proposing and testing a novel method leveraging feature selection. The proposed model was developed and subsequently tested on three open source PHP applications spanning several years. The results were compared to a classification model utilizing all features to gauge the classification improvement of the feature selection model. The model presented did result in the improved classification accuracy and reduction of the false positive rate on a reduced feature set. This work contributes a real-world static analysis dataset based upon three open source PHP applications. It also enhanced an existing data set generation framework to include additional predictive software features. However, the main contribution is a feature selection methodology that may be used to discover optimal feature sets that increase the classification accuracy of static analysis alerts

    Neural Semantic Parsing for Syntax-Aware Code Generation

    Get PDF
    The task of mapping natural language expressions to logical forms is referred to as semantic parsing. The syntax of logical forms that are based on programming or query languages, such as Python or SQL, is defined by a formal grammar. In this thesis, we present an efficient neural semantic parser that exploits the underlying grammar of logical forms to enforce well-formed expressions. We use an encoder-decoder model for sequence prediction. Syntactically valid programs are guaranteed by means of a bottom-up shift-reduce parser, that keeps track of the set of viable tokens at each decoding step. We show that the proposed model outperforms the standard encoder-decoder model across datasets and is competitive with comparable grammar-guided semantic parsing approaches

    Towards Fine-Grained Localization of Privacy Behaviors

    Full text link
    Mobile applications are required to give privacy notices to users when they collect or share personal information. Creating consistent and concise privacy notices can be a challenging task for developers. Previous work has attempted to help developers create privacy notices through a questionnaire or predefined templates. In this paper, we propose a novel approach and a framework, called PriGen, that extends these prior work. PriGen uses static analysis to identify Android applications' code segments that process sensitive information (i.e. permission-requiring code segments) and then leverages a Neural Machine Translation model to translate them into privacy captions. We present the initial evaluation of our translation task for ~300,000 code segments

    Deep learning applied to the assessment of online student programming exercises

    Get PDF
    Massive online open courses (MOOCs) teaching coding are increasing in number and popularity. They commonly include homework assignments in which the students must write code that is evaluated by functional tests. Functional testing may to some extent be automated however provision of more qualitative evaluation and feedback may be prohibitively labor-intensive. Provision of qualitative evaluation at scale, automatically, is the subject of much research effort. In this thesis, deep learning is applied to the task of performing automatic assessment of source code, with a focus on provision of qualitative feedback. Four tasks: language modeling, detecting idiomatic code, semantic code search, and predicting variable names are considered in detail. First, deep learning models are applied to the task of language modeling source code. A comparison is made between the performance of different deep learning language models, and it is shown how language models can be used for source code auto-completion. It is also demonstrated how language models trained on source code can be used for transfer learning, providing improved performance on other tasks. Next, an analysis is made on how the language models from the previous task can be used to detect idiomatic code. It is shown that these language models are able to locate where a student has deviated from correct code idioms. These locations can be highlighted to the student in order to provide qualitative feedback. Then, results are shown on semantic code search, again comparing the performance across a variety of deep learning models. It is demonstrated how semantic code search can be used to reduce the time taken for qualitative evaluation, by automatically pairing a student submission with an instructor’s hand-written feedback. Finally, it is examined how deep learning can be used to predict variable names within source code. These models can be used in a qualitative evaluation setting where the deep learning models can be used to suggest more appropriate variable names. It is also shown that these models can even be used to predict the presence of functional errors. Novel experimental results show that: fine-tuning a pre-trained language model is an effective way to improve performance across a variety of tasks on source code, improving performance by 5% on average; pre-trained language models can be used as zero-shot learners across a variety of tasks, with the zero-shot performance of some architectures outperforming the fine-tuned performance of others; and that language models can be used to detect both semantic and syntactic errors. Other novel findings include: removing the non-variable tokens within source code has negligible impact on the performance of models, and that these remaining tokens can be shuffled with only a minimal decrease in performance.Engineering and Physical Sciences Research Council (EPSRC) fundin

    Use of Graph Neural Networks in Aiding Defensive Cyber Operations

    Full text link
    In an increasingly interconnected world, where information is the lifeblood of modern society, regular cyber-attacks sabotage the confidentiality, integrity, and availability of digital systems and information. Additionally, cyber-attacks differ depending on the objective and evolve rapidly to disguise defensive systems. However, a typical cyber-attack demonstrates a series of stages from attack initiation to final resolution, called an attack life cycle. These diverse characteristics and the relentless evolution of cyber attacks have led cyber defense to adopt modern approaches like Machine Learning to bolster defensive measures and break the attack life cycle. Among the adopted ML approaches, Graph Neural Networks have emerged as a promising approach for enhancing the effectiveness of defensive measures due to their ability to process and learn from heterogeneous cyber threat data. In this paper, we look into the application of GNNs in aiding to break each stage of one of the most renowned attack life cycles, the Lockheed Martin Cyber Kill Chain. We address each phase of CKC and discuss how GNNs contribute to preparing and preventing an attack from a defensive standpoint. Furthermore, We also discuss open research areas and further improvement scopes.Comment: 35 pages, 9 figures, 8 table
    • …
    corecore