3 research outputs found

    Deep learning applied to the assessment of online student programming exercises

    Massive open online courses (MOOCs) teaching coding are increasing in number and popularity. They commonly include homework assignments in which students must write code that is evaluated by functional tests. Functional testing can to some extent be automated; however, providing more qualitative evaluation and feedback may be prohibitively labor-intensive. Providing qualitative evaluation automatically, at scale, is the subject of much research effort. In this thesis, deep learning is applied to the task of automatically assessing source code, with a focus on providing qualitative feedback. Four tasks are considered in detail: language modeling, detecting idiomatic code, semantic code search, and predicting variable names.

    First, deep learning models are applied to language modeling of source code. The performance of different deep learning language models is compared, and it is shown how language models can be used for source code auto-completion. It is also demonstrated how language models trained on source code can be used for transfer learning, providing improved performance on other tasks.

    Next, it is analysed how the language models from the previous task can be used to detect idiomatic code. It is shown that these language models are able to locate where a student has deviated from correct code idioms; these locations can be highlighted to the student in order to provide qualitative feedback.

    Then, results are presented on semantic code search, again comparing performance across a variety of deep learning models. It is demonstrated how semantic code search can reduce the time taken for qualitative evaluation by automatically pairing a student submission with an instructor's hand-written feedback.

    Finally, it is examined how deep learning can be used to predict variable names within source code. These models can be used in a qualitative evaluation setting to suggest more appropriate variable names, and it is shown that they can even predict the presence of functional errors.

    Novel experimental results show that: fine-tuning a pre-trained language model is an effective way to improve performance across a variety of tasks on source code, improving performance by 5% on average; pre-trained language models can be used as zero-shot learners across a variety of tasks, with the zero-shot performance of some architectures outperforming the fine-tuned performance of others; and language models can be used to detect both semantic and syntactic errors. Other novel findings include that removing the non-variable tokens within source code has negligible impact on model performance, and that the remaining tokens can be shuffled with only a minimal decrease in performance.

    Engineering and Physical Sciences Research Council (EPSRC) funding
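    A minimal sketch of the idea behind the idiom-detection task described above: a causal language model scores each token of a student submission, and tokens with unusually high surprisal are flagged as possible deviations from common idioms. This is not the thesis's actual pipeline; the model name, the snippet, and the cutoff value are illustrative assumptions.

```python
# Sketch: flag "surprising" tokens in student code with a causal LM.
# Assumptions: "gpt2" stands in for a model trained/fine-tuned on source code;
# the 6.0 surprisal cutoff is arbitrary and chosen only for illustration.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = "gpt2"  # assumption: swap in a code-trained causal LM in practice
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def token_surprisal(code: str):
    """Return (token, surprisal) pairs; high surprisal ~ unidiomatic token."""
    ids = tokenizer(code, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Log-probability assigned to each actual token given its prefix.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    nll = -log_probs[torch.arange(targets.size(0)), targets]
    tokens = tokenizer.convert_ids_to_tokens(targets.tolist())
    return list(zip(tokens, nll.tolist()))

snippet = "for i in range(0, len(xs), 1):\n    print(xs[i])"
flagged = [(tok, s) for tok, s in token_surprisal(snippet) if s > 6.0]
print(flagged)  # candidate locations to highlight as feedback to the student
```

    In a grading setting, the flagged spans could be shown to the student as hints, with the threshold tuned on instructor-labelled examples rather than fixed by hand.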

    Programming Languages and Systems

    This open access book constitutes the proceedings of the 29th European Symposium on Programming, ESOP 2020, which was planned to take place in Dublin, Ireland, in April 2020, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2020. The actual ETAPS 2020 meeting was postponed due to the COVID-19 pandemic. The papers deal with fundamental issues in the specification, design, analysis, and implementation of programming languages and systems.

    Simulation Intelligence: Towards a New Generation of Scientific Methods

    The original "Seven Motifs" set forth a roadmap of essential methods for the field of scientific computing, where a motif is an algorithmic method that captures a pattern of computation and data movement. We present the "Nine Motifs of Simulation Intelligence", a roadmap for the development and integration of the essential algorithms necessary for a merger of scientific computing, scientific simulation, and artificial intelligence. We call this merger simulation intelligence (SI), for short. We argue the motifs of simulation intelligence are interconnected and interdependent, much like the components within the layers of an operating system. Using this metaphor, we explore the nature of each layer of the simulation intelligence operating system stack (SI-stack) and the motifs therein: (1) Multi-physics and multi-scale modeling; (2) Surrogate modeling and emulation; (3) Simulation-based inference; (4) Causal modeling and inference; (5) Agent-based modeling; (6) Probabilistic programming; (7) Differentiable programming; (8) Open-ended optimization; (9) Machine programming. We believe coordinated efforts between motifs offers immense opportunity to accelerate scientific discovery, from solving inverse problems in synthetic biology and climate science, to directing nuclear energy experiments and predicting emergent behavior in socioeconomic settings. We elaborate on each layer of the SI-stack, detailing the state-of-art methods, presenting examples to highlight challenges and opportunities, and advocating for specific ways to advance the motifs and the synergies from their combinations. Advancing and integrating these technologies can enable a robust and efficient hypothesis-simulation-analysis type of scientific method, which we introduce with several use-cases for human-machine teaming and automated science