    Simplifying Deep-Learning-Based Model for Code Search

    To accelerate software development, developers frequently search and reuse existing code snippets from a large-scale codebase, e.g., GitHub. Over the years, researchers have proposed many information retrieval (IR) based models for code search, which match keywords in the query with code text. But they fail to bridge the semantic gap between query and code. To address this challenge, Gu et al. proposed a deep-learning-based model named DeepCS. It jointly embeds method code and natural language descriptions into a shared vector space, where methods related to a natural language query are retrieved according to their vector similarities. However, DeepCS's working process is complicated and time-consuming. To overcome this issue, we propose a simplified model, CodeMatcher, that leverages the IR technique but maintains many features of DeepCS. Generally, CodeMatcher combines query keywords in their original order, performs a fuzzy search on the name and body strings of methods, and returns the best-matched methods with the longer sequence of used keywords. We verified its effectiveness on a large-scale codebase with about 41k repositories. Experimental results showed that the simplified model CodeMatcher outperforms DeepCS by 97% in terms of MRR (a widely used accuracy measure for code search), and it is over 66 times faster than DeepCS. Besides, compared with the state-of-the-art IR-based model CodeHow, CodeMatcher also improves the MRR by 73%. We also observed that fusing the advantages of IR-based and deep-learning-based models is promising because they complement each other by nature, and that improving the quality of method naming helps code search, since the method name plays an important role in connecting query and code
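
The MRR figure quoted in the abstract above is the mean reciprocal rank: for each query, take 1 divided by the rank of the first relevant result, then average over all queries. The following is a minimal sketch of that standard definition, not the authors' evaluation code. The function name `mean_reciprocal_rank` and the sample ranks are hypothetical, and treating a query with no relevant result as contributing 0 is an assumption.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(first_hit_ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over queries.

    Each entry is the 1-based rank of the first relevant result for one
    query, or None when no relevant result was returned (assumed to
    contribute 0 to the sum).
    """
    if not first_hit_ranks:
        raise ValueError("need at least one query")
    total = 0.0
    for rank in first_hit_ranks:
        if rank is None:
            continue  # no relevant result: reciprocal rank counted as 0
        if rank < 1:
            raise ValueError("ranks are 1-based")
        total += 1.0 / rank
    return total / len(first_hit_ranks)


# Hypothetical ranks for four queries: hits at positions 1, 2, none, and 4.
example_mrr = mean_reciprocal_rank([1, 2, None, 4])
```

Because the score depends only on the first relevant hit, a model that moves the correct method from rank 2 to rank 1 gains far more than one that moves it from rank 9 to rank 8.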

    Exploring Problem Solving Paths in a Java Programming Course

    Assessment of students’ programming submissions has been the focus of interest in many studies. Although the final submissions capture the whole program, they often tell very little about how it was developed. In this paper, we are able to look at intermediate programming steps using a unique dataset that captures a series of snapshots showing how students developed their programs over time. We assessed each of these intermediate steps and performed a fine-grained concept-based analysis on each step to identify the most common programming paths. Analysis of the results showed that most students tend to build the program incrementally and improve its correctness. This finding provides us with evidence that intermediate programming steps are important and need to be taken into account, not only for improving user modelling in educational programming systems, but also for providing better feedback to students

    Investigating Automated Student Modeling in a Java MOOC

    With the advent of the ubiquitous web, programming is no longer a sole prerogative of computer science schools. Scripting languages are taught to wider audiences and programming has become a flag post of any technology-related program. As more and more students are exposed to coding, it is no longer a trade of the select few. As a result, students who would not have opted for a coding class a decade ago are in a position of having to learn a rather difficult subject. The problem of assisting students in learning programming has been explored in several intelligent tutoring systems. The key component of such systems is a student model that keeps track of student progress. In turn, the foundation of a student model is a domain model – a vocabulary of skills (or concepts) that structures the representation of student knowledge. Building domain models for programming is known to be a complicated task. In this paper we explore automated approaches for extracting domain models for learning programming languages and modeling student knowledge in the process of solving programming exercises. We evaluate the validity of this approach using a large volume of student code submission data from a MOOC on introductory Java programming

    Off the beaten path: The impact of adaptive content sequencing on student navigation in an open social student modeling interface

    One of the original goals of intelligent educational systems is to guide every student to the most appropriate educational content. Exploring both knowledge-based and social guidance approaches in past work, we learned that each of these approaches has weak sides. In this paper we follow the idea of combining social guidance with more traditional knowledge-based guidance to support more optimal content navigation. We proposed a greedy sequencing approach that maximizes student’s level of knowledge and tested it in a classroom. Results indicated that this approach positively impacts students’ navigation

    Graph analysis of student model networks

    This paper explores the feasibility of a graph-based approach to model student knowledge in the domain of programming. The key idea of this approach is that programming concepts are truly learned not in isolation, but rather in combination with other concepts. Following this idea, we represent a student model as a graph where links are gradually added when the student's ability to work with connected pairs of concepts in the same context is confirmed. We also hypothesize that with this graph-based approach a number of traditional graph metrics could be used to better measure student knowledge than using more traditional scalar models of student knowledge. To collect some early evidence in favor of this idea, we used data from several classroom studies to correlate graph metrics with various performance and motivation metrics

    HyperAST: Enabling Efficient Analysis of Software Histories at Scale

    Abstract Syntax Trees (ASTs) are widely used beyond compilers in many tools that measure and improve code quality, such as code analysis, bug detection, mining code metrics, and refactoring. With the advent of fast software evolution and multistage releases, the temporal analysis of an AST history is becoming useful to understand and maintain code. However, jointly analyzing thousands of versions of ASTs independently faces scalability issues, mostly combinatorial, both in terms of memory and CPU usage. In this paper, we propose a novel type of AST, called HyperAST, that enables efficient temporal code analysis on a given software history by: (1) leveraging code redundancy through space (between code elements) and time (between versions); (2) reusing intermediate computation results. We show how the HyperAST can be built incrementally on a set of commits to capture all multiple ASTs at once in an optimized way. We evaluated the HyperAST on a curated list of large software projects. Compared to Spoon, a state-of-the-art technique, we observed that the HyperAST outperforms it with an order-of-magnitude difference from ×6 up to ×8076 in CPU construction time and from ×12 up to ×1159 in memory footprint. While the HyperAST requires up to 2 h 22 min and 7.2 GB for the biggest project, Spoon requires up to 93 h 31 min and 2.2 TB. The gains in construction time varied from 83.4 % to 99.99 % and the gains in memory footprint varied from 91.8 % to 99.9 %. We further compared the task of finding references of declarations with the HyperAST and Spoon. We observed on average 90 % precision and 97 % recall without a significant difference in search time
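
The idea of exploiting redundancy between code elements and between versions can be illustrated with subtree interning (hash-consing): structurally identical subtrees are stored once and shared. The sketch below is only an illustration of that general technique under simplifying assumptions; it is not the HyperAST data structure or its API. The names `SubtreeStore` and `add_tree` and the toy two-version history are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# A stored node is its label plus the ids of its (already stored) children.
NodeKey = Tuple[str, Tuple[int, ...]]


class SubtreeStore:
    """Stores each structurally distinct subtree exactly once."""

    def __init__(self) -> None:
        self._ids: Dict[NodeKey, int] = {}
        self._nodes: List[NodeKey] = []

    def intern(self, label: str, child_ids: Sequence[int] = ()) -> int:
        """Return the id of this subtree, creating it only if unseen."""
        key = (label, tuple(child_ids))
        node_id = self._ids.get(key)
        if node_id is None:
            node_id = len(self._nodes)
            self._ids[key] = node_id
            self._nodes.append(key)
        return node_id

    def __len__(self) -> int:
        return len(self._nodes)


def add_tree(store: SubtreeStore, tree: Tuple[str, list]) -> int:
    """Insert a (label, children) tree bottom-up and return its root id."""
    label, children = tree
    return store.intern(label, [add_tree(store, child) for child in children])


# Toy history: version 2 replaces method_b with method_c, nothing else changes.
v1 = ("file", [("method_a", [("return", [])]), ("method_b", [("return", [])])])
v2 = ("file", [("method_a", [("return", [])]), ("method_c", [("return", [])])])

store = SubtreeStore()
root_v1 = add_tree(store, v1)
root_v2 = add_tree(store, v2)
stored_nodes = len(store)  # shared subtrees are counted once across versions
```

In this toy case the two versions contain ten nodes in total, but only the distinct subtrees are kept, and re-adding an unchanged version yields the same root id, which is what lets per-subtree results be cached and reused.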

    A comparative study of visual cues for annotation-based navigation support in adaptive educational hypermedia

    Adaptive link annotation is one of the most well-known adaptive navigation support technologies that aims to guide hypermedia users to the most relevant information by personalizing the appearance of hyperlinks. Past work assumed no difference between different interface implementations of personalization approaches that are conceptually the same. The goal of the current study was to determine whether the choice of visual cues does matter by conducting a user study with several alternative designs for link annotation in interactive code examples

    General Features in Knowledge Tracing to Model Multiple Subskills, Temporal Item Response Theory, and Expert Knowledge

    Knowledge Tracing is the de-facto standard for inferring student knowledge from performance data. Unfortunately, it does not allow modeling the feature-rich data that it is now possible to collect in modern digital learning environments. Because of this, many ad hoc Knowledge Tracing variants have been proposed to model a specific feature of interest. For example, variants have studied the effect of students’ individual characteristics, the effect of help in a tutor, and subskills. These ad hoc models are successful for their own specific purpose, but are designed to model only a single specific feature. We present FAST (Feature Aware Student knowledge Tracing), an efficient, novel method that allows integrating general features into Knowledge Tracing. We demonstrate FAST’s flexibility with three examples of feature sets that are relevant to a wide audience. We use features in FAST to model (i) multiple subskill tracing, (ii) a temporal Item Response Model implementation, and (iii) expert knowledge. We present empirical results using data collected from an Intelligent Tutoring System. We report that using features can improve classification performance on the task of predicting student performance by up to 25%. Moreover, for fitting and inference, FAST can be 300 times faster than models created in BNT-SM, a toolkit that facilitates the creation of ad hoc Knowledge Tracing variants
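
For readers unfamiliar with the baseline that FAST extends, the classic Knowledge Tracing update can be sketched as follows: the probability that a skill is known is revised by Bayes' rule after each observed answer, then a learning transition is applied. This is a sketch of standard Bayesian Knowledge Tracing only, not of FAST or its feature-based parameterization. The function name `bkt_update` and all parameter values are hypothetical.

```python
def bkt_update(p_known: float, correct: bool,
               guess: float, slip: float, learn: float) -> float:
    """One step of classic Bayesian Knowledge Tracing.

    p_known: prior probability the student already knows the skill
    guess:   probability of answering correctly without knowing the skill
    slip:    probability of answering incorrectly despite knowing it
    learn:   probability of acquiring the skill after this practice step
    """
    if correct:
        evidence_known = p_known * (1.0 - slip)
        evidence_unknown = (1.0 - p_known) * guess
    else:
        evidence_known = p_known * slip
        evidence_unknown = (1.0 - p_known) * (1.0 - guess)
    # Posterior probability that the skill was known, given the observation.
    posterior = evidence_known / (evidence_known + evidence_unknown)
    # The student may also learn the skill during this opportunity.
    return posterior + (1.0 - posterior) * learn


# Hypothetical parameters, for illustration only.
after_correct = bkt_update(0.3, True, guess=0.2, slip=0.1, learn=0.15)
after_incorrect = bkt_update(0.3, False, guess=0.2, slip=0.1, learn=0.15)
```

With these illustrative values, a correct answer raises the estimate above the prior of 0.3 and an incorrect answer lowers it. The four parameters are fixed per skill in this classic form, which is the limitation that motivates feature-aware variants.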

    Navigation support in complex open learner models: assessing visual design alternatives

    Open Learner Models are used in modern e-learning to show system users the content of their learner models. This approach is known to prompt reflection, facilitate planning and navigation. Open Learner Models may show different levels of detail of the underlying learner model, and may structure the information differently. However, a trade-off exists between useful information and the complexity of the information. This paper investigates whether offering richer information is assessed positively by learners and results in more effective support for learning tasks. An interview pre-study revealed which information within the complex learner model is of interest. A controlled user study examined six alternative visualisation prototypes of varying complexity and resulted in the implementation of one of the designs. A second controlled study involved students interacting with variations of the visualisation while searching for suitable learning material, and revealed the value of the design alternative and its variations. The work contributes to developing complex open learner models by stressing the need to balance complexity and support. It also suggests that the expressiveness of open learner models can be improved with visual elements that strategically summarise the complex information being displayed in detail