
    Identification of Programmers from Typing Patterns

    Being able to identify the user of a computer solely based on their typing patterns can lead to improvements in plagiarism detection, provide new opportunities for authentication, and enable novel guidance methods in tutoring systems. However, at the same time, if such identification is possible, new privacy and ethical concerns arise. In our work, we explore methods for identifying individuals from typing data captured by a programming environment as these individuals are learning to program. We compare the identification accuracy of automatically generated user profiles, ranging from the average time a user needs between keystrokes to the time it takes the user to press specific pairs of keys (digraphs). We also explore the effect of data quantity and different acceptance thresholds on the identification accuracy, and analyze how the accuracy changes when identifying individuals across courses. Our results show that, while the identification accuracy varies depending on data quantity and method, identification of users based on their programming data is possible. These results indicate that there is potential in using this method, for example, for identifying students taking exams, and that such data raises privacy concerns that should be addressed. Peer reviewed.
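    The abstract mentions digraph (key-pair) timing profiles but not how they are built or compared. As a rough, hedged illustration of the general idea, the sketch below derives per-user digraph latency profiles from keystroke events and attributes an unknown sample to the closest known profile; the event format, distance measure, and function names are assumptions for illustration, not details from the paper.

```python
from collections import defaultdict
from statistics import mean

def digraph_profile(events):
    """Build a profile: mean latency (ms) for each consecutive key pair.

    `events` is assumed to be a list of (timestamp_ms, key) tuples in typing order.
    """
    latencies = defaultdict(list)
    for (t1, k1), (t2, k2) in zip(events, events[1:]):
        latencies[(k1, k2)].append(t2 - t1)
    return {pair: mean(vals) for pair, vals in latencies.items()}

def profile_distance(a, b):
    """Mean absolute latency difference over the digraphs both profiles share."""
    shared = set(a) & set(b)
    if not shared:
        return float("inf")
    return sum(abs(a[p] - b[p]) for p in shared) / len(shared)

def identify(sample_events, known_profiles):
    """Return the known user whose profile is closest to the sample."""
    sample = digraph_profile(sample_events)
    return min(known_profiles, key=lambda user: profile_distance(sample, known_profiles[user]))

# Toy usage with made-up keystroke data.
alice = [(0, "i"), (90, "n"), (210, "t"), (320, " ")]
bob = [(0, "i"), (250, "n"), (480, "t"), (700, " ")]
known = {"alice": digraph_profile(alice), "bob": digraph_profile(bob)}
print(identify([(0, "i"), (100, "n"), (220, "t"), (330, " ")], known))  # likely "alice"
```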

    Typing Patterns and Authentication in Practical Programming Exams

    In traditional programming courses, students have usually been at least partly graded using pen-and-paper exams. One of the problems with such exams is that they connect only partially to the practical work conducted within the courses. Testing students in a more practical environment has been constrained by the limited resources needed, for example, for authentication. In this work, we study whether students in a programming course can be identified in an exam setting based solely on their typing patterns. We replicate an earlier study that indicated that keystroke analysis can be used for identifying programmers. Then, we examine how a controlled machine-exam setting affects the identification accuracy, i.e. whether students can be identified reliably in a machine exam based on typing profiles built from their programming assignments during the course. Finally, we investigate the identification accuracy in an uncontrolled machine exam, where students can complete the exam at any time using any computer they want. Our results indicate that even though the identification accuracy deteriorates when identifying students in an exam, it remains high enough to reliably identify students if the identification is not required to be exact but the top k closest matches are regarded as correct. Peer reviewed.
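    Since the abstract's main practical point is the relaxed, top-k acceptance rule, a small sketch of that rule may help; the data layout and function names are illustrative assumptions that reuse the hypothetical digraph-profile sketch above, not the authors' implementation.

```python
def top_k_matches(sample_profile, course_profiles, k, distance):
    """Rank all known students by profile distance and return the k closest.

    `course_profiles` maps student id -> profile built from course data;
    `distance` is any profile distance function (e.g. the one sketched above).
    """
    ranked = sorted(course_profiles, key=lambda sid: distance(sample_profile, course_profiles[sid]))
    return ranked[:k]

def accept(claimed_student, exam_profile, course_profiles, k, distance):
    """Accept the claimed identity if it is among the k closest profiles."""
    return claimed_student in top_k_matches(exam_profile, course_profiles, k, distance)
```

    With k = 1 this reduces to exact identification; increasing k trades exactness for the higher reliability the abstract describes.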

    Privacy versus Information in Keystroke Latency Data

    The computer science education research field studies how students learn computer science concepts such as programming and algorithms. One of the major goals of the field is to help students learn CS concepts that are often difficult to grasp because students rarely encounter them in primary or secondary education. In order to help struggling students, information on the learning process of students has to be collected. In many introductory programming courses, process data is automatically collected in the form of source code snapshots. Source code snapshots usually include at least the source code of the student's program and a timestamp. Studies ranging from identifying at-risk students to inferring programming experience and topic knowledge have been conducted using source code snapshots. However, replicating studies based on source code snapshots is currently hard, as data is rarely shared due to privacy concerns. Source code snapshot data often includes many attributes that can be used for identification, for example the name of the student or the student number. There can even be hidden identifiers in the data that allow identification even when obvious identifiers are removed. For example, keystroke data from source code snapshots can be used for identification based on the distinct typing profiles of students. Hence, simply removing explicit identifiers such as names and student numbers is not enough to protect the privacy of the users who have supplied the data. At the same time, removing all keystroke data would decrease the value of the data significantly and possibly preclude replication studies. In this work, we investigate how keystroke data from a programming context could be modified to prevent keystroke-latency-based identification whilst still retaining valuable information in the data. This study is the first step in enabling the sharing of anonymized source code snapshots. We investigate the degree of anonymization required to make identification of students based on their typing patterns unreliable. Then, we study whether the modified keystroke data can still be used to infer the programming experience of the students, as a case study of whether the anonymized typing patterns have retained at least some informative value. We show that it is possible to modify data so that keystroke-latency-based identification is no longer accurate, but the programming experience of the students can still be inferred, i.e. the data still has value to researchers.
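    The abstract does not describe the actual anonymization procedure. One plausible family of modifications is to coarsen and randomly perturb keystroke latencies so that individual timing profiles blur while aggregate signals (such as overall typing speed) survive; the sketch below is an assumption-level illustration of that idea, not the authors' method.

```python
import random

def anonymize_latencies(latencies_ms, bucket_ms=50, jitter_ms=20, seed=None):
    """Coarsen latencies into buckets and add bounded random jitter.

    `latencies_ms` is a list of inter-keystroke latencies in milliseconds.
    Larger `bucket_ms` / `jitter_ms` values mean stronger anonymization but
    less remaining information.
    """
    rng = random.Random(seed)
    anonymized = []
    for latency in latencies_ms:
        bucketed = round(latency / bucket_ms) * bucket_ms   # drop fine-grained timing
        jittered = bucketed + rng.uniform(-jitter_ms, jitter_ms)
        anonymized.append(max(0.0, jittered))
    return anonymized

# Example: coarse typing speed is roughly preserved, exact digraph timings are not.
original = [132, 97, 210, 88, 155]
print(anonymize_latencies(original, seed=42))
```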

    Preventing Keystroke Based Identification in Open Data Sets

    Large-scale courses such as Massive Open Online Courses (MOOCs) can be a great data source for researchers. Ideally, the data gathered on such courses should be openly available to all researchers, so that studies could be easily replicated and novel studies could be conducted on existing data. However, very fine-grained data such as source code snapshots can contain hidden identifiers. For example, distinct typing patterns that identify individuals can be extracted from such data. Hence, simply removing explicit identifiers such as names and student numbers is not sufficient to protect the privacy of the users who have supplied the data. At the same time, removing all keystroke information would decrease the value of the shared data significantly. In this work, we study how keystroke data from a programming context could be modified to prevent keystroke-latency-based identification whilst still retaining information that can be used, for example, to infer programming experience. We investigate the degree of anonymization required to render identification of students based on their typing patterns unreliable. Then, we study whether the modified keystroke data can still be used to infer the programming experience of the students, as a case study of whether the anonymized typing patterns have retained at least some informative value. We show that it is possible to modify data so that keystroke-latency-based identification is no longer accurate, but the programming experience of the students can still be inferred, i.e. the data still has value to researchers. In a broader context, our results indicate that information and anonymity are not necessarily mutually exclusive. Peer reviewed.
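    This abstract describes essentially the same study design as the previous one, so instead of repeating the anonymization sketch, the fragment below illustrates the evaluation side: sweeping the anonymization strength and recording how identification accuracy degrades. The `anonymize` and `identify_user` parameters stand for functions like the hypothetical sketches above and are not from the paper.

```python
def identification_accuracy(samples, identify_user):
    """Fraction of labelled latency samples attributed to the correct user.

    `samples` is a list of (true_user, latencies) pairs and `identify_user`
    maps a latency sequence to a predicted user.
    """
    correct = sum(1 for user, latencies in samples if identify_user(latencies) == user)
    return correct / len(samples)

def sweep_anonymization(samples, identify_user, anonymize, bucket_sizes):
    """Report identification accuracy for increasingly coarse latency buckets."""
    results = {}
    for bucket_ms in bucket_sizes:
        blurred = [(user, anonymize(latencies, bucket_ms=bucket_ms)) for user, latencies in samples]
        results[bucket_ms] = identification_accuracy(blurred, identify_user)
    return results
```

    A matching sweep over experience-inference accuracy would show whether the anonymized data still carries the information the abstract says it retains.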

    The Babbage principle after evolutionary economics

    In this paper we analyse the cognitive roots of the division of labour and relate it to the reduction of tacitness in the organisation and technology of a firm. We study the interaction between efforts at knowledge codification and problems of control in production from an evolutionary and complex-systems perspective. By applying our framework to the emergence of white-collar work in the late 19th century and to the modern knowledge economy, we assert that property rights and limits to the codification of knowledge are important forces shaping the process of organisational and technological change.

    Gradual Liquid Type Inference

    Liquid typing provides a decidable refinement inference mechanism that is convenient but subject to two major issues: (1) inference is global and requires top-level annotations, making it unsuitable for inference of modular code components and prohibiting its applicability to library code, and (2) inference failure results in obscure error messages. These difficulties seriously hamper the migration of existing code to use refinements. This paper shows that gradual liquid type inference, a novel combination of liquid inference and gradual refinement types, addresses both issues. Gradual refinement types, which support imprecise predicates that are optimistically interpreted, can be used in argument positions to constrain liquid inference so that the global inference process effectively infers modular specifications usable for library components. Dually, when gradual refinements appear as the result of inference, they signal an inconsistency in the use of static refinements. Because liquid refinements are drawn from a finite set of predicates, in gradual liquid type inference we can enumerate the safe concretizations of each imprecise refinement, i.e. the static refinements that justify why a program is gradually well-typed. This enumeration is useful for static liquid type error explanation, since the safe concretizations exhibit all the potential inconsistencies that lead to static type errors. We develop the theory of gradual liquid type inference and explore its pragmatics in the setting of Liquid Haskell. Comment: To appear at OOPSLA 201
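    The abstract's key observation, that an imprecise refinement can be concretized by enumerating a finite predicate set, can be mimicked in a toy way outside Liquid Haskell. The sketch below is purely illustrative: it enumerates conjunctions of candidate predicates and keeps those under which a hand-written check accepts a tiny "program", using concrete test inputs in place of the SMT-backed liquid type checker; the qualifier set and checker are invented for this example.

```python
from itertools import combinations

# Toy "qualifier" set for an integer argument x, standing in for the finite
# predicate set liquid inference draws from.
QUALIFIERS = {
    "x > 0": lambda x: x > 0,
    "x >= 0": lambda x: x >= 0,
    "x /= 0": lambda x: x != 0,
}

def checker(preds, test_inputs):
    """Pretend type check: the 'program' 100 // x is safe if the assumed
    predicates rule out x == 0 on every input that satisfies them."""
    for x in test_inputs:
        if all(p(x) for p in preds) and x == 0:
            return False
    return True

def safe_concretizations(qualifiers, test_inputs):
    """Enumerate conjunctions of qualifiers under which the check succeeds."""
    names = list(qualifiers)
    safe = []
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            if checker([qualifiers[n] for n in combo], test_inputs):
                safe.append(combo)
    return safe

print(safe_concretizations(QUALIFIERS, test_inputs=range(-3, 4)))
```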

    Automatic Inference of Programming Performance and Experience from Typing Patterns

    Studies on retention and success in introductory programming courses have suggested that previous programming experience contributes to students' course outcomes. If such background information could be automatically distilled from students' working process, additional guidance and support mechanisms could be provided even to those who do not wish to disclose such information. In this study, we explore methods for automatically distinguishing novice programmers from more experienced programmers using fine-grained source code snapshot data. We approach the issue by partially replicating a previous study that used students' keystroke latencies as a proxy for introductory programming course outcomes, and follow this with an exploration of machine learning methods for separating students with little to no previous programming experience from those with more experience. Our results confirm that students' keystroke latencies can be used as a metric for measuring course outcomes. At the same time, our results show that students' programming experience can be identified to some extent from keystroke latency data, which means that such data has potential as a source of information for customizing the students' learning experience. Peer reviewed.
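    As a rough sketch of the kind of machine-learning separation the abstract describes, the snippet below cross-validates an off-the-shelf classifier on simple summary features of keystroke latencies. The feature choice, labels, and use of scikit-learn are assumptions for illustration, not the study's actual pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def latency_features(latencies_ms):
    """Summary statistics of one student's inter-keystroke latencies (assumed features)."""
    arr = np.asarray(latencies_ms, dtype=float)
    return [arr.mean(), np.median(arr), arr.std(), np.percentile(arr, 90)]

def experience_classification_score(latency_samples, has_experience):
    """Cross-validated accuracy of separating experienced students (label True)
    from novices (label False) using latency summaries."""
    X = np.array([latency_features(sample) for sample in latency_samples])
    y = np.array(has_experience)
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    return cross_val_score(model, X, y, cv=5).mean()
```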