
    Easy over Hard: A Case Study on Deep Learning

    While deep learning is an exciting new technique, the benefits of this method need to be assessed with respect to its computational cost. This is particularly important for deep learning since these learners need hours (to weeks) to train a model. Such long training times limit the ability of (a) a researcher to test the stability of their conclusions via repeated runs with different random seeds; and (b) other researchers to repeat, improve, or even refute that original work. For example, deep learning was recently used to find which questions in the Stack Overflow programmer discussion forum can be linked together. That deep learning system took 14 hours to execute. We show here that applying a very simple optimizer called DE to fine-tune an SVM achieves similar (and sometimes better) results. The DE approach terminated in 10 minutes, i.e., 84 times faster than the deep learning method. We offer these results as a cautionary tale to the software analytics community and suggest that not every new innovation should be applied without critical analysis. If researchers deploy some new and expensive process, that work should be baselined against simpler and faster alternatives. Comment: 12 pages, 6 figures, accepted at FSE 2017
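
    The tuning loop sketched above is easy to reproduce in outline. Below is a minimal, hedged illustration of DE-based SVM tuning using SciPy's differential evolution and scikit-learn; the dataset, hyper-parameter ranges, and search budget are illustrative assumptions rather than the paper's setup:

        # Minimal sketch: differential evolution (DE) tuning an SVM's C and gamma.
        # The dataset, search ranges, and budget are illustrative assumptions.
        from scipy.optimize import differential_evolution
        from sklearn.datasets import make_classification
        from sklearn.model_selection import cross_val_score
        from sklearn.svm import SVC

        X, y = make_classification(n_samples=500, n_features=20, random_state=0)

        def negative_cv_score(params):
            """DE minimizes, so return the negated cross-validated accuracy."""
            C, gamma = params
            return -cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()

        # Search C in [1, 100] and gamma in [1e-4, 1]; a small budget keeps it cheap.
        result = differential_evolution(negative_cv_score,
                                        bounds=[(1, 100), (1e-4, 1)],
                                        maxiter=10, popsize=10, seed=1)
        print("best (C, gamma):", result.x, "cross-val accuracy:", -result.fun)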

    On Oracles for Automated Diagnosis and Repair of Software Bugs

    This HDR focuses on my work on automatic diagnosis and repair done over the past years. Among my past publications, it highlights three contributions on this topic, published respectively in ACM Transactions on Software Engineering and Methodology (TOSEM), IEEE Transactions on Software Engineering (TSE), and Elsevier Information & Software Technology (IST). My goal is to show that those three contributions share something deep: they are founded on a unifying concept, which is the one of oracle. The first contribution is about statistical oracles. In the context of object-oriented software, we have defined a notion of context and normality that is specific to a fault class: missing method calls. Those inferred regularities act as an oracle, and their violations are considered as bugs. The second contribution is about test-case-based oracles for automatic repair. We describe an automatic repair system that fixes failing test cases by generating a patch. It is founded on the idea of refining the knowledge given by the violation of the oracle of the failing test case into finer-grained information, which we call a “micro-oracle”. By considering micro-oracles, we are capable of obtaining at the same time a precise fault localization diagnostic and a well-formed input-output specification to be used for program synthesis in order to repair a bug. The third contribution discusses a novel generic oracle in the context of exception handling. A generic oracle states properties that hold for many domains. Our technique verifies compliance with this new oracle using test suite execution and exception injection. This document concludes with a research agenda about the future of engineering ultra-dependable and antifragile software systems.
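
    To make the first contribution concrete, here is a hedged, much-simplified sketch of a statistical oracle for missing method calls: treat the set of methods called on an object in one context as its "type usage", and flag usages that look like a frequent usage with one call missing (a simple majority-rule style check). The usages and threshold below are invented, and the published approach is more elaborate:

        # Hedged sketch of a statistical oracle for missing method calls.
        # The observed usages below are made up; the real approach is richer.
        from collections import Counter

        observed_usages = [
            frozenset({"open", "read", "close"}),
            frozenset({"open", "read", "close"}),
            frozenset({"open", "read", "close"}),
            frozenset({"open", "read"}),          # suspicious: "close" is missing
        ]
        counts = Counter(observed_usages)

        def missing_call_warnings(usage, counts, ratio=2):
            """Report calls whose addition turns `usage` into a usage that is
            at least `ratio` times more frequent; the inferred regularity acts
            as the oracle, and a violation is reported as a likely bug."""
            warnings = []
            for other, n in counts.items():
                extra = other - usage
                if usage < other and len(extra) == 1 and n >= ratio * counts[usage]:
                    warnings.append(next(iter(extra)))
            return warnings

        for usage in counts:
            for call in missing_call_warnings(usage, counts):
                print(f"usage {sorted(usage)} likely misses a call to {call}()")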

    Eye movements in code reading: relaxing the linear order

    Code reading is an important skill in programming. Inspired by the linearity that people exhibit while reading natural language text, we designed local and global gaze-based measures to characterize linearity (left-to-right and top-to-bottom) in reading source code. Unlike natural language text, source code is executable and requires a specific reading approach. To validate these measures, we compared the eye movements of novice and expert programmers who were asked to read and comprehend short snippets of natural language text and Java programs. Our results show that novices read source code less linearly than natural language text. Moreover, experts read code less linearly than novices. These findings indicate that there are specific differences between reading natural language and source code, and suggest that non-linear reading skills increase with expertise. We discuss the implications for practitioners and educators.
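
    As an illustration of the kind of gaze-based measure involved, the sketch below computes a simple "local linearity" score: the fraction of saccades that move forward in reading order (rightwards on the same line, or down to a later line). This is a simplification of the paper's local and global measures, and the fixation data is invented:

        # Hedged illustration of a local linearity measure over fixation data.
        # Fixations are (line, column) pairs in the order they occurred.
        def forward_saccade_ratio(fixations):
            """Fraction of saccades that move left-to-right / top-to-bottom."""
            total = len(fixations) - 1
            if total <= 0:
                return 0.0
            forward = sum(
                1
                for (l1, c1), (l2, c2) in zip(fixations, fixations[1:])
                if l2 > l1 or (l2 == l1 and c2 >= c1)
            )
            return forward / total

        prose_like = [(1, 0), (1, 5), (1, 9), (2, 0), (2, 6)]   # mostly linear reading
        code_like = [(1, 0), (4, 2), (1, 8), (6, 0), (4, 5)]    # frequent jumps back
        print(forward_saccade_ratio(prose_like), forward_saccade_ratio(code_like))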

    Vertically integrated analysis and transformation for embedded software

    Program analyses and transformations that are more aggressive and more domain-specific than those traditionally performed by compilers are one possible route to the rapid creation of reliable and efficient embedded software. We are creating a new framework for Vertically Integrated Program Analysis (VIPA) that makes use of information gathered at multiple levels of abstraction, such as high-level models, source code, and assembly language. This paper describes our approach and shows how and why it will help create better embedded software.

    Creating Systems and Applying Large-Scale Methods to Improve Student Remediation in Online Tutoring Systems in Real-time and at Scale

    A common problem shared amongst online tutoring systems is the time-consuming nature of content creation. It has been estimated that an hour of online instruction can take 100-300 hours to create. Several systems have created tools to expedite content creation, such as the Cognitive Tutor Authoring Tools (CTAT) and the ASSISTments builder. Although these tools make content creation more efficient, they all still depend on the efforts of a content creator and/or past historical data. These tools do not take full advantage of the power of the crowd. These issues and challenges faced by online tutoring systems provide an ideal environment to implement a solution using crowdsourcing. I created the PeerASSIST system to address the challenges of tutoring content creation. PeerASSIST crowdsources the work students have done on problems inside the ASSISTments online tutoring system and redistributes that work as a form of tutoring to their peers who are in need of assistance. Multi-objective multi-armed bandit algorithms are used to distribute student work, balancing exploration of which work is good against exploitation of the best currently known work. These policies are customized to run in a real-world environment with multiple asynchronous reward functions and an infinite number of actions. Inspired by major companies such as Google, Facebook, and Bing, PeerASSIST is also designed as a platform for simultaneous online experimentation in real-time and at scale. Currently over 600 teachers (grades K-12) are requiring students to show their work. Over 300,000 instances of student work have been collected from over 18,000 students across 28,000 problems. From the student work collected, 2,000 instances have been redistributed to over 550 students who needed help over the past few months. I conducted a randomized controlled experiment to evaluate the effectiveness of PeerASSIST on student performance. Other contributions include representing learning maps as Bayesian networks to model student performance, creating a machine-learning algorithm to derive student incorrect processes from their incorrect answers and the inputs of the problem, and applying Bayesian hypothesis testing to A/B experiments. We showed that learning maps can be simplified without practical loss of accuracy and that time series data is necessary to simplify learning maps if the static data is highly correlated. I also created several interventions to evaluate the effectiveness of the buggy messages generated from the machine-learned incorrect processes. The null results of these experiments demonstrate the difficulty of creating successful tutoring content and suggest that other methods of tutoring content creation (i.e., PeerASSIST) should be explored.
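
    As a hedged illustration of the kind of bandit policy used to redistribute student work, the sketch below uses Thompson sampling with Beta priors over each candidate piece of work; the real PeerASSIST policies are multi-objective with multiple asynchronous reward functions, and the arm names and reward probabilities here are made up:

        # Hedged sketch: Thompson sampling over candidate pieces of student work.
        # Arm names and reward probabilities are invented for illustration.
        import random

        class ThompsonBandit:
            def __init__(self, arms):
                # One (successes, failures) pair per candidate piece of work.
                self.stats = {arm: [1, 1] for arm in arms}

            def choose(self):
                # Sample a plausible helpfulness per arm and pick the best sample,
                # which balances exploration and exploitation.
                samples = {arm: random.betavariate(a, b)
                           for arm, (a, b) in self.stats.items()}
                return max(samples, key=samples.get)

            def update(self, arm, helped):
                # Rewards can arrive later (e.g., when the next answer comes in).
                self.stats[arm][0 if helped else 1] += 1

        true_help_rate = {"work_A": 0.3, "work_B": 0.6, "work_C": 0.4}
        bandit = ThompsonBandit(list(true_help_rate))
        for _ in range(200):
            arm = bandit.choose()
            bandit.update(arm, helped=random.random() < true_help_rate[arm])
        print(bandit.stats)   # work_B should accumulate the most successes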

    Online Defect Prediction for Imbalanced Data

    Many defect prediction techniques have been proposed to improve software reliability. Change classification predicts defects at the change level, where a change is the collection of modifications to one file in a commit. In this thesis, we conduct the first study of applying change classification in practice and share the lessons we learned. We identify two issues in the prediction process, both of which contribute to low prediction performance. First, the data are imbalanced: there are far fewer buggy changes than clean changes. Second, the commonly used cross-validation approach is inappropriate for evaluating the performance of change classification. To address these challenges, we apply and adapt online change classification for evaluating the predictions, and we use resampling and updatable classification techniques, as well as the removal of testing-related changes, to improve classification performance. We apply the improved change classification techniques to one proprietary and six open source projects. Our results show that resampling and updatable classification techniques improve the precision of change classification by 12.2–89.5% or 6.4–34.8 percentage points (pp.) on the seven projects. Additionally, removing testing-related changes improves F1 by 62.2–3411.1% or 19.4–61.4 pp. on the six open source projects while achieving a comparable precision. Furthermore, we integrate change classification into the development process of the proprietary project. We have learned the following lessons: 1) new solutions are needed to convince developers to use and trust prediction results, and prediction results need to be actionable; 2) new and improved classification algorithms are needed to explain the prediction results, and nonsensical or unactionable explanations need to be filtered out or refined; and 3) new techniques are needed to improve the relatively low precision.
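
    The evaluation and mitigation steps described above can be sketched roughly as follows: time-ordered training and test windows instead of cross-validation, simple oversampling of the rare buggy class, and a classifier that is updated incrementally. The data, window size, and classifier choice below are illustrative assumptions, not the thesis's exact configuration:

        # Hedged sketch of online change classification on imbalanced data:
        # train on past windows, test on the next window, oversample buggy
        # changes, and update the classifier incrementally.
        import numpy as np
        from sklearn.naive_bayes import GaussianNB

        rng = np.random.default_rng(0)
        n = 1000
        X = rng.normal(size=(n, 5))              # change features, time-ordered
        y = (rng.random(n) < 0.1).astype(int)    # ~10% buggy changes (imbalanced)

        def oversample(X, y, rng):
            """Duplicate buggy changes until classes are roughly balanced."""
            buggy = np.flatnonzero(y == 1)
            if len(buggy) == 0:
                return X, y
            extra = rng.choice(buggy, size=max(0, (y == 0).sum() - len(buggy)))
            idx = np.concatenate([np.arange(len(y)), extra])
            return X[idx], y[idx]

        clf = GaussianNB()
        window = 200
        for start in range(0, n - window, window):
            Xtr, ytr = oversample(X[start:start + window],
                                  y[start:start + window], rng)
            Xte = X[start + window:start + 2 * window]
            yte = y[start + window:start + 2 * window]
            clf.partial_fit(Xtr, ytr, classes=[0, 1])   # updatable classifier
            flagged = clf.predict(Xte) == 1
            precision = (yte[flagged] == 1).mean() if flagged.any() else 0.0
            print(f"changes {start + window}-{start + 2 * window}: "
                  f"precision={precision:.2f}")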