Interactive, Intelligent Tutoring for Auxiliary Constructions in Geometry Proofs
Geometry theorem proving forms a major and challenging component in the K-12
mathematics curriculum. A particularly difficult task is to add auxiliary
constructions (i.e., additional lines or points) to aid proof discovery.
Although many intelligent tutoring systems have been proposed for geometry
proofs, few teach students how to find auxiliary constructions, and those few
are limited by the underlying reasoning processes they use to support
auxiliary constructions. This paper tackles these weaknesses of
prior systems by introducing an interactive geometry tutor, the Advanced
Geometry Proof Tutor (AGPT). It leverages a recent automated geometry prover to
provide combined benefits that any geometry theorem prover or intelligent
tutoring system alone cannot accomplish. In particular, AGPT can not only
process images of geometry problems automatically, but also
interactively train and guide students toward discovering auxiliary
constructions on their own. We have evaluated AGPT via a pilot study with 78
high school students. The study results show that, in training students to
find auxiliary constructions, there is no significant perceived difference
between AGPT and human tutors, and AGPT is significantly more effective than
the state-of-the-art geometry solver that produces human-readable proofs.
Comment: 10 pages.
Achieving High Coverage for Floating-point Code via Unconstrained Programming (Extended Version)
Achieving high code coverage is essential in testing, which gives us
confidence in code quality. Testing floating-point code usually requires
painstaking efforts in handling floating-point constraints, e.g., in symbolic
execution. This paper turns the challenge of testing floating-point code into
the opportunity of applying unconstrained programming --- the mathematical
solution for calculating function minimum points over the entire search space.
Our core insight is to derive a representing function from the floating-point
program, any of whose minimum points is a test input guaranteed to exercise a
new branch of the tested program. This guarantee allows us to achieve high
coverage of the floating-point program by repeatedly minimizing the
representing function.
We have realized this approach in a tool called CoverMe and conducted an
extensive evaluation of it on Sun's C math library. Our evaluation results show
that CoverMe achieves, on average, 90.8% branch coverage in 6.9 seconds,
drastically outperforming the tools we compare against: (1) random testing;
(2) AFL, a highly optimized, robust fuzzer released by Google; and (3) Austin,
a state-of-the-art coverage-based testing tool designed to support
floating-point code.
Comment: Extended version of Fu and Su's PLDI'17 paper.
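The core insight can be sketched in a few lines of Python. Everything below is a toy stand-in: the example program, the hand-written penalty, and the random-search minimizer are illustrative only (CoverMe derives the representing function automatically from the program and drives it with a real unconstrained-optimization backend):

```python
import random

# Hypothetical floating-point program under test: one predicate, two branches.
def program(x):
    if x * x - 2.0 <= 0.0:
        return "then"
    return "else"

# Representing function: zero exactly on inputs that drive execution into the
# target branch, positive (a distance to the branch condition) otherwise.
def representing_function(x, branch):
    d = x * x - 2.0
    return max(d, 0.0) if branch == "then" else max(-d, 0.0)

# Naive random-search minimization, standing in for a real
# unconstrained-optimization backend.
def minimize(f, trials=20000, lo=-10.0, hi=10.0):
    best_x, best_v = lo, f(lo)
    for _ in range(trials):
        x = random.uniform(lo, hi)
        v = f(x)
        if v < best_v:
            best_x, best_v = x, v
    return best_x, best_v

# Repeated minimization yields test inputs covering each branch.
x_then, pen_then = minimize(lambda x: representing_function(x, "then"))
x_else, pen_else = minimize(lambda x: representing_function(x, "else"))
```

Because the representing function is zero exactly on inputs that take the target branch, any minimizer that reaches zero has, by construction, produced a covering test input.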
Toward Rapid Transformation of Ideas into Software
A key mission of computer science is to enable people to realize their creative
ideas as naturally and painlessly as possible. Software engineering is at the
center of this mission -- software technologies enable reification of ideas
into working systems. As computers become ubiquitous, both in availability and
the aspects of human lives they touch, the quantity and diversity of ideas also
rapidly grow. Our programming systems and technologies need to evolve to make
this reification process -- transforming ideas to software -- as quick and
accessible as possible.
The goal of this paper is twofold. First, it advocates and highlights the
"transforming ideas to software" mission as a moonshot for software engineering
research. This is a long-term direction for the community, and there is no
silver bullet that can get us there. To make this mission a reality, as a
community, we need to improve the status quo across many dimensions. Thus, the
second goal is to outline a number of directions to modernize our contemporary
programming technologies for decades to come, describe work that has been
undertaken along those vectors, and pinpoint critical challenges.
Data-Driven Feedback Generation for Introductory Programming Exercises
This paper introduces the "Search, Align, and Repair" data-driven program
repair framework to automate feedback generation for introductory programming
exercises. Distinct from existing techniques, our goal is to develop an
efficient, fully automated, and problem-agnostic technique for large or
MOOC-scale introductory programming courses. We leverage the large amount of
available student submissions in such settings and develop new algorithms for
identifying similar programs, aligning correct and incorrect programs, and
repairing incorrect programs by finding minimal fixes. We have implemented our
technique in the SARFGEN system and evaluated it on thousands of real student
attempts from the Microsoft-DEV204.1X edX course and the Microsoft CodeHunt
platform. Our results show that SARFGEN can, within two seconds on average,
generate concise, useful feedback for 89.7% of the incorrect student
submissions. It has been integrated with the Microsoft-DEV204.1X edX class and
deployed for production use.
Comment: 12 pages.
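The Search-Align-Repair pipeline can be illustrated with a toy sketch. The pool of "correct submissions", the incorrect attempt, and the line-level alignment via difflib are hypothetical simplifications (SARFGEN aligns programs at the AST level, not by raw lines):

```python
import difflib

# Hypothetical pool of correct student submissions, each as a list of lines.
correct_submissions = [
    ["def absval(x):", "    if x < 0:", "        return -x", "    return x"],
    ["def absval(x):", "    return abs(x)"],
]

# An incorrect attempt: the comparison is flipped.
incorrect = ["def absval(x):", "    if x > 0:", "        return -x", "    return x"]

# Search: pick the syntactically closest correct program.
def search(broken, corpus):
    return max(corpus, key=lambda prog: difflib.SequenceMatcher(
        None, "\n".join(broken), "\n".join(prog)).ratio())

# Align + repair: align line-by-line; each non-matching region becomes a
# candidate minimal fix (replace these broken lines with these reference lines).
def align_and_repair(broken, reference):
    matcher = difflib.SequenceMatcher(None, broken, reference)
    return [(broken[i1:i2], reference[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]

reference = search(incorrect, correct_submissions)
fixes = align_and_repair(incorrect, reference)
```

The search step makes the repair problem tractable: because the chosen reference is already close to the broken program, the alignment yields a small, readable set of fixes rather than a wholesale rewrite.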
Dynamic Neural Program Embedding for Program Repair
Neural program embeddings have shown much promise recently for a variety of
program analysis tasks, including program synthesis, program repair, fault
localization, etc. However, most existing program embeddings are based on
syntactic features of programs, such as raw token sequences or abstract syntax
trees. Unlike images and text, a program has an unambiguous semantic meaning
that can be difficult to capture by considering only its syntax (i.e.,
syntactically similar programs can exhibit vastly different run-time
behavior), which makes syntax-based program embeddings fundamentally limited.
This paper proposes a novel semantic program embedding that is learned from
program execution traces. Our key insight is that program states expressed as
sequential tuples of live variable values not only capture program semantics
more precisely, but also offer a more natural fit for Recurrent Neural Networks
to model. We evaluate different syntactic and semantic program embeddings on
predicting the types of errors that students make in their submissions to an
introductory programming class and two exercises on the CodeHunt education
platform. Evaluation results show that our new semantic program embedding
significantly outperforms the syntactic program embeddings based on token
sequences and abstract syntax trees. In addition, we augment a search-based
program repair system with the predictions obtained from our semantic
embedding, and show that search efficiency is also significantly improved.
Comment: 9 pages.
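Extracting the kind of semantic feature described above, program states as sequential tuples of live-variable values, can be sketched with Python's tracing hook. The RNN that consumes such traces is omitted, and `buggy_max` is an illustrative program, not one from the paper:

```python
import sys

def trace_states(func, *args):
    # Record a snapshot of the local variables before each executed line of
    # `func`; the resulting sequence of states is the semantic trace.
    states = []

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is func.__code__:
            states.append(dict(frame.f_locals))
        return tracer

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return states

# Illustrative buggy program: the comparison is flipped, so it returns the
# minimum instead of the maximum.
def buggy_max(a, b):
    m = a
    if b < m:
        m = b
    return m

states = trace_states(buggy_max, 1, 2)
```

The final state exposes the semantic bug directly (`m` ends at 1 rather than 2), even though the program is syntactically almost identical to a correct `max`.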
Abstracting Runtime Heaps for Program Understanding
Modern programming environments provide extensive support for inspecting,
analyzing, and testing programs based on the algorithmic structure of a
program. Unfortunately, support for inspecting and understanding runtime data
structures during execution is typically much more limited. This paper provides
a general purpose technique for abstracting and summarizing entire runtime
heaps. We describe the abstract heap model and the associated algorithms for
transforming a concrete heap dump into the corresponding abstract model as well
as algorithms for merging, comparing, and computing changes between abstract
models. The abstract model is designed to emphasize high-level concepts about
heap-based data structures, such as shape and size, as well as relationships
between heap structures, such as sharing and connectivity. We demonstrate the
utility and computational tractability of the abstract heap model by building a
memory profiler. We then use this tool to check for, pinpoint, and correct
sources of memory bloat in a suite of programs from the DaCapo benchmark.
Metamorphic Testing for Object Detection Systems
Recent advances in deep neural networks (DNNs) have led to object detectors
that can rapidly process pictures or videos, and recognize the objects that
they contain. Despite the promising progress by industrial manufacturers such
as Amazon and Google in commercializing deep learning-based object detection as
a standard computer vision service, object detection systems - similar to
traditional software - may still produce incorrect results. These errors, in
turn, can lead to severe negative outcomes for the users of these object
detection systems. For instance, an autonomous driving system that fails to
detect pedestrians can cause accidents or even fatalities. However, principled,
systematic methods for testing object detection systems do not yet exist,
despite their importance.
To fill this critical gap, we introduce the design and realization of MetaOD,
the first metamorphic testing system for object detectors to effectively reveal
erroneous detection results by commercial object detectors. To this end, we (1)
synthesize natural-looking images by inserting extra object instances into
background images, and (2) design metamorphic conditions asserting the
equivalence of object detection results between the original and synthetic
images after excluding the prediction results on the inserted objects. MetaOD
is designed as a streamlined workflow that performs object extraction,
selection, and insertion. Evaluated on four commercial object detection
services and four pretrained models provided by the TensorFlow API, MetaOD
found tens of thousands of detection defects in these object detectors. To
further demonstrate the practical usage of MetaOD, we use the synthetic images
that cause erroneous detection results to retrain the model. Our results show
that the model's performance improves significantly, from an mAP score of 9.3
to an mAP score of 10.5.
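The metamorphic relation itself can be sketched independently of any real detector. The canned detection results and image keys below are stand-ins for a harness that pastes extracted object instances into real pixels and queries a vision service:

```python
def overlaps(a, b):
    # Axis-aligned bounding-box intersection test; boxes are (x1, y1, x2, y2).
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    return ax1 < bx2 and bx1 < ax2 and ay1 < by2 and by1 < ay2

def violates_relation(detect, original, synthetic, inserted_box):
    # Metamorphic relation: after inserting one extra object instance, the
    # detections on the original objects must be unchanged; results that
    # overlap the inserted region are excluded before comparing.
    before = set(detect(original))
    after = {d for d in detect(synthetic) if not overlaps(d[1], inserted_box)}
    return before != after

# Canned results standing in for a real object-detection service.
results = {
    "bg":         [("person", (0, 0, 10, 10))],
    "bg+dog":     [("person", (0, 0, 10, 10)), ("dog", (50, 50, 60, 60))],
    "bg+dog_bad": [("dog", (50, 50, 60, 60))],  # detector lost the person
}

def detect(image):
    return results[image]

consistent = violates_relation(detect, "bg", "bg+dog", (50, 50, 60, 60))
defective = violates_relation(detect, "bg", "bg+dog_bad", (50, 50, 60, 60))
```

The relation needs no ground-truth labels: it only asserts consistency between the original and synthetic images, which is what lets the approach test black-box commercial detectors at scale.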
ShapeFlow: Dynamic Shape Interpreter for TensorFlow
We present ShapeFlow, a dynamic abstract interpreter for TensorFlow which
quickly catches tensor shape incompatibility errors, one of the most common
bugs in deep learning code. ShapeFlow shares the same APIs as TensorFlow but
only captures and emits tensor shapes, its abstract domain. ShapeFlow
constructs a custom shape computational graph, similar to the computational
graph used by TensorFlow. ShapeFlow requires no code annotation or code
modification by the programmer, and therefore is convenient to use. We evaluate
ShapeFlow on 52 programs collected by prior empirical studies to show how fast
and accurately it can catch shape incompatibility errors compared to
TensorFlow. We use two baselines: a worst-case training dataset size and a more
realistic dataset size. ShapeFlow detects shape incompatibility errors highly
accurately -- with no false positives and a single false negative -- and highly
efficiently -- with an average speed-up of 499X and 24X for the first and
second baseline, respectively. We believe ShapeFlow is a practical tool that
benefits machine learning developers. We will open-source ShapeFlow on GitHub
to make it publicly available to both the developer and research communities.
Comment: 14 pages, 9 figures. Work done about one and a half years before the
submission to arXiv.
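The idea of a shape-only abstract domain can be sketched in a few lines. The `ShapeTensor` class and the two ops below are hypothetical stand-ins (ShapeFlow itself shadows the real TensorFlow API surface, so user code runs unmodified):

```python
class ShapeTensor:
    # Abstract value: carries only a shape, never concrete tensor data.
    def __init__(self, shape):
        self.shape = tuple(shape)

def matmul(a, b):
    # Abstract matmul: inner dimensions must agree; only shapes flow through.
    if a.shape[1] != b.shape[0]:
        raise ValueError(f"shape mismatch in matmul: {a.shape} x {b.shape}")
    return ShapeTensor((a.shape[0], b.shape[1]))

def add(a, b):
    # Abstract elementwise add (no broadcasting in this sketch).
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch in add: {a.shape} + {b.shape}")
    return ShapeTensor(a.shape)

x = ShapeTensor((32, 784))   # a batch of flattened 28x28 images
w = ShapeTensor((784, 10))
b = ShapeTensor((32, 10))
logits = add(matmul(x, w), b)   # (32, 10) flows through the abstract graph
```

Because no tensor data is ever materialized, the shape graph evaluates in milliseconds, which is why the abstract interpreter can catch the same mismatch errors as a real training run at a fraction of the cost.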
Learning Blended, Precise Semantic Program Embeddings
Learning neural program embeddings is key to utilizing deep neural networks
in programming languages research --- precise and efficient program representations
enable the application of deep models to a wide range of program analysis
tasks. Existing approaches predominantly learn to embed programs from their
source code, and, as a result, they do not capture deep, precise program
semantics. On the other hand, models learned from runtime information
critically depend on the quality of program executions, thus leading to trained
models with highly variant quality. This paper tackles these inherent
weaknesses of prior approaches by introducing a new deep neural network,
LIGER, which learns program representations from a mixture of symbolic and
concrete execution traces. We have evaluated LIGER on COSET, a recently
proposed benchmark suite for evaluating neural program embeddings. Results show
that LIGER (1) is significantly more accurate than the state-of-the-art
syntax-based models Gated Graph Neural Network and code2vec in classifying
program semantics, and (2) requires on average 10x fewer executions covering
74% fewer paths than the state-of-the-art dynamic model DYPRO. Furthermore,
we extend LIGER to predict the name of a method from its body's vector
representation. Learning on the same set of functions (more than 170K in
total), LIGER significantly outperforms code2seq, the previous
state-of-the-art for method name prediction.
Testing Database Engines via Pivoted Query Synthesis
Relational databases are used ubiquitously. They are managed by database
management systems (DBMS), which allow inserting, modifying, and querying data
using a domain-specific language called Structured Query Language (SQL).
Popular DBMS have been extensively tested by fuzzers, which have been
successful in finding crash bugs. However, approaches to finding logic bugs,
such as when a DBMS computes an incorrect result set, have remained mostly
untackled. Differential testing is an effective technique to test systems that
support a common language by comparing the outputs of these systems. However,
this technique is ineffective for DBMS, because each DBMS typically supports
its own SQL dialect. To address this, we devised a novel and general approach
that we have termed Pivoted Query Synthesis. The core idea of this approach is
to automatically generate queries that are guaranteed to fetch a specific,
randomly selected row, called the pivot row. If the DBMS fails to fetch the
pivot row, the likely cause is a bug in the DBMS. We tested our approach on
three widely-used and mature DBMS, namely SQLite, MySQL, and PostgreSQL. In
total, we reported 123 bugs in these DBMS, 99 of which have been fixed or
verified, demonstrating that the approach is highly effective and general. We
expect that the wide applicability and simplicity of our approach will help
improve the robustness of many DBMS.
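The oracle can be sketched against an in-memory SQLite database. Predicate generation is simplified here to equality on every column of the pivot row, whereas the real approach synthesizes random expressions and rectifies them to evaluate to TRUE on the pivot row:

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, "x"), (2, "y"), (3, "z")])

# Step 1: pick a random pivot row.
rows = conn.execute("SELECT rowid, a, b FROM t").fetchall()
pivot = random.choice(rows)

# Step 2: synthesize a query that, by construction, must fetch the pivot row
# (here: equality on every column of the pivot).
query = "SELECT rowid FROM t WHERE a = ? AND b = ?"
fetched = {r[0] for r in conn.execute(query, (pivot[1], pivot[2]))}

# Step 3: the oracle. If the pivot row is missing from the result set, the
# DBMS computed an incorrect result, i.e., a logic bug.
assert pivot[0] in fetched
```

The oracle needs no second DBMS to compare against, which is what sidesteps the dialect problem that makes differential testing ineffective for SQL.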