Context2Name: A Deep Learning-Based Approach to Infer Natural Variable Names from Usage Contexts
Most of the JavaScript code deployed in the wild has been minified, a process
in which identifier names are replaced with short, arbitrary and meaningless
names. Minified code occupies less space, but also makes the code extremely
difficult to manually inspect and understand. This paper presents Context2Name,
a deep learning-based technique that partially reverses the effect of
minification by predicting natural identifier names for minified names. The
core idea is to predict from the usage context of a variable a name that
captures the meaning of the variable. The approach combines a lightweight,
token-based static analysis with an auto-encoder neural network that summarizes
usage contexts and a recurrent neural network that predicts natural names for a
given usage context. We evaluate Context2Name with a large corpus of real-world
JavaScript code and show that it successfully predicts 47.5% of all minified
identifiers while taking only 2.9 milliseconds on average to predict a name. A
comparison with the state-of-the-art tools JSNice and JSNaughty shows that our
approach performs comparably in terms of accuracy while improving in terms of
efficiency. Moreover, Context2Name complements the state-of-the-art by
predicting 5.3% additional identifiers that are missed by both existing tools.
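The first stage the abstract describes, a lightweight token-based static analysis that collects the usage context of each variable, can be illustrated with a minimal sketch. The tokenizer, window size, and example snippet below are illustrative assumptions, not the paper's actual pipeline; in the real system such contexts would be summarized by the auto-encoder before name prediction.

```python
import re

def usage_contexts(tokens, var, window=2):
    """Collect the surrounding tokens (the usage context) for each
    occurrence of a minified identifier. This only sketches the
    extraction step, not the neural summarization or prediction."""
    contexts = []
    for i, tok in enumerate(tokens):
        if tok == var:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            contexts.append(left + right)
    return contexts

# Toy minified snippet: function a(b){ return b.length > 0 }
tokens = re.findall(r"\w+|\S", "function a ( b ) { return b . length > 0 }")
print(usage_contexts(tokens, "b"))
```

Each occurrence of the minified name `b` yields one context window; a name-prediction model would map these contexts to a natural name such as `array` or `items`.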
SpreadCluster: Recovering Versioned Spreadsheets through Similarity-Based Clustering
Version information plays an important role in spreadsheet understanding,
maintenance, and quality improvement. However, end users rarely use version
control tools to document spreadsheet version information. Thus, the
spreadsheet version information is missing, and different versions of a
spreadsheet coexist as individual and similar spreadsheets. Existing approaches
try to recover spreadsheet version information through clustering these similar
spreadsheets based on spreadsheet filenames or related email conversations.
However, the applicability and accuracy of existing clustering approaches are
limited because the necessary information (e.g., filenames and email
conversations) is usually missing. We inspected the versioned spreadsheets in
VEnron, which is extracted from the Enron Corporation. In VEnron, the different
versions of a spreadsheet are clustered into an evolution group. We observed
that the versioned spreadsheets in each evolution group exhibit certain common
features (e.g., similar table headers and worksheet names). Based on this
observation, we proposed an automatic clustering algorithm, SpreadCluster.
SpreadCluster learns the criteria of features from the versioned spreadsheets
in VEnron, and then automatically clusters spreadsheets with similar
features into the same evolution group. We applied SpreadCluster on all
spreadsheets in the Enron corpus. The evaluation result shows that
SpreadCluster could cluster spreadsheets with higher precision and recall
than the filename-based approach used by VEnron. Based on the clustering result
by SpreadCluster, we further created a new versioned spreadsheet corpus
VEnron2, which is much bigger than VEnron. We also applied SpreadCluster on the
other two spreadsheet corpora FUSE and EUSES. The results show that
SpreadCluster can cluster the versioned spreadsheets in these two corpora with
high precision.

Comment: 12 pages, MSR 201
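The core idea of clustering spreadsheets that share common features such as similar table headers can be sketched with a toy similarity-based grouping. The Jaccard measure, the threshold, and the filenames below are illustrative assumptions; SpreadCluster learns its feature criteria from VEnron rather than using a fixed hand-picked threshold.

```python
def jaccard(a, b):
    """Jaccard similarity between two feature sets (e.g., table headers)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_by_headers(sheets, threshold=0.6):
    """Greedily group spreadsheets whose header sets are similar enough,
    mimicking the idea of recovering evolution groups from shared
    features. Each group's first member serves as its representative."""
    groups = []
    for name, headers in sheets.items():
        for group in groups:
            if jaccard(headers, sheets[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups

sheets = {
    "budget_v1.xls": ["Item", "Q1", "Q2", "Total"],
    "budget_v2.xls": ["Item", "Q1", "Q2", "Q3", "Total"],
    "staff.xls":     ["Name", "Role", "Office"],
}
print(cluster_by_headers(sheets))
# → [['budget_v1.xls', 'budget_v2.xls'], ['staff.xls']]
```

Here the two budget versions share four of five headers (Jaccard 0.8) and fall into one evolution group, while the unrelated sheet forms its own group.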
Compositional Semantic Parsing on Semi-Structured Tables
Two important aspects of semantic parsing for question answering are the
breadth of the knowledge source and the depth of logical compositionality.
While existing work trades off one aspect for another, this paper
simultaneously makes progress on both fronts through a new task: answering
complex questions on semi-structured tables using question-answer pairs as
supervision. The central challenge arises from two compounding factors: the
broader domain results in an open-ended set of relations, and the deeper
compositionality results in a combinatorial explosion in the space of logical
forms. We propose a logical-form driven parsing algorithm guided by strong
typing constraints and show that it obtains significant improvements over
natural baselines. For evaluation, we created a new dataset of 22,033 complex
questions on Wikipedia tables, which is made publicly available.
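The notion of a compositional logical form executed against a semi-structured table can be illustrated with a toy interpreter. The table, the tuple-based form syntax, and the three operators below are illustrative assumptions, not the paper's formalism; strong typing constraints would, for example, rule out candidate forms that apply `argmax` to a non-numeric column before execution.

```python
# Toy table in the spirit of answering complex questions on Wikipedia tables.
table = [
    {"City": "Paris",  "Country": "France",  "Population": 2141000},
    {"City": "Berlin", "Country": "Germany", "Population": 3645000},
    {"City": "Lyon",   "Country": "France",  "Population": 513000},
]

def execute(form, rows):
    """Execute a tiny compositional logical form over table rows.
    'rows' denotes the whole table; operators nest, so deeper
    compositionality means a larger space of candidate forms."""
    if form == "rows":
        return rows
    op = form[0]
    if op == "filter":        # keep rows where column == value
        _, col, val, sub = form
        return [r for r in execute(sub, rows) if r[col] == val]
    if op == "argmax":        # row maximizing a numeric column
        _, col, sub = form
        return [max(execute(sub, rows), key=lambda r: r[col])]
    if op == "project":       # extract one column from the rows
        _, col, sub = form
        return [r[col] for r in execute(sub, rows)]
    raise ValueError(f"unknown operator: {op}")

# "Which French city has the largest population?"
form = ("project", "City",
        ("argmax", "Population",
         ("filter", "Country", "France", "rows")))
print(execute(form, table))  # → ['Paris']
```

A parser guided by typing constraints would enumerate and score many such nested forms, keeping only those whose operator/column types are compatible.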