Neural Architecture for Question Answering Using a Knowledge Graph and Web Corpus
In Web search, entity-seeking queries often trigger a special Question
Answering (QA) system. It may use a parser to translate the question into a
structured query, execute that on a knowledge graph (KG), and return direct
entity responses. QA systems based on precise parsing tend to be brittle: minor
syntax variations may dramatically change the response. Moreover, KG coverage
is patchy. At the other extreme, a large corpus may provide broader coverage,
but in an unstructured, unreliable form. We present AQQUCN, a QA system that
gracefully combines KG and corpus evidence. AQQUCN accepts a broad spectrum of
query syntax, from well-formed questions to short 'telegraphic' keyword
sequences. In the face of inherent query ambiguities, AQQUCN aggregates signals
from KGs and large corpora to directly rank KG entities, rather than commit to
one semantic interpretation of the query. AQQUCN models the ideal
interpretation as an unobservable or latent variable. Interpretations and
candidate entity responses are scored as pairs, by combining signals from
multiple convolutional networks that operate collectively on the query, KG and
corpus. On four public query workloads, amounting to over 8,000 queries with
diverse query syntax, we see a 5-16% absolute improvement in mean average
precision (MAP), compared to the entity ranking performance of recent systems.
Our system is also competitive at entity set retrieval, almost doubling F1
scores for challenging short queries.
Comment: Accepted to Information Retrieval Journa
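The ranking idea and the MAP metric mentioned in this abstract can be illustrated with a minimal sketch. It assumes that each (interpretation, entity) pair already has a score and that an entity's final score is the maximum over interpretations. The abstract does not state the aggregation rule, so the max is an assumption, and the function names and toy data are hypothetical. The convolutional scoring networks themselves are not modeled here.

```python
from collections import defaultdict


def rank_entities(pair_scores):
    """Rank entities from scores keyed by (interpretation, entity).

    ASSUMPTION: the latent interpretation is aggregated out with a max;
    the abstract does not specify the rule AQQUCN uses.
    """
    best = defaultdict(lambda: float("-inf"))
    for (_interpretation, entity), score in pair_scores.items():
        best[entity] = max(best[entity], score)
    # Highest score first; ties broken by entity name for determinism.
    return sorted(best, key=lambda e: (-best[e], e))


def average_precision(ranked, relevant):
    """Average precision for one query: mean precision at each relevant hit."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """MAP over a list of (ranked, relevant) pairs, one pair per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical toy scores for a single query with two interpretations.
toy_scores = {
    ("i1", "Paris"): 0.9,
    ("i2", "Paris"): 0.2,
    ("i1", "Lyon"): 0.4,
    ("i2", "Berlin"): 0.7,
}
toy_ranking = rank_entities(toy_scores)
```

Because entities are ranked directly, no single interpretation has to be chosen up front; a wrong parse only hurts if every interpretation that supports the correct entity scores poorly.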
Repairing Deep Neural Networks: Fix Patterns and Challenges
Significant interest in applying Deep Neural Networks (DNNs) has fueled the
need to support engineering of software that uses DNNs. Repairing software that
uses DNNs is one such unmistakable software engineering (SE) need where automated tools could be
beneficial; however, we do not fully understand the challenges of repairing DNNs or the
patterns that developers use when repairing them manually. What challenges should
automated repair tools address? What are the repair patterns whose automation
could help developers? Which repair patterns should be assigned a higher
priority for building automated bug repair tools? This work presents a
comprehensive study of bug fix patterns to address these questions. We have
studied 415 repairs from Stack Overflow and 555 repairs from GitHub for five
popular deep learning libraries (Caffe, Keras, TensorFlow, Theano, and Torch) to
understand challenges in repairs and bug repair patterns. Our key findings
reveal that DNN bug fix patterns are distinctive compared to traditional bug
fix patterns; the most common bug fix patterns are fixing data dimension and
neural network connectivity; DNN bug fixes have the potential to introduce
adversarial vulnerabilities; DNN bug fixes frequently introduce new bugs; and
DNN bug localization, reuse of trained models, and coping with frequent releases
are major challenges faced by developers when fixing bugs. We also contribute a
benchmark of 667 DNN (bug, repair) instances.
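To make the "fixing data dimension" pattern concrete, here is a minimal, library-free sketch of one such bug and its repair. The example is hypothetical and not drawn from the study's benchmark: a dense layer expects flat feature vectors, the buggy call passes 2-D samples, and the fix flattens each sample first. All function names and the toy data are illustrative assumptions.

```python
def flatten_samples(batch):
    """Flatten each 2-D sample (a list of rows) into a 1-D feature list."""
    return [[value for row in sample for value in row] for sample in batch]


def dense_forward(batch, weights, bias):
    """Affine layer: batch (n, d) times weights (d, k) plus bias (k,)."""
    in_features = len(weights)
    outputs = []
    for sample in batch:
        # Reject samples whose shape does not match the layer's input size.
        if len(sample) != in_features or any(isinstance(v, list) for v in sample):
            raise ValueError(
                f"expected {in_features} scalar features per sample"
            )
        outputs.append([
            sum(x * w_row[j] for x, w_row in zip(sample, weights)) + bias[j]
            for j in range(len(bias))
        ])
    return outputs


# Hypothetical toy data: two 2x2 "images"; a layer with 4 inputs, 1 output.
images = [[[1, 2], [3, 4]], [[0, 1], [0, 1]]]
weights = [[1], [1], [1], [1]]
bias = [0]

# Buggy call: dense_forward(images, weights, bias) raises ValueError,
# because each sample is still 2x2 rather than a flat vector of 4 features.
# Repaired call: flatten the samples before the dense layer.
logits = dense_forward(flatten_samples(images), weights, bias)
```

In a real framework the equivalent repair is usually a reshape or flatten step inserted before the layer whose input shape does not match.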
Learning to Mine Aligned Code and Natural Language Pairs from Stack Overflow
For tasks like code synthesis from natural language, code retrieval, and code
summarization, data-driven models have shown great promise. However, creating
these models requires parallel data between natural language (NL) and code with
fine-grained alignments. Stack Overflow (SO) is a promising source for creating
such a data set: the questions are diverse and most of them have corresponding
answers with high-quality code snippets. However, existing heuristic methods
(e.g., pairing the title of a post with the code in the accepted answer) are
limited both in their coverage and the correctness of the NL-code pairs
obtained. In this paper, we propose a novel method to mine high-quality aligned
data from SO using two sets of features: hand-crafted features considering the
structure of the extracted snippets, and correspondence features obtained by
training a probabilistic model to capture the correlation between NL and code
using neural networks. These features are fed into a classifier that determines
the quality of mined NL-code pairs. Experiments using Python and Java as test
beds show that the proposed method greatly expands coverage and accuracy over
existing mining methods, even when using only a small number of labeled
examples. Further, we find that reasonable results are achieved even when
training the classifier on one language and testing on another, showing promise
for scaling NL-code mining to a wide variety of programming languages beyond
those for which we are able to annotate data.
Comment: MSR '1
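The heuristic baseline that this abstract criticizes (pairing a post's title with the code in its accepted answer) can be sketched as follows. This illustrates only the baseline, not the proposed classifier. The post dictionary layout, the field names, and the function name are assumptions made for this example rather than an actual Stack Overflow data format.

```python
import html
import re

# Matches <pre><code>...</code></pre> blocks in an answer's HTML body.
CODE_BLOCK = re.compile(
    r"<pre[^>]*><code[^>]*>(.*?)</code></pre>", re.DOTALL
)


def title_code_pairs(post):
    """Pair the post title with every code block in the accepted answer.

    `post` is a hypothetical dict: {"title": str, "answers": [
    {"accepted": bool, "body": html_str}, ...]}.
    """
    accepted = next(
        (a for a in post["answers"] if a.get("accepted")), None
    )
    if accepted is None:
        return []  # No accepted answer: the heuristic yields nothing.
    snippets = [
        html.unescape(block).strip()
        for block in CODE_BLOCK.findall(accepted["body"])
    ]
    return [(post["title"], snippet) for snippet in snippets if snippet]


# Hypothetical example post.
example_post = {
    "title": "How to reverse a list in Python?",
    "answers": [
        {"accepted": False, "body": "<p>Just loop.</p>"},
        {
            "accepted": True,
            "body": (
                "<p>Use slicing:</p><pre><code>xs[::-1]</code></pre>"
                "<p>or</p><pre><code>list(reversed(xs))\n</code></pre>"
            ),
        },
    ],
}
pairs = title_code_pairs(example_post)
```

The sketch shows both limits named in the abstract: posts without an accepted answer contribute nothing (coverage), and every code block is paired with the full title whether or not it actually implements the title's intent (correctness).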