978 research outputs found
CoaCor: Code Annotation for Code Retrieval with Reinforcement Learning
To accelerate software development, much research has been performed to help
people understand and reuse the huge amount of available code resources. Two
important tasks have been widely studied: code retrieval, which aims to
retrieve code snippets relevant to a given natural language query from a code
base, and code annotation, where the goal is to annotate a code snippet with a
natural language description. Despite their advancement in recent years, the
two tasks are mostly explored separately. In this work, we investigate a novel
perspective of Code annotation for Code retrieval (hence called `CoaCor'),
where a code annotation model is trained to generate a natural language
annotation that can represent the semantic meaning of a given code snippet and
can be leveraged by a code retrieval model to better distinguish relevant code
snippets from others. To this end, we propose an effective framework based on
reinforcement learning, which explicitly encourages the code annotation model
to generate annotations that can be used for the retrieval task. Through
extensive experiments, we show that code annotations generated by our framework
are much more detailed and more useful for code retrieval, and they can further
improve the performance of existing code retrieval models significantly.Comment: 10 pages, 2 figures. Accepted by The Web Conference (WWW) 201
A lightweight web video model with content and context descriptions for integration with linked data
The rapid increase of video data on the Web has warranted an urgent need for effective representation, management and retrieval of web videos. Recently, many studies have been carried out for ontological representation of videos, either using domain dependent or generic schemas such as MPEG-7, MPEG-4, and COMM. In spite of their extensive coverage and sound theoretical grounding, they are yet to be widely used by users. Two main possible reasons are the complexities involved and a lack of tool support. We propose a lightweight video content model for content-context description and integration. The uniqueness of the model is that it tries to model the emerging social context to describe and interpret the video. Our approach is grounded on exploiting easily extractable evolving contextual metadata and on the availability of existing data on the Web. This enables representational homogeneity and a firm basis for information integration among semantically-enabled data sources. The model uses many existing schemas to describe various ontology classes and shows the scope of interlinking with the Linked Data cloud
Neural Architecture for Question Answering Using a Knowledge Graph and Web Corpus
In Web search, entity-seeking queries often trigger a special Question
Answering (QA) system. It may use a parser to interpret the question to a
structured query, execute that on a knowledge graph (KG), and return direct
entity responses. QA systems based on precise parsing tend to be brittle: minor
syntax variations may dramatically change the response. Moreover, KG coverage
is patchy. At the other extreme, a large corpus may provide broader coverage,
but in an unstructured, unreliable form. We present AQQUCN, a QA system that
gracefully combines KG and corpus evidence. AQQUCN accepts a broad spectrum of
query syntax, between well-formed questions to short `telegraphic' keyword
sequences. In the face of inherent query ambiguities, AQQUCN aggregates signals
from KGs and large corpora to directly rank KG entities, rather than commit to
one semantic interpretation of the query. AQQUCN models the ideal
interpretation as an unobservable or latent variable. Interpretations and
candidate entity responses are scored as pairs, by combining signals from
multiple convolutional networks that operate collectively on the query, KG and
corpus. On four public query workloads, amounting to over 8,000 queries with
diverse query syntax, we see 5--16% absolute improvement in mean average
precision (MAP), compared to the entity ranking performance of recent systems.
Our system is also competitive at entity set retrieval, almost doubling F1
scores for challenging short queries.Comment: Accepted to Information Retrieval Journa
- …