Generating Synthetic Data for Neural Keyword-to-Question Models
Search typically relies on keyword queries, but these are often semantically
ambiguous. We propose to overcome this by offering users natural language
questions, based on their keyword queries, to disambiguate their intent. This
keyword-to-question task may be addressed using neural machine translation
techniques. Neural translation models, however, require massive amounts of
training data (keyword-question pairs), which is unavailable for this task. The
main idea of this paper is to generate large amounts of synthetic training data
from a small seed set of hand-labeled keyword-question pairs. Since natural
language questions are available in large quantities, we develop models to
automatically generate the corresponding keyword queries. Further, we introduce
various filtering mechanisms to ensure that synthetic training data is of high
quality. We demonstrate the feasibility of our approach using both automatic
and manual evaluation. This is an extended version of the article published
with the same title in the Proceedings of ICTIR'18.
Comment: Extended version of ICTIR'18 full paper, 11 pages
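The abstract above describes deriving keyword queries from existing natural language questions and then filtering the resulting pairs. A minimal sketch of that pipeline shape follows. It rests on loud assumptions: a fixed stopword list stands in for the learned keyword-generation models, and a simple length check stands in for the paper's filtering mechanisms. The names `STOPWORDS`, `question_to_keywords`, `keep_pair`, and `build_synthetic_pairs` are hypothetical and do not come from the paper.

```python
# Hypothetical stopword list; the paper learns the question-to-keyword
# mapping from a seed set rather than using a fixed list like this.
STOPWORDS = {
    "what", "is", "the", "a", "an", "of", "how", "do", "i", "to",
    "in", "for", "are", "does", "which", "who", "can",
}


def question_to_keywords(question: str) -> str:
    """Turn a question into a keyword query by dropping stopwords."""
    tokens = question.lower().rstrip("?").split()
    return " ".join(t for t in tokens if t not in STOPWORDS)


def keep_pair(keywords: str, question: str, min_terms: int = 2) -> bool:
    """Illustrative quality filter: keep pairs whose keyword query has at
    least `min_terms` terms and is strictly shorter than the question."""
    n_keywords = len(keywords.split())
    n_question = len(question.split())
    return min_terms <= n_keywords < n_question


def build_synthetic_pairs(questions):
    """Return (keyword query, question) pairs that pass the filter."""
    pairs = []
    for question in questions:
        keywords = question_to_keywords(question)
        if keep_pair(keywords, question):
            pairs.append((keywords, question))
    return pairs


pairs = build_synthetic_pairs(["What is the capital of France?", "Who is he?"])
```

In this toy run the second question reduces to a single keyword and is filtered out, so only the first pair would be kept as training data for a keyword-to-question model.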
Comparative Analysis of Five XML Query Languages
XML is becoming the most relevant new standard for data representation and
exchange on the WWW. Novel languages for extracting and restructuring the XML
content have been proposed, some in the tradition of database query languages
(e.g., SQL, OQL), others more closely inspired by XML. No standard for XML query
language has yet been decided, but the discussion is ongoing within the World
Wide Web Consortium and within many academic institutions and major
Internet-related companies. We present a comparison of five representative query
languages for XML, highlighting their common features and differences.
Comment: TeX v3.1415, 17 pages, 6 figures, to be published in ACM Sigmod
Record, March 200
Combining link and content-based information in a Bayesian inference model for entity search
An architectural model of a Bayesian inference network to support entity search in semantic knowledge bases is presented. The model supports the explicit combination of primitive data type and object-level semantics under a single computational framework. A flexible query model is supported that is capable of reasoning with the availability of simple semantics in queries
Learning Dynamic Classes of Events using Stacked Multilayer Perceptron Networks
People often use a web search engine to find information about events of
interest, for example, sport competitions, political elections, festivals and
entertainment news. In this paper, we study the problem of detecting
event-related queries, which is the first step before selecting a suitable
time-aware retrieval model. In general, event-related information needs can be
observed in query streams through various temporal patterns of user search
behavior, e.g., spiky peaks for popular events, and periodicities for
repetitive events. However, it is also common that users search for non-popular
events, which may not exhibit temporal variations in query streams, e.g., past
events that occurred recently, historical events triggered by anniversaries or
similar events, and future events anticipated to happen. To address the
challenge of detecting dynamic classes of events, we propose a novel deep
learning model to classify a given query into a predetermined set of multiple
event types. Our proposed model, a Stacked Multilayer Perceptron (S-MLP)
network, consists of multilayer perceptrons used as basic learning units. We
assemble stacked units to further learn complex relationships between neurons
in successive layers. To evaluate our proposed model, we conduct experiments
using real-world queries and a set of manually created ground truth.
Preliminary results show that our proposed deep learning model
significantly outperforms state-of-the-art classification models.
Comment: Neu-IR '16 SIGIR Workshop on Neural Information Retrieval, 6 pages, 4
figures
Negative Statements Considered Useful
Knowledge bases (KBs), pragmatic collections of knowledge about notable entities, are an important asset in applications such as search, question answering and dialogue. Rooted in a long tradition in knowledge representation, all popular KBs only store positive information, while they abstain from taking any stance towards statements not contained in them. In this paper, we make the case for explicitly stating interesting statements which are not true. Negative statements would be important to overcome current limitations of question answering, yet due to their potential abundance, any effort towards compiling them needs a tight coupling with ranking. We introduce two approaches towards compiling negative statements. (i) In peer-based statistical inferences, we compare entities with highly related entities in order to derive potential negative statements, which we then rank using supervised and unsupervised features. (ii) In query-log-based text extraction, we use a pattern-based approach for harvesting search engine query logs. Experimental results show that both approaches hold promising and complementary potential. Along with this paper, we publish the first datasets on interesting negative information, containing over 1.1M statements for 100K popular Wikidata entities
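The peer-based inference in (i) can be illustrated with a small sketch: statements that hold for an entity's peers but are absent for the entity itself become candidate negative statements, ranked here by the fraction of peers that hold them. This is an assumption-laden illustration, not the authors' implementation. The function name `candidate_negations`, the toy knowledge base, and the single peer-frequency score are all hypothetical, and the paper ranks with further supervised and unsupervised features.

```python
from collections import Counter


def candidate_negations(target, peers, kb):
    """Rank statements held by peers but missing for `target`.

    `kb` maps an entity name to a set of (property, value) statements.
    The score is the fraction of peers holding the statement, a
    simplified stand-in for the ranking features named in the abstract.
    """
    target_statements = kb.get(target, set())
    counts = Counter()
    for peer in peers:
        for statement in kb.get(peer, set()):
            if statement not in target_statements:
                counts[statement] += 1
    scored = [(stmt, n / len(peers)) for stmt, n in counts.items()]
    # Highest score first; ties broken by the statement itself.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Toy knowledge base for illustration only.
kb = {
    "Hawking": {("occupation", "physicist")},
    "Einstein": {
        ("occupation", "physicist"),
        ("award", "Nobel Prize in Physics"),
    },
    "Feynman": {
        ("occupation", "physicist"),
        ("award", "Nobel Prize in Physics"),
        ("educated at", "MIT"),
    },
}

ranked = candidate_negations("Hawking", ["Einstein", "Feynman"], kb)
```

A missing statement is only a candidate: the knowledge base may simply be incomplete, which is one reason the abstract couples compilation tightly with ranking.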