POIReviewQA: A Semantically Enriched POI Retrieval and Question
  Answering Dataset

He, Cheng; Janowicz, Krzysztof; Lao, Ni; Liu, Sumang; Mai, Gengchen

Authors: Cheng He, Krzysztof Janowicz, Ni Lao, Sumang Liu, Gengchen Mai
Publication date: 5 October 2018
Publisher: Association for Computing Machinery (ACM)

Abstract

Many services that perform information retrieval for Points of Interest (POI) use a Lucene-based setup with spatial filtering. While this type of system is easy to implement, it does not make use of semantics but relies on direct word matches between a query and reviews, leading to a loss in both precision and recall. To study the challenging task of semantically enriching POIs from unstructured data in order to support open-domain search and question answering (QA), we introduce a new dataset, POIReviewQA. It consists of 20k questions (e.g., "is this restaurant dog friendly?") for 1022 Yelp business types. For each question we sampled 10 reviews and annotated each sentence in the reviews with whether it answers the question and what the corresponding answer is. To test a system's ability to understand the text, we adopt an information retrieval evaluation, ranking all the review sentences for a question by the likelihood that they answer it. We build a Lucene-based baseline model, which achieves 77.0% AUC and 48.8% MAP. A sentence embedding-based model achieves 79.2% AUC and 41.8% MAP, indicating that the dataset presents a challenging problem for future research by the GIR community. The resulting technology can help exploit the thematic content of web documents and social media for the characterisation of locations.
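The ranking evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names are hypothetical, the scores and labels are made-up toy values, and computing AUC per question (rather than pooled over all questions) is an assumption, since the abstract does not say which variant was used. Each question contributes a list of model scores (one per review sentence) and binary labels (1 if the sentence was annotated as answering the question).

```python
from typing import List, Sequence, Tuple


def rank_labels(scores: Sequence[float], labels: Sequence[int]) -> List[int]:
    """Return the labels reordered by descending score (the ranked list)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def average_precision(ranked_labels: Sequence[int]) -> float:
    """Average precision for one question, given labels in ranked order."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    questions: Sequence[Tuple[Sequence[float], Sequence[int]]]
) -> float:
    """MAP: the mean of per-question average precision."""
    aps = [average_precision(rank_labels(s, y)) for s, y in questions]
    return sum(aps) / len(aps)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random answering sentence outscores a random
    non-answering one; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only): two questions with sentence scores and labels.
toy_questions = [
    ([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]),
    ([0.2, 0.7], [1, 0]),
]
toy_map = mean_average_precision(toy_questions)
toy_auc = auc(*toy_questions[0])
```

The two metrics reward different things: AUC counts correctly ordered positive/negative pairs anywhere in the list, while MAP weights the top of the ranking more heavily. That difference is one way a model can score higher on AUC yet lower on MAP, as the two reported systems do.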
