Many services that perform information retrieval for Points of Interest (POI)
utilize a Lucene-based setup with spatial filtering. While this type of system
is easy to implement, it does not make use of semantics but relies on direct
word matches between a query and reviews, leading to a loss in both precision
and recall. To study the challenging task of semantically enriching POIs from
unstructured data in order to support open-domain search and question answering
(QA), we introduce a new dataset, POIReviewQA. It consists of 20k questions
(e.g., "is this restaurant dog friendly?") for 1022 Yelp business types. For each
question, we sampled 10 reviews and annotated each sentence in the reviews with
whether it answers the question and, if so, what the corresponding answer is.
To test a system's ability to understand the text, we adopt an information retrieval
evaluation by ranking all the review sentences for a question based on the
likelihood that they answer this question. We build a Lucene-based baseline
model, which achieves 77.0% AUC and 48.8% MAP. A sentence embedding-based model
achieves 79.2% AUC and 41.8% MAP, indicating that the dataset presents a
challenging problem for future research by the geographic information
retrieval (GIR) community. The resulting technology can help exploit the
thematic content of web documents and social media for the characterisation
of locations.
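
The ranking-based evaluation described above can be illustrated with a minimal sketch. It assumes each question comes with one relevance score per review sentence and a binary label marking whether the sentence answers the question. The function names are hypothetical, and whether AUC is pooled over all sentences or averaged per question is not stated here, so the sketch simply computes it for one list of scores.

```python
from typing import List, Sequence, Tuple


def rank_labels(scores: Sequence[float], labels: Sequence[int]) -> List[int]:
    """Return the 0/1 labels reordered by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def average_precision(labels_ranked: Sequence[int]) -> float:
    """Average precision for one question, given labels in ranked order."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(labels_ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    per_question: Sequence[Tuple[Sequence[float], Sequence[int]]]
) -> float:
    """MAP: mean of per-question average precision."""
    aps = [average_precision(rank_labels(s, y)) for s, y in per_question]
    return sum(aps) / len(aps)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random answering sentence outscores a random
    non-answering one (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (made-up scores, not taken from the dataset):
# four sentences for one question, two of which answer it.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]
toy_map = mean_average_precision([(toy_scores, toy_labels)])
toy_auc = auc(toy_scores, toy_labels)
```

Because MAP rewards placing answering sentences at the very top of the ranking while AUC measures pairwise ordering over the whole list, a model can improve on one metric and regress on the other, as the two baselines above do.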