Autodetection and Classification of Hidden Cultural City Districts from Yelp Reviews
Topic models are a way to discover underlying themes in an otherwise
unstructured collection of documents. In this study, we apply the
Latent Dirichlet Allocation (LDA) topic model to a dataset of Yelp reviews to
classify restaurants based on their reviews. Furthermore, we hypothesize
that within a city, restaurants can be grouped into "clusters" based on
both location and topic similarity. We use several clustering methods,
including K-means clustering and a probabilistic mixture model, to
uncover and classify districts, both well-known and hidden (e.g., cultural areas
like Chinatown, or areas known by word of mouth, such as "the best street for
Italian restaurants"), within a city. We use these models to display and label the clusters on a
map. We also introduce a topic similarity heatmap that displays how similarity
to a new restaurant is distributed across a city.
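
The clustering and similarity steps described above can be illustrated with a minimal sketch. It is not the study's implementation: the restaurant names, coordinates, and topic vectors are invented, the topic vectors stand in for per-restaurant LDA output, the location and topic features are concatenated without any scaling, and cosine similarity is an assumed choice of similarity measure. K-means is written out in plain Python with a deterministic farthest-first initialization so the example is self-contained.

```python
import math

# Hypothetical data: (name, latitude, longitude, topic distribution).
# The topic vectors stand in for per-restaurant LDA output; all values are invented.
restaurants = [
    ("A", 37.7950, -122.4070, [0.90, 0.10]),
    ("B", 37.7955, -122.4065, [0.80, 0.20]),
    ("C", 37.7948, -122.4072, [0.85, 0.15]),
    ("D", 37.8000, -122.4100, [0.10, 0.90]),
    ("E", 37.8004, -122.4103, [0.20, 0.80]),
    ("F", 37.7998, -122.4098, [0.15, 0.85]),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Pick the first point, then repeatedly the point farthest from all chosen centroids."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain K-means: alternate assignment and centroid updates until labels stop changing."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def cosine_similarity(u, v):
    """Cosine similarity between two topic vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


# Cluster on location and topics together (assumption: simple unscaled concatenation).
features = [[lat, lon] + topics for _, lat, lon, topics in restaurants]
labels, centroids = kmeans(features, k=2)

# Similarity of each existing restaurant to a hypothetical new restaurant's topics;
# plotting these values at each restaurant's coordinates would give a heatmap.
new_topics = [0.9, 0.1]
similarities = {
    name: cosine_similarity(new_topics, topics)
    for name, _, _, topics in restaurants
}
```

In this toy data the topic features dominate the distance because the coordinate differences are tiny, so a real application would need to decide how to weight location against topic similarity.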