62 research outputs found

    New York City Dataset

    No full text
    Dataset of Yelp tags and keywords for New York City.

    Validation Dataset

    No full text
    This dataset was used to validate the model created with training data from the pilot study, which used only Chinese restaurants in San Francisco.

    San Francisco Dataset without cuisine exclusion

    No full text
    Dataset of tags and keywords for restaurants in San Francisco on Yelp.com.

    Training data used to generate predictive model

    No full text
    This dataset was used to create the predictive model applied to all other datasets.

    Supplementing Public Health Inspection via Social Media

    No full text
    Foodborne illness is prevented through inspection and surveillance conducted by health departments across America. Appropriate restaurant behavior is enforced and monitored via public health inspections. However, the surveillance coverage provided by state and local health departments is insufficient to prevent the rising number of foodborne illness outbreaks. To address this need for improved surveillance coverage, we conducted a supplementary form of public health surveillance using social media data: Yelp.com restaurant reviews in the city of San Francisco. Yelp is a social media site where users post reviews and rate restaurants they have personally visited. The presence of keywords related to health code regulations and foodborne illness symptoms, the number of restaurant reviews, the number of Yelp stars, and restaurant price range were included in a model predicting a restaurant's likelihood of health code violation, as measured by its assigned San Francisco public health code rating. For a list of major health code violations, see S1 Table (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0152117#pone.0152117.s002). We built the predictive model using 71,360 Yelp reviews of restaurants in the San Francisco Bay Area. The model predicted health code violations in 78% of the restaurants receiving serious citations in our pilot study of 440 restaurants; the training and validation data sets each drew on 220 restaurants in San Francisco. Keyword analysis of free text within Yelp not only improved detection of high-risk restaurants but also identified specific risk factors related to health code violation.

    To further validate our model, we applied the model generated in our pilot study to Yelp data from 1,542 restaurants in San Francisco. The model achieved 91% sensitivity, 74% specificity, area under the receiver operator curve of 98%, and positive predictive value of 29% (given a substandard health code rating prevalence of 10%). When the model was applied to restaurant reviews in New York City, we achieved 74% sensitivity, 54% specificity, area under the receiver operator curve of 77%, and positive predictive value of 25% (given a prevalence of 12%). Model accuracy improved when the reviews ranked highest by Yelp were used. Our results indicate that public health surveillance can be improved by using social media data to identify restaurants at high risk of health code violation. Additionally, using highly ranked Yelp reviews improves predictive power and limits the number of reviews needed to generate a prediction. Used as an adjunct to current risk ranking of restaurants prior to inspection, this approach may enhance detection of restaurants engaged in high-risk practices that previously went undetected. This model represents a step forward in the integration of social media into meaningful public health interventions.
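The positive predictive values reported above follow from sensitivity, specificity, and prevalence via Bayes' rule. A minimal sketch of that relationship (this is not the authors' code, only the standard formula applied to the abstract's San Francisco figures):

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from test characteristics via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# San Francisco validation figures: 91% sensitivity, 74% specificity,
# 10% prevalence -> PPV ~= 0.28, in line with the reported 29% once
# rounding of the reported inputs is allowed for.
sf_ppv = ppv(0.91, 0.74, 0.10)
```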

    Significant Predictors in New York City Model.

    No full text
    Odds Ratio and 95% Confidence Interval of Odds Ratio are listed above for predictive keywords in the New York City model. The table is limited to significant predictors. Additional terms that were highly predictive but not identified as significant due to collinearity are not listed in this table.
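Odds ratios and confidence intervals of this kind are conventionally derived from logistic-regression coefficients. A generic sketch of that conversion (the coefficient and standard error below are made-up illustrations, not values from the paper):

```python
import math

def odds_ratio_ci(beta: float, se: float, z: float = 1.96):
    """Odds ratio and 95% CI from a logistic-regression coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical keyword coefficient, for illustration only.
or_point, ci_low, ci_high = odds_ratio_ci(beta=0.8, se=0.3)
```

A predictor is significant at the 5% level exactly when this interval excludes an odds ratio of 1.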

    An analysis of correlation between tags and keywords was used to decide which terms would be included in the model.

    No full text
    A correlation cutoff of 0.05 was used for inclusion in the model, unless the authors strongly believed the keyword would be useful in the model despite low correlation (e.g., high quality, food poisoning, and employees, selected for their relation to food quality, foodborne illness, and employee behavior). A liberal cutoff point was used to include as many predictors as possible. Correlation is specific to the pilot study training data, which excluded all but Chinese restaurants.

    Significant Predictors in San Francisco Model.

    No full text
    Odds Ratio and 95% Confidence Interval of Odds Ratio are listed above for predictive keywords and tags in the San Francisco model. The table is limited to significant predictors. Additional terms that were highly predictive but not identified as significant due to collinearity are not listed in this table.

    Receiver Operator Curve created using Yelp data compiled from 1,543 San Francisco restaurants including all cuisine types.

    No full text