Yelp Helps NYC Health Department Track Foodborne Illnesses

Picture this: You’re ready for dinner, but you’re not sure what cuisine you want to eat.

Or maybe you’re in an unfamiliar neighborhood and want to explore. You pull out your phone, pull up Yelp or OpenTable or Urbanspoon, and find a promising place to nosh.

And a few hours or days later, you feel sick, and want to punish the place you think is responsible by giving them a lambasting, single-star review. But you don’t want to be that person. So you don’t.

Next time, rethink that impulse. You could be helping a lot of people—and not just by warning them off a dodgy dish.

Today, in its weekly news bulletin, the Centers for Disease Control and Prevention reports on an innovative project that the New York City health department tried out for nine months in collaboration with Yelp. The team examined more than 294,000 reviews posted to the site between July 2012 and March 2013; found that 468 of them reported people getting sick after a meal; and discovered, within those 468, three foodborne illness outbreaks that authorities had not known about. Only 3 percent of the illnesses listed on Yelp had ever been reported, by the sick people or their doctors, to the health department.

This is a big deal, in several different ways. First, because these data (the reviews) are available for anyone to read, whereas foodborne illness information is notoriously difficult for public-health authorities to get hold of. (Think about it: The last time a presumably food-related illness took you down, did you call your doctor, or tough it out yourself?) The CDC itself acknowledges that its national figure of 48 million food-related illnesses each year is only an estimate, since as few as 2 percent of them get medical attention. And, second, because it offers a tiny step into a problem that public-health types are obsessed with: how to make practical use of the masses of data that swirl around us every day, and that we all contribute to, by texting, tweeting, posting to blogs, updating Facebook, and so on.

To do the analysis, Yelp provided the New York City department with a reformatted file of the more than 294,000 reviews of New York restaurants (without changing the text; this was just for ease of analysis). The team working on the project did an automated review first, to pull out reviews containing words like “sick,” “vomit,” and “food poisoning,” and then had a team member review each one that the program had flagged, to be sure it was describing an illness. Then they double-checked the timing of the posts, comparing them to descriptions of when the posters said they ate, to be sure the incubation period matched what a foodborne-disease organism would cause. (Less than two hours, and the bout of illness was probably caused by a toxin; more than four weeks, and the person probably mistook what caused them to be sick.)
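For the curious, here is a rough sketch in Python of what that two-stage screen might look like. It is not the health department's actual program: the keyword list contains only the three terms quoted above, and the field names, the sample reviews, and the helper functions flag_review and plausible_incubation are all invented for illustration. The human check in the middle is, of course, not something a few lines of code can stand in for.

```python
import re

# Assumption: only the three keywords quoted in the article; the real list was likely longer.
ILLNESS_PATTERN = re.compile(r"\b(sick|vomit\w*|food poisoning)\b", re.IGNORECASE)

# Window described in the article: at least two hours, at most four weeks.
MIN_HOURS = 2
MAX_HOURS = 4 * 7 * 24  # four weeks expressed in hours


def flag_review(text: str) -> bool:
    """Return True if the review text contains any illness keyword."""
    return ILLNESS_PATTERN.search(text) is not None


def plausible_incubation(hours_to_onset: float) -> bool:
    """Return True if onset falls inside the two-hour to four-week window."""
    return MIN_HOURS <= hours_to_onset <= MAX_HOURS


# Hypothetical sample data; "hours_to_onset" stands in for what a human
# reviewer would work out from the text (None means no illness described).
reviews = [
    {"text": "Got food poisoning about ten hours after the salad.", "hours_to_onset": 10},
    {"text": "Sick beats and great cocktails!", "hours_to_onset": None},
    {"text": "Felt sick before the check even arrived.", "hours_to_onset": 0.5},
    {"text": "Lovely patio, friendly staff.", "hours_to_onset": None},
]

# Stage 1: automated keyword screen.
flagged = [r for r in reviews if flag_review(r["text"])]

# Stage 2: keep only reviews with a described illness and a plausible timing.
candidates = [
    r for r in flagged
    if r["hours_to_onset"] is not None and plausible_incubation(r["hours_to_onset"])
]

for r in candidates:
    print(r["text"])
```

Note that the keyword screen alone flags the “Sick beats” review, which is exactly the kind of false alarm a human reader is needed to weed out.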

I took a random dive into Yelp to experience what they would have been looking for. I quickly saw why they needed the combination of machine algorithm and human evaluation; they would have wanted this:

review-1

but not this:

review-2

or this:

review-3.

Once the researchers worked out those technicalities, the analysis proved really fruitful, not only in identifying never-disclosed illnesses, but also in fitting the Yelp posts into patterns. Thanks to that extra step, New York City was able to spot clusters of illness around three separate restaurants, and associate them with individual dishes: a house salad, shrimp and lobster cannelloni, and “macaroni and cheese spring rolls.” (Eww.) With that data, the city’s outbreak-detection team homed in on the restaurants and actually identified things they were doing wrong: cross-contamination with other foods, coolers that weren’t cold enough, unwashed vegetables, and roaches and mice. (Again: eww.) So the analysis of the Yelp reviews didn’t just reveal past illness; it probably prevented future ones as well.
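The CDC report summarized here does not spell out how the clusters were spotted, so treat the following as one simple guess at the idea rather than the city's method: group the confirmed illness reports by restaurant and look for places that turn up more than once. The function find_clusters, the min_reports threshold, and the restaurant names are all hypothetical.

```python
from collections import defaultdict


def find_clusters(reports, min_reports=2):
    """Group (restaurant, dish) illness reports by restaurant and keep
    only restaurants with at least min_reports reports."""
    by_restaurant = defaultdict(list)
    for restaurant, dish in reports:
        by_restaurant[restaurant].append(dish)
    return {
        name: dishes
        for name, dishes in by_restaurant.items()
        if len(dishes) >= min_reports
    }


# Hypothetical confirmed reports: (restaurant, dish the reviewer blamed).
reports = [
    ("Restaurant A", "house salad"),
    ("Restaurant B", "pasta"),
    ("Restaurant A", "house salad"),
    ("Restaurant C", "soup"),
]

clusters = find_clusters(reports)
print(clusters)
```

A real analysis would also need to consider how close together in time the reports were, which this sketch ignores.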

This New York City project is one of several around the country that are attempting to leverage socially gathered data to keep foodborne illness under control. The CDC used receipts stored in supermarket shopper-loyalty card records to solve a nationwide Salmonella outbreak in 2009. Chicago’s health department built an app (with a web version) that collects descriptions of illness from residents and then replies to them via Twitter. And researchers at Harvard Medical School and Boston Children’s Hospital recently got access to Twitter’s firehose of data, to see whether they can extract early signs of outbreaks from what people are saying in real time. (The Harvard-Children’s team is also behind the global disease-mapping project HealthMap, and the fascinating Digital Disease Detection conference, which is a data-ideas firehose all on its own.)

So next time you use an app to find a place to eat, feel sick, and want to say something, you have permission to go ahead: You won’t just be warning other diners, you’ll be helping science as well. But do the scientists a favor, and be careful how you use those words they are looking for. For instance, try not to do this:

review-4.

(The full cite for the New York City department’s paper is: Harrison C, Jorder M, Stern H et al. Using Online Reviews by Restaurant Patrons to Identify Unreported Cases of Foodborne Illness - New York City, 2012-2013. It is being published in the CDC’s Morbidity and Mortality Weekly Report.)