Extracting better product recommendations from crowd-sourced ratings and reviews

Jonathan Zhang

How did we ever make decisions before the Internet and its crowd-sourced recommendations?

These myriad online compilations of customer ratings and reviews inform just about every purchase—from movies and music to airplane seats and automobiles to doctors and dental floss.

But as Big Data gets bigger, distilling useful information has become increasingly difficult. When leveraged properly, though, data can yield a wealth of insights.

Take the new work of Jonathan Zhang, an assistant professor of marketing at the University of Washington Foster School of Business. Analyzing 20 years of fan input from an online movie forum, Zhang and his co-authors developed a novel recommendation system that synthesizes both quantitative data (ratings) and qualitative data (text reviews and comments) to provide more accurate and customizable advice for film enthusiasts.

The model, he adds, can easily be adapted to create smarter recommendations for any product or service that inspires consumers to rate, rant or rave on the web.

“Leveraging unstructured and user-generated content such as reviews can help companies provide more accurate recommendations to their customers, which in turn increases customer satisfaction and trust in the recommended products,” Zhang says. “It also yields insights into how consumers think about companies’ products to inform future product development.”

Ratings + reviews

For the study, Zhang and his co-authors—Asim Ansari of Columbia University and Yang Li of Cheung Kong Graduate School of Business—worked with data from the fan site MovieLens.org. They analyzed more than eight million point-scale ratings and 233,000 comments generated by 111,000 users considering more than 5,000 films over the period from 1995 to 2015.

Applying natural language processing techniques to categorize the comments, they fused the qualitative and quantitative data to build a comprehensive profile of each movie and reviewer on the site. With these insights, they built a predictive model that generates more accurate recommendations for each user, based on each user’s past preferences, the tastes of other similar users and the characteristics of the movies.
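To make the idea of fusing ratings with review text concrete, here is a minimal Python sketch. It is not the authors' model, which is a probabilistic topic model fitted with stochastic variational Bayes. It swaps in a much simpler heuristic that blends two estimates: one from similar users' ratings and one from keyword overlap between movies. The data, the keyword sets standing in for review topics, the function names and the `alpha` blending weight are all invented for illustration.

```python
from math import sqrt

# Toy data, invented for illustration (not taken from the MovieLens study).
ratings = {
    "ana": {"Up": 5.0, "Alien": 2.0, "Amelie": 4.5},
    "ben": {"Up": 4.5, "Alien": 1.5, "Paddington": 5.0},
    "cy": {"Up": 2.0, "Alien": 5.0, "Paddington": 1.0},
}
# Hypothetical keywords standing in for topics mined from text reviews.
tags = {
    "Up": {"heart-warming", "funny", "animated"},
    "Amelie": {"heart-warming", "funny", "quirky"},
    "Paddington": {"heart-warming", "funny", "family"},
    "Alien": {"scary", "tense", "space"},
}


def user_similarity(a, b):
    """Cosine similarity over the movies both users rated (0.0 if none)."""
    shared = set(ratings[a]) & set(ratings[b])
    if not shared:
        return 0.0
    dot = sum(ratings[a][m] * ratings[b][m] for m in shared)
    norm_a = sqrt(sum(ratings[a][m] ** 2 for m in shared))
    norm_b = sqrt(sum(ratings[b][m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)


def tag_similarity(m1, m2):
    """Jaccard overlap between two movies' keyword sets."""
    union = tags[m1] | tags[m2]
    return len(tags[m1] & tags[m2]) / len(union) if union else 0.0


def weighted_mean(pairs):
    """Mean of (weight, value) pairs; None when the total weight is zero."""
    total = sum(w for w, _ in pairs)
    return sum(w * v for w, v in pairs) / total if total > 0 else None


def predict(user, movie, alpha=0.5):
    """Blend a ratings-based and a text-based estimate of a user's rating."""
    # Quantitative signal: what similar users thought of this movie.
    collab = weighted_mean([
        (user_similarity(user, other), theirs[movie])
        for other, theirs in ratings.items()
        if other != user and movie in theirs
    ])
    # Qualitative signal: how the user rated movies described in similar terms.
    content = weighted_mean([
        (tag_similarity(movie, seen), score)
        for seen, score in ratings[user].items()
    ])
    available = [p for p in (collab, content) if p is not None]
    if not available:
        return None
    if len(available) == 1:
        return available[0]
    return alpha * collab + (1 - alpha) * content
```

In this toy data, `predict("ana", "Paddington")` comes out higher than `predict("cy", "Amelie")`, because ana has rated similarly described films highly while cy has not. A real system would learn the topics and the weighting from data rather than fixing them by hand.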

Zhang adds that recommendations can be tailored for each user based on individual taste as well as key-word queries. So, for example, a search for “heart-warming and funny” movies would yield different recommendations for different consumers, based on their tastes and characteristics.
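A rough sketch of how one keyword query could return different results for different users follows. It is an illustrative assumption, not the method from the study: movies are filtered by hypothetical keyword sets, then ranked by hypothetical per-user taste weights. Every name and number here is made up.

```python
# Hypothetical keyword sets standing in for topics mined from reviews.
movie_tags = {
    "Up": {"heart-warming", "funny", "animated"},
    "Amelie": {"heart-warming", "funny", "quirky", "romantic"},
    "Paddington": {"heart-warming", "funny", "family"},
    "Alien": {"scary", "tense", "space"},
}
# Hypothetical taste weights, e.g. inferred from each user's past ratings.
user_taste = {
    "ana": {"romantic": 1.0, "quirky": 0.8, "animated": 0.1},
    "ben": {"family": 1.0, "animated": 0.9, "romantic": 0.1},
}


def search(user, query, top_n=2):
    """Return movies matching every query keyword, ranked by user taste."""
    wanted = set(query)
    taste = user_taste[user]
    # Keep only movies whose keywords cover the whole query.
    matches = [m for m, kw in movie_tags.items() if wanted <= kw]

    def affinity(movie):
        # Sum the user's weights for every keyword attached to the movie.
        return sum(taste.get(kw, 0.0) for kw in movie_tags[movie])

    return sorted(matches, key=affinity, reverse=True)[:top_n]


print(search("ana", ["heart-warming", "funny"]))
print(search("ben", ["heart-warming", "funny"]))
```

Both users type the same query, yet ana's list is led by Amelie and ben's by Paddington, since the ranking step depends on each user's own weights.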

Better recommenders

Adapting this model to understand consumer tendencies and generate more useful advice for nearly any product can be a win-win for producers and customers alike, Zhang says.

“The reviews that you leave on an Amazon product page or in forums or on a company’s Twitter or Facebook page can be used by the company to make better products in the future and to provide you with the products that are relevant to your interests without you having to search for it,” he says. “As the world of machine learning and artificial intelligence gets increasingly sophisticated, the ‘data exhaust’ that consumers leave behind theoretically could be used to simplify your life, creating greater time efficiency and satisfaction than if you were to search on your own.”

This doesn’t necessarily add up to a techtopia for consumers. Zhang cautions that companies need to balance choice efficiency with data privacy. Also, a future of precise recommendations based on past tendencies can be limiting.

“We wonder if our past choices will determine future choices,” he adds. “This current technological trend is already making us increasingly siloed in our current preferences instead of venturing out and seeking new and serendipitous experiences.”

“Probabilistic Topic Model for Hybrid Recommender Systems: A Stochastic Variational Bayesian Approach” is published in the November 2018 issue of Marketing Science.