Focus on an article published in EJOR*, a four-star journal, by Koen De Bock, Associate Professor in Marketing at Audencia Business School: A framework for configuring collaborative filtering-based recommendations derived from purchase data
An online retail environment offers ample opportunities for mass personalization, enabled through tracking technology, data analysis and the ability to automate the creation of personalized content in real time at near-zero cost. Moreover, as customers face an abundance of products and product-related information, there is a growing need for tools to help them cope with information overload. In this context, online retailers often deploy recommendation systems that create personalized product sets for each customer. This personalized selection supports cross-selling and facilitates the choice process. This, in turn, can lead to greater satisfaction and so to increased sales, revenue, and loyalty.
A popular methodological approach to the creation of recommender systems is collaborative filtering (CF). This algorithm leverages the principle of the ‘wisdom of the crowd’ and allows an analyst to identify products (in general referred to as items) for every customer that have been appreciated in some way by that customer’s nearest neighbors, i.e. a set of similar customers (referred to as users) based on a number of characteristics. This item appreciation can reveal itself in many ways, ranging from explicit feedback (e.g. customers’ movie ratings on Netflix) to implicit feedback (e.g. product additions to the shopping cart on Amazon.fr).
This study’s objective is to offer guidance to marketers in building better recommendation systems and avoiding trial-and-error processes in their attempts to find a suitable recommendation algorithm. Specifically, this study proposes a decision support framework to help e-commerce companies select the best collaborative filtering (CF) algorithms for generating recommendations on the basis of a specific, universally applicable form of implicit user feedback: online purchase data. To create this framework, an experimental design tests several CF configurations, which are characterized by three algorithm design features (data reduction technique, CF method, and similarity measure), as a function of three characteristics of the input data (sparsity level, purchase distribution, and item-user ratio). Sparsity level refers to the degree to which customers purchase most, or only a fraction, of the product range; the purchase distribution defines whether all products have an equal chance of being purchased or not; and the item-user ratio quantifies how the number of products compares to the number of customers. The framework not only identifies the most accurate model but also gives an indication of the diversity (the ability of the model to recommend diverse sets of products) and calculation times of different models. The experimental validation is based on synthetic datasets with different binary purchase input characteristics as well as two real-life validation sets.
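To make the three input data characteristics concrete, the sketch below computes them for a small binary purchase matrix. The operationalizations are assumptions chosen for illustration (sparsity as the share of empty cells, a Gini coefficient over item purchase counts as a summary of the purchase distribution, and the number of items divided by the number of users); the study may define these quantities differently.

```python
# Hypothetical binary purchase matrix: rows are users, columns are items.
matrix = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
]

def sparsity(matrix):
    """Share of user-item cells without a purchase (assumed definition)."""
    cells = len(matrix) * len(matrix[0])
    filled = sum(sum(row) for row in matrix)
    return 1 - filled / cells

def item_user_ratio(matrix):
    """Number of items divided by number of users."""
    return len(matrix[0]) / len(matrix)

def purchase_gini(matrix):
    """Gini coefficient of purchase counts per item: 0 means every item
    is bought equally often, values near 1 mean purchases concentrate
    on a few items (an assumed summary of the purchase distribution)."""
    counts = sorted(sum(col) for col in zip(*matrix))
    n, total = len(counts), sum(counts)
    if total == 0:
        return 0.0
    weighted = sum(rank * c for rank, c in enumerate(counts, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

print(sparsity(matrix), item_user_ratio(matrix), purchase_gini(matrix))
```

Here 9 of the 20 cells contain a purchase, so the sparsity is 0.55; there are 4 items for 5 users, giving a ratio of 0.8; and the first item accounts for more than half of all purchases, which yields a Gini coefficient of roughly 0.36.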
The evaluations in terms of accuracy, diversity, computation time, and trade-offs among these metrics yield two main findings. First, the accuracy and the diversity of the generated recommendations depend on the data reduction technique, the CF method deployed, and the similarity measure used to identify similar customers, whereas the computation time depends only on the data reduction technique. Second, varying input data characteristics can lead to different optimal model configurations, although the best-performing algorithm in terms of accuracy remains consistent regardless of the input data characteristics. Specifically, in order to optimize model accuracy, a company should rely upon correspondence analysis for data pre-processing and build an item-based CF model using either cosine or correlation-based similarity measures, regardless of what the input data looks like. On the other hand, when an analyst wishes to optimize for diversity and/or computation time, the best-performing model varies with the input data characteristics.
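As an illustration of the item-based variant with cosine similarity mentioned above, the following is a minimal sketch on a toy binary purchase matrix. It deliberately omits the correspondence analysis pre-processing step, and the data, function names, and scoring rule (summing the similarities to the items already bought) are assumptions for illustration rather than the configuration evaluated in the study.

```python
from math import sqrt

# Hypothetical binary purchase matrix: rows are users, columns are items.
items = ["A", "B", "C", "D"]
matrix = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]

def item_cosine(matrix, i, j):
    """Cosine similarity between the purchase columns of items i and j."""
    col_i = [row[i] for row in matrix]
    col_j = [row[j] for row in matrix]
    dot = sum(a * b for a, b in zip(col_i, col_j))
    # For 0/1 data the squared norm of a column equals its sum.
    norm = sqrt(sum(col_i)) * sqrt(sum(col_j))
    return dot / norm if norm else 0.0

def recommend_item_based(user_row, matrix, items, n=2):
    """Score each item the user has not bought by its summed similarity
    to the items the user has bought, and return the top n."""
    owned = [j for j, bought in enumerate(user_row) if bought]
    scores = {}
    for j, name in enumerate(items):
        if user_row[j]:
            continue
        scores[name] = sum(item_cosine(matrix, j, o) for o in owned)
    # Sort by score (descending), then by item name for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:n]]

print(recommend_item_based(matrix[0], matrix, items))
```

For the first customer, who bought A and B, item C ranks first because it is frequently bought together with B, while D has never been bought alongside either item and receives a score of zero.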
In summary, this paper develops a decision-support framework that allows e-commerce companies to decide on the optimal CF configuration as a function of the characteristics of their specific binary purchase datasets. They also gain insight into the impact of changes in the input dataset on the preferred algorithm configuration.
For more details, please find the full article in the European Journal of Operational Research: *Geuens, S., Coussement, K. and De Bock, K.W., 2018, A framework for configuring collaborative filtering-based recommendations derived from purchase data. European Journal of Operational Research, Vol. 265 (1), pp. 208-218.