The eighth ACM conference on Recommender Systems just took place in Foster City, California, USA.
This yearly event is the most important appointment for the recommender systems community worldwide, bringing together researchers and experts from both academia and industry. This year the event took place in Silicon Valley, which attracted an unprecedented number of people to a completely sold-out conference, with many additional virtual attendees on YouTube, where the main conference presentations were streamed live.
It is notable that about half of the participants were from industry this year, with a massive presence of the big enterprise players, such as Google, LinkedIn, Netflix, Amazon, Microsoft, Samsung, Pandora, Facebook, Baidu, and many others.
The conference was, as usual, very intense, with a full 5-day program, including two days of workshops. This year we, the ContentWise R&D team, were also involved, together with other partners, in the activities of CrowdRec, a project of the European 7th Framework Program. CrowdRec actively contributed to the conference, playing an important role in organizing two workshops (RecSys Challenge and CrowdRec) and presenting several papers and posters.
Now that I am just back from the event, let me share some considerations and takeaways.
Factorization machines (FMs) have been a hot topic at past conferences, with many practitioners experimenting with this elegant and relatively simple solution for collaborative and content-aware recommender systems. While FMs (and, more generally, techniques based on matrix and tensor factorization) are still the reference solutions for the most advanced recommendations, deep learning seems to be a promising approach. On this topic, the keynote by Jeff Dean of Google gave an interesting overview of how they applied large deep neural networks to solve several machine learning problems, and suggested some possible applications to personalization challenges.
Just a few words about the evergreen K-nearest neighbor collaborative filtering (aka KNN), revisited by some of the work presented. Among other approaches, I found the idea of using the inverted neighbor algorithm to improve recommendation diversity, proposed in the “Diversity, Novelty, and Serendipity” session, particularly interesting.
A/B testing has been a trendy subject over the last 2 or 3 RecSys conference editions, with Netflix sharing its experience. This year a half-day workshop was dedicated to controlled experiments, with an outstanding panel of speakers from Google (Diane Tang), Microsoft (Brian Frasca), Netflix (Caitlin Smallwood), PayPal (Mike Lo) and SiteSpect (Justin Bougher). In addition, controlled experiments were discussed by Xavier Amatriain of Netflix in his tutorial “The Recommender Problem Revisited”, and they were the subject of LinkedIn’s and Facebook’s presentations at the industry session.
A/B testing, which has been applied to web page optimization for a long time, requires a data-driven decision-making culture to be successful. Consequently, big companies tend to centralize the management of experiments around a complex process involving many domain experts. Clearly defining the goals and metrics of each experiment, and how those metrics will be computed, seems to be critical to a profitable implementation of controlled experiments. A flexible and user-friendly dashboard to monitor the experiments is another key factor, especially for companies such as Google and Netflix, where several stakeholders (marketing, algorithm, and user interface teams) run tens or hundreds of concurrent experiments with hundreds of metrics to analyze.
Finally, why not explore alternative solutions to A/B testing? Multi-armed bandit algorithms are promising techniques that might help converge quickly to the optimal configuration, since the assignment of a user to one of the treatment groups is not completely random but depends on the most recent performance of the related configurations. The main concern seems to be that multi-armed bandits have more complicated logging requirements than standard A/B testing. We’ll see how this family of algorithms evolves over time. In the meantime, a couple of papers leveraged multi-armed bandit techniques in the context of recommendations.
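To make the idea of performance-dependent assignment more concrete, here is a minimal sketch of one common bandit strategy, Thompson sampling with Beta priors. It is only an illustration and not the method of any of the papers or companies mentioned above: the class name `ThompsonSampler`, the `simulate` helper, and the conversion rates are all assumptions made up for the example.

```python
import random


class ThompsonSampler:
    """Illustrative Thompson sampling over configurations with binary outcomes."""

    def __init__(self, n_arms, rng=None):
        # Per-arm counts of observed conversions and non-conversions.
        self.successes = [0] * n_arms
        self.failures = [0] * n_arms
        self.rng = rng or random.Random()

    def choose(self):
        # Draw a plausible conversion rate for each arm from its Beta
        # posterior (uniform Beta(1, 1) prior) and pick the highest draw.
        samples = [
            self.rng.betavariate(s + 1, f + 1)
            for s, f in zip(self.successes, self.failures)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, converted):
        # Record the outcome observed for the user assigned to this arm.
        if converted:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1


def simulate(true_rates, n_users, seed=0):
    """Assign simulated users to arms; return how many users each arm received.

    true_rates are hypothetical conversion rates, unknown to the sampler.
    """
    rng = random.Random(seed)
    sampler = ThompsonSampler(len(true_rates), rng)
    assignments = [0] * len(true_rates)
    for _ in range(n_users):
        arm = sampler.choose()
        converted = rng.random() < true_rates[arm]
        sampler.update(arm, converted)
        assignments[arm] += 1
    return assignments


if __name__ == "__main__":
    # Two hypothetical configurations converting at 5% and 15%.
    print(simulate([0.05, 0.15], n_users=5000, seed=42))
```

Unlike a fixed 50/50 split, the share of users sent to each configuration shifts as evidence accumulates, which is also why logging gets harder: each outcome has to be recorded together with the assignment probabilities in force at that moment.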
As the amount of information in a system grows to the point that it becomes exponentially difficult for a user to find interesting or useful data, several techniques to surface relevant content have been developed. Information retrieval was the first solution, with search engines playing the main role. While search engines were driven by explicit user queries, newer techniques started filtering information on the basis of user preferences, as is the case in recommender systems. Ultimately, similar techniques have been applied in other domains, such as the delivery of advertisements, where ads are targeted to user interests and context.
As extensively discussed by Hector Garcia-Molina of Stanford University in his keynote, we are progressing towards solutions where search, recommendation, and advertising are part of a common ecosystem in which almost everything is personalized and the different technologies share a common user interface, the same input data, and core components.
The key trend here is that a larger and larger share of revenues comes from recommendations. As an example, CareerBuilder shared during their presentation at the CrowdRec workshop that about 45% of their job applications come from recommendations. An extreme case is that of Stitch Fix, whose business model, presented by Eric Colson in the industry session, is 100% based on recommendations.
This year a workshop on Television and Online Video took place at RecSys for the first time, a full-day event jointly organized by Comcast, Boxfish, and GraphLab. The specific focus on the television domain made the workshop particularly interesting for me, with inspiring talks by Brendan Kitts (Adapt.tv), Justin Basilico (Netflix), and Junling Hu (Samsung).
I am proud to say that our work on “Time-based TV Programs Prediction”, a result of the collaboration between the ContentWise R&D team (myself and Andrea Condorelli) and Politecnico di Milano (Paolo Cremonesi and Roberto Pagano), received a special mention for the best paper award.
Looking forward to the 2015 RecSys Challenge. We are waiting eagerly for the big e-commerce dataset (12 million user sessions!) which is the basis of next year’s challenge, this time organized by YOOCHOOSE. May the best work win (€5000 prize for the best solution).
…see you next year at RecSys 2015, in Vienna, Austria!