% background.tex (HCDA/new_yahoo, Centrum Wiskunde & Informatica (CWI)); author: Gebrekirstos Gebremeskel
\section{Background and Motivation} \label{mot}

There are two seemingly opposing perspectives on personalized recommendation. The first perspective is that personalized recommendation is beneficial for both users and publishers. This perspective has led to the advent and proliferation of personalized recommender systems, both as integral parts of larger information systems and as standalone systems. Today, in the vast majority of online information services, recommendation is a fact of life. In some cases, people may even visit particular sites because they feel the sites are adapted to their interests. A publisher that applies personalized recommendation is not concerned with personalization per se, but with the increase in user engagement that personalization brings.
 
 The second perspective, a backlash against personalized recommendation, is that personalized recommendation is harmful to society \cite{pariser2011filter}. Proponents of this view argue that personalized recommendation balkanizes \cite{van2005global} society along interest lines and creates a filter bubble effect, a phenomenon in which a user is isolated from content that an algorithm deems irrelevant to them. They argue that personalized recommendation should be condemned and stopped.
 
 \begin{figure} [t]
\centering
\includegraphics[scale=0.4]{img/recommendation_flow.pdf}
\caption{The recommendation flowchart from available items to clicks.}
\label{fig:flow}
\end{figure}

 The two perspectives call for two different ways of measuring personalization. In both, the target of measurement is the set of personalized items shown to the user. In the perspective that views the separation of content caused by recommendation as less desirable, however, the shown items are compared against the available items; the available items become the reference point. In the extreme case, the position of the opponents of personalized recommendation can be construed as saying that the information presented must be the same as the available relevant information. This extreme position is untenable and infeasible, because it would also imply that presenting items in any ranked order is wrong. In practice, opponents object to differences between the results presented to two people who supposedly have the same information need, for example when they query a search engine with the same query terms. This milder position has been measured by comparing one user's recommended items against another's.
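A minimal sketch of such a comparison, assuming the Jaccard index as the overlap measure (other measures, such as edit distance over ranked result lists, are equally possible); the user names and document ids are hypothetical. The sketch looks only at the items shown, so it quantifies how much recommendations differ from each other, not how well they serve anyone's interest.

```python
from itertools import combinations


def jaccard_similarity(a, b):
    """|A ∩ B| / |A ∪ B| for two collections of item ids."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty lists are treated as identical
    return len(a & b) / len(a | b)


def mean_pairwise_similarity(shown):
    """Average Jaccard similarity over all pairs of users.

    `shown` maps a (hypothetical) user id to the items shown to that user.
    Values near 1 mean users saw nearly the same items; values near 0 mean
    their recommendations barely overlap.
    """
    pairs = list(combinations(shown.values(), 2))
    if not pairs:
        return 1.0  # fewer than two users: nothing to differ from
    return sum(jaccard_similarity(a, b) for a, b in pairs) / len(pairs)


# Two users issue the same query and receive partly different results.
shown = {
    "alice": ["d1", "d2", "d3", "d4"],
    "bob": ["d1", "d2", "d5", "d6"],
}
print(mean_pairwise_similarity(shown))  # 2 shared items out of 6 distinct
```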
 
 In the perspective that sees recommendation as beneficial, the user's recommended items are compared against the user's perfect information interests and preferences. %  For example, for one opposing the filter bubble, the reference frame is the unchanging of the information from user to user. 
 This view strives to increase engagement and to overcome the information overload problem; as such, the reference frame against which the personalized items are compared is the (perfect) interest of each user, and the question is how close the personalization comes to the user's actual information preferences.
 
 These two perspectives on measuring personalized recommendation are similar to the two perspectives on recommender system evaluation: system-centric and user-centric. In a system-centric evaluation, there is a ground truth, and the goal is to find out how well the recommender system predicts it. System-centric measures are neat and replicable. However, since system-centric evaluations do not always correlate with user satisfaction \cite{mcnee2006being}, alternative, user-centric evaluations of recommender systems have been developed. Measures such as click-through rate (CTR), dwell time, and engagement have been proposed and used as user-centric metrics.
 
 \subsection{The Recommendation Pipeline}
 
  Figure \ref{fig:flow} shows the recommendation process from beginning to end. The process in the figure applies to both query-based and query-less personalization. The part surrounded by the yellow-dotted rectangle shows what is targeted when personalization is measured from the perspective that opposes personalized recommendation. In its extreme version, this perspective requires that the items shown to users be exactly the same as the available relevant items. In practice, however, only the view box is considered, and the measures compare the items shown to different users against each other. The aim of such measures is thus to quantify the difference between recommendations.
  %The items that are not shown play no role in the computation of personalization from the perspective of no-change.
  
  For the proponents of personalized recommendation, the part of the recommendation pipeline in which personalization is measured is the area enclosed by the green-dotted rectangle. The objective here is to measure personalization by assessing how close the recommended items are to the users' actual interests. From this perspective, a personalized recommender system's usefulness is judged by how well it serves the user's information interests and preferences. In this study, we measure personalization from this perspective; our approach is therefore user-centric.  %This naturally calls for the recommended items to be compared against the  users' actual interest. % As such, it calls for a measure that compares the personalization against the user's actual interest.
  %Measuring the personalization recommendation against this perspective targets the the part of the flowchart that is bound by the red dotted-line.
  
%   Since it is not possible to find the perfect interests of the information consuming unit, they are usually approximated by the information that the information consuming unit has previously consumed. This perspective and the measure it calls for are more difficult and tricky than the no-change perspective and its measure. The reasons are 1) we have many different interests to compare with, and 2) the clicks are influenced by what the system serves. The problem is that it is hard to know the perfect items that satisfy the user interest, because the user interest is not fixed but shifts over time and with many other factors. This situation is complicated even further by the transience of the items, in the sense that the items to be recommended are usually ephemeral, making it hard to learn and improve. That calls for personalization at the level of meta features of the items rather than at the item level.

%  There have been attempts to quantify the level of personalization in recommendation systems from the perspective of no-change. One study examined the level of personalization in search engines by recruiting Mechanical Turk users with mail accounts to search Google for a set of keywords. The study compared the results using Jaccard similarity and edit distance, and found that about 17\% of the queries were affected by personalization. Another study tried to quantify the change in diversity of recommended items over time with a view to measuring the filter bubble. In both cases the perspective is system-centric.

%  The opposite of unchanging information is the perfect need of the user. However, since it is impossible to find the perfect interest of the user, the closest we can come to this perfect information is what the user consumes, the clicks. We can measure the level of personalization against this reference frame as opposed to measuring it against the unchanging-information reference frame. When we measure the level of personalization against it, we are measuring how good the system is at delivering the items that the user is interested in. This reference frame has one very serious limitation: the clicks are influenced by what is served.

% Can these two measures be combined to obtain a holistic level of a recommender system's personalization? No, they do not complement each other and their combination does not make much sense. They have to be measured for two different ends.

% Personalization affects the views, the items that are served to different information consuming units. There are two perspectives in measuring personalization. One perspective is sameness. It assumes that the information provided should not vary. This view is deeply held by the progenitor and anti-proponents of the filter bubble. The second perspective is how well a personalization satisfies the user interest. Here the reference frame is the information need of the information consuming unit. The better the personalization matches the information consuming unit's information need, the better the personalization is.
When measuring personalization from the user-centric point of view, we propose that personalization be viewed as having two components.  One component is the ranking of the selected items to be recommended. In other words, this component is about ranking the items in order of their relevance to the user. 
%This component should also deal with the number of times items should be shown to a user. So, for example, it should be able to recommend more items on a certain event and fewer items on another event.
Before the ranking component can rank a user's items, the items of interest must first be selected from the available items. We call this second, and most important, component the separation component; it refers to the separation of content along user interest lines. A holistic measure of personalization should take into account both the ranking and the separation components.

% While this seems, to us, the fundamental definition of personalization, today it also includes other aspects. For example, not only should a personalization system be able to separate content according to different user interests, but it should also know the quantity and rank of the items within a user interest. So, for example, 1) it should be able to serve more of item 1 than of item 2 to a user, and 2) it should serve items in the correct order, that is, item 1 should come before item 2.

% How do we measure personalization? We believe that it has to account for these two aspects.
% The separation: Imagine we have the perfect interests of the different information consuming units. That means we can measure the differences. If we treat them as sets, we can for example use the Jaccard index to measure their similarity (or dissimilarity). We can average the similarities (dissimilarities) to obtain an aggregate score of similarity (dissimilarity).

 We argue that the response to personalization can be measured only in a comparative manner. The reason is that there cannot be a true ground truth in a dynamic system where items and user interests change constantly and users' situations are always subject to outside factors. In such a system, the best measure is a comparative one, computed over a dataset collected over a long period. We also argue that the response to personalization can only be measured reactively, that is, through how users react to it.

%\subsection{The Ranking Component}

\subsection{The Separation Component}
As the ranking component of a recommender system is a classical information retrieval problem, we focus here on the separation component. 
Personalization is first and foremost the separation of content according to users' interests. Fundamentally, then, personalization rests on the assumption that the information interests of users differ.

A good recommender system must therefore deliver recommendations that preserve this difference between users' information preferences. This means that the aggregate similarity (dissimilarity) of the sets of recommendations delivered to different users should be as close as possible to the aggregate similarity (dissimilarity) of their actual preferences. We call this measure of effectiveness the degree of personalization and define it, mathematically, as the difference between the recommendation similarity (dissimilarity) and the actual similarity (dissimilarity) (see Section \ref{pro}).
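One way to write this definition down, as a sketch whose notation and sign convention are assumptions made here (Section \ref{pro} gives the definitive form): let there be $n$ users, let $A_i$ denote the set of items user $i$ actually prefers and $R_i$ the set of items recommended to user $i$, and let $d(\cdot,\cdot)$ be a set dissimilarity, for instance the Jaccard distance $d(X,Y) = 1 - |X \cap Y| / |X \cup Y|$. Aggregating over all pairs of users gives
\[
D_A = \sum_{1 \le i < j \le n} d(A_i, A_j), \qquad
D_R = \sum_{1 \le i < j \le n} d(R_i, R_j),
\]
and the degree of personalization
\[
\Delta = D_A - D_R = \sum_{1 \le i < j \le n} \bigl[ d(A_i, A_j) - d(R_i, R_j) \bigr].
\]
With the Jaccard distance, each term $d(A_i,A_j) - d(R_i,R_j)$ equals the recommendation similarity minus the actual similarity of the pair, matching the verbal definition. $\Delta = 0$ means the recommendations preserve the aggregate distance between users' preferences; $\Delta > 0$ means the recommendations are less separated than the preferences, and $\Delta < 0$ that they are more separated.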

% 
% Personalization = Actual Aggregate - recommendation aggregate. 
% Assume we have n information consuming units. The aggregate similarity between them is the similarity between the all pairs (combinations).
% Summation (combination)similarity between pairs.
% 
% We apply the same formula for the recommendations.  Rearranging, we get the difference between the actual similarity and the recommendation similarity. So the overall formula basically boils down to the sum of the  differences between actual distance and recommendation distance. 
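The aggregate comparison described above can be computed as a sum over all pairs of users. The following Python sketch is illustrative only and is not the implementation of Section \ref{pro}: it assumes Jaccard distance as the dissimilarity, sums over all unordered pairs of users, and subtracts the aggregate recommendation distance from the aggregate actual distance. The user ids, item ids, and function names are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B|; two empty sets are treated as identical (distance 0)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def aggregate_distance(item_sets, users):
    """Sum of pairwise Jaccard distances over all unordered pairs of `users`."""
    return sum(
        jaccard_distance(item_sets[u], item_sets[v])
        for u, v in combinations(users, 2)
    )


def degree_of_personalization(actual, recommended):
    """Aggregate actual distance minus aggregate recommendation distance.

    `actual` and `recommended` map the same (hypothetical) user ids to item
    collections. 0 means the recommendations preserve the aggregate distance
    between users' preferences; > 0 means the recommendations are less
    separated than the preferences; < 0 means they are more separated.
    """
    if set(actual) != set(recommended):
        raise ValueError("actual and recommended must cover the same users")
    users = list(actual)
    return aggregate_distance(actual, users) - aggregate_distance(recommended, users)


# Clicks stand in for actual preferences; u1 and u2 get identical recommendations.
clicks = {"u1": {"a", "b"}, "u2": {"b", "c"}, "u3": {"d"}}
recs = {"u1": {"a", "b"}, "u2": {"a", "b"}, "u3": {"d"}}
print(degree_of_personalization(clicks, recs))
```

The result is about 0.67 (that is, 8/3 minus 2): positive, because u1 and u2 receive identical recommendations although their clicks differ, so the recommendations are less separated than the users' interests.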

\subsection{Relations Between the Two Perspectives}
 
How are measures based on the two perspectives related to each other? Can measures designed from the perspective that personalization causes over-filtering (bad) tell us anything about measures designed from the perspective that personalization is beneficial (good)? We maintain that there is no direct relationship. A recommender system might serve different items to different users without meeting their interests; it would then clearly be doing personalization from the perspective that views recommendation as bad, while failing by the standard of the perspective that views it as good.
% A bad recommender system might do personalization without meeting the user's information preferences. It might even make it worse than no-recommendation. 

Conversely, the user-centric perspective and the measures it calls for cannot tell us whether the system is doing personalization from the no-change perspective. For example, a recommender system might achieve a good separation of content according to users' interests, but that says little about whether the recommended items are similar to or different from the available relevant items, or from each other. We therefore maintain that system-centric and user-centric measures serve different ends.
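The first case can be illustrated with hypothetical toy data in which three users have identical interests but are each shown a different item. Both views are computed below with Jaccard-based measures, which are an assumption of this sketch: the no-change view sees maximal difference between the recommendations, while the user-centric degree of personalization is far from zero, indicating poor separation.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two item sets; two empty sets count as identical."""
    a, b = set(a), set(b)
    union = a | b
    return 1.0 if not union else len(a & b) / len(union)


def mean_recommendation_overlap(recommended):
    """No-change view: average pairwise overlap of users' recommendation sets."""
    pairs = list(combinations(recommended.values(), 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def degree_of_personalization(actual, recommended):
    """User-centric view: sum over user pairs of actual minus recommended distance."""
    total = 0.0
    for u, v in combinations(list(actual), 2):
        actual_distance = 1.0 - jaccard(actual[u], actual[v])
        recommended_distance = 1.0 - jaccard(recommended[u], recommended[v])
        total += actual_distance - recommended_distance
    return total


# Hypothetical toy data: three users whose clicks show identical interests ...
clicks = {"u1": {"a", "b"}, "u2": {"a", "b"}, "u3": {"a", "b"}}
# ... each shown a different item that matches none of those interests.
recs = {"u1": {"x"}, "u2": {"y"}, "u3": {"z"}}

print(mean_recommendation_overlap(recs))        # 0.0: users see disjoint items
print(degree_of_personalization(clicks, recs))  # -3.0: far more separated than interests
```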

% these measures are measuring completely different perspectives and different interests, one to minimize information overload, another to keep the diversity of available information.


\subsection{Clicks as an (Imperfect) Proxy for Actual User Interest}
The question, then, is how to obtain actual user interests and preferences. True user preferences are nearly impossible to obtain in a real recommender system because 1) user interests change, 2) items change, and 3) it is not possible to rerun the same recommendation scenario with the same items and users. The best approximation of actual user interest is therefore the users' history of items engaged with. For simplicity, we use users' click history here as an example, despite the limitations of clicks as a measure compared to, for example, dwell time \cite{yi2014dwell}.
There is, however, one major problem with using click history as a proxy for actual user preferences: clicks depend on what has been recommended. If we were to provide different recommendations, the click history would be different too. We nevertheless consider the approximation of actual user interest by clicks reasonable, for the following reason.

We are interested in relative, not absolute, similarity. The similarity between the clicks of two users will indeed be affected by what the system recommends. However, we assume that the relative similarity (distance) between them remains more or less the same, since the recommendation quality affects both users, unless the system is deliberately tuned to serve some users well and others poorly. Put simply, we assume that the recommendations affect the distance between users' clicks to a similar extent as they affect the distance between their recommendations, so that the difference between the click distance and the recommendation distance remains roughly unchanged.
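In practice, the click-based proxy amounts to grouping a click log by user. The sketch below assumes a hypothetical log format of (user id, item id) pairs, one per click, and collapses repeated clicks on the same item into a single entry; any filtering by time window or dwell time is left out.

```python
from collections import defaultdict


def click_history_sets(click_log):
    """Group a click log into one set of clicked item ids per user.

    `click_log` is assumed (hypothetically) to be an iterable of
    (user_id, item_id) pairs, one per click; repeated clicks on the same
    item collapse into a single entry.
    """
    history = defaultdict(set)
    for user_id, item_id in click_log:
        history[user_id].add(item_id)
    return dict(history)


log = [("u1", "a"), ("u2", "b"), ("u1", "b"), ("u1", "a"), ("u3", "d")]
history = click_history_sets(log)
print(sorted(history["u1"]))  # ['a', 'b']
```

The resulting per-user sets are what pairwise click distances are computed over; the same grouping applied to a log of served recommendations yields the recommendation sets.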