\section{The Proposed Method} \label{pro}
\subsection{Intuition Behind the Measure}
If we provide two users disjoint sets of items, the $\mathit{ViewDistance}$ is maximal (normalized, it would be 1), and there can be no shared clicks. Clearly, the $\mathit{ClickDistance}$ is a function of the $\mathit{ViewDistance}$ insofar as, if there are no shared views, there can be no shared clicks. Conversely, if the views are identical, any difference in clicks must be the result of differences in user preference, and not of the system discriminating between the users by serving them different views. How, then, do we take into account the grip that the view vector has on the click vector?
One way is to focus only on the shared views. In this approach, we compare the proportion of shared views (out of all the items shown to the two users) against the proportion of shared clicks (out of all the clicks by the two users). The difference between these two proportions is then the measure of the PullPush. A score of 0 would denote a perfect level of personalization between two users. A negative score would indicate a need for less personalization, that is, for providing them more similar items. A positive score would suggest a need for more personalization, in that the users are choosing to diverge from each other.
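As a minimal sketch (assuming the views and clicks are available as sets of item identifiers, and reading ``shared out of all items shown/clicked'' as the overlap relative to the union of the two users' sets; the function name is ours), this proportion-based score could be computed as follows:
\begin{verbatim}
# Sketch of the proportion-based PullPush score.  Assumption: "shared" is
# measured against the union of the two users' view (or click) sets.
def proportion_pullpush(views1, views2, clicks1, clicks2):
    v1, v2 = set(views1), set(views2)
    c1, c2 = set(clicks1), set(clicks2)
    shared_views = len(v1 & v2) / len(v1 | v2)
    shared_clicks = len(c1 & c2) / len(c1 | c2) if (c1 | c2) else 0.0
    return shared_views - shared_clicks   # 0 = balanced; > 0 = users diverge in clicks
\end{verbatim}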
What is the difference between using proportions and using distances, as in our previous approach? We believe there is no fundamental difference: distance measures difference, while a proportion measures similarity. Unlike a distance, however, a proportion can take disjointness into account.
Another question is how this method compares to CTR. CTR cannot tell us whether we are over- or under-personalizing; it only tells us the rate of success, and the rate of success cannot indicate in which direction we should move or for which users we should take action. The benefit of the proportion is that it tells us which direction to go, towards more or less personalization. This, we believe, makes it a finer-grained metric and one that is more practical and more suggestive of improvements.
The next step is to obtain two datasets, apply the method, and examine what the systems are doing. The method is easy to motivate as the desire to quantify whether a system is amplifying or dampening the bubble effect. This calls for a comparison of the method to the one that measures the effect of recommendation on the diversity of recommendations and consumption.
%When a system recommends items to users, we can construct two vectors: the view vector and the click vector.
% To overcome with limitations of CTR as a measure of the level of personalization, we propose a new metric that addresses the limitation. we propose to use a distance metric. One distance metric is the distance between the View and click vectors for a certain geographic region. We define the distance between the View and Click vectors as revolt. We think of the views as what the system thinks is relevant to the geographic unit in question, and the click is what the geographic region actually clicks. The distance between these two is revolt, the amount buy which the geographical unit tries to reject the view.
%
%
% The dimensions of the two vectors can be either items served, meta data about the items, or entities found in the items. The vectors can also be built at different levels of granularity such as the user level, a demographic group, or a geographic unit. when personalization is done at the user-level, it is fair to assume that, at this granularity, it is not possible to capture higher level similarity of interest on the basis of demographic and/or geographic reasons. It is, however, intuitive to think that people in one city may have much more common interest between them than people of one city with people of another city.
%
%
%
%
% The revolt score is a measure of dissatisfaction, it could be due to over -or under-customization and anything in between. This makes it a useless measure as far as our concern is to measure the level of personalization and sub subsequently suggest ways to improve the system. Specifically, should we personalize more or less? This means we have to invent another measure that can differentiate overspecialization from internationalization.
%\subsection{Method}
\begin{figure} [t]
\centering
\includegraphics[scale=0.4]{img/view_click.pdf}
\caption{The conceptualization of personalization using the views and clicks of two users. Arrows show that views influence clicks. The difference between $\mathit{ViewDistance}$ and $\mathit{ClickDistance}$ is represented by $\mathit{d1}$ + $\mathit{d2}$.}
\label{fig:flow}
\end{figure}
We propose a method that measures the degree of personalization in a recommender system as its ability to maintain the same distance between the personalized recommendations of users as there is between those users' engagement histories (e.g., click or dwell histories), here represented by click histories. The method is by nature comparative and reactive. It is comparative because personalization is fundamentally about maintaining a difference between users, and as such the measure needs to compare the personalized recommendations of users that are assumed to differ in their preferences. The measure is thus comparative 1) in that it compares items both among the personalized recommendations and among the click histories, and 2) in that it compares the aggregate similarity of the personalized recommendations against the aggregate similarity of the click histories. %In other words the nature of personalization calls for a comparative measure.
The measure is reactive because the user's interests, represented by clicks, are a reaction to the recommendations. In other words, there is an inherent dependence of what the user consumes on what is shown to the user. The relative aspect of measuring personalization means that there is no fixed reference frame against which we can compare. We can measure clicks against recommendations, but that measurement is only reactive, that is, made in response to the recommendations: the moment the recommendations change, the clicks would have been different. The assumptions we make are therefore 1) that clicks depend on recommendations, in the sense that a change in the recommended items would alter the clicked items, and 2) that, despite this dependence, the similarities and differences between the clicks remain more or less proportional to the similarities and differences between the personalized recommendations.
In a recommender system, there are the items that the system recommends to the user and the items that the user chooses to consume. If an item is not shown, it cannot be clicked. This does not, however, mean that the user has no choice whatsoever. The user can choose in two ways: 1) in an absolute sense, by clicking on some items and not on others, and 2) in a quantitative sense, by consuming content about, for example, one entity more often than about another. The user thus has some freedom to choose, but that freedom is constrained by what is recommended.
The more the click vector differs from the view vector, the more the system is failing to capture the information consumption needs of the user. However, the relationship between a view vector and a click vector for a single user does not by itself show whether a system is over- or under-personalizing. It cannot, for example, distinguish a personalized recommendation from a non-personalized one. For this reason, we conceive of personalization as the ability of a recommender system to separate information items according to user preferences.
Figure \ref{fig:flow} shows the relationship between views and clicks for two users, $\mathit{user1}$ and $\mathit{user2}$. $\mathit{User1}$ is served $\mathit{View1}$ and has consumed $\mathit{Click1}$; similarly, $\mathit{user2}$ is served $\mathit{View2}$ and has consumed $\mathit{Click2}$. The arrows from views to clicks show the direction of influence. If there is any difference between the views, that difference is the result of personalization. Given these views, the users will click on some items and not on others, and thus will have click vectors that differ from their respective view vectors. The difference between the click vectors is the actual difference in the consumption of the two users given the personalization. The gap between the view difference and the click difference is a measure of the users' tendency in response to personalization. We call this gap the PullPush score. %This difference is $\mathit{d_{1}}$ and $\mathit{d_{2}}$.
There is a fundamental dependence of what the user consumes on what is served.
To measure the level of personalization, we first compute the similarity among the views themselves and among the clicks themselves, and then compare the aggregate recommendation similarity with the aggregate click similarity. From now on, we refer to the recommended items as views and to the consumed items as clicks. %the user's recommended itesm (views) first and click vectors of two geographical units.
We define $\mathit{ViewDistance}$ as the distance between the view vectors (Equation \ref{eq:view}) and $\mathit{ClickDistance}$ as the distance between the click vectors (Equation \ref{eq:click}).
The way we view the relationship between views and clicks is to ask, given the views, how much the click vector differs from the view vector. Put differently, what is the users' tendency relative to the current system: do the users tend to 'want' more personalization or less?
% This question can be answered only in a comparative and relativist way. The comparative is by comparing one user against another. Relative because it can only be measured given what is shown. It is not possible to have an absolute measure of this.
%
% There is a very fundamental dependence of what the user consumes on what is served.
% To measure the level of personalization, we first compute similarity between the views with in themselves, and the clicks between themselves, and then we compare the aggregate recommendation similarity with the aggregate click similarity. From now on, the we refer to the recommendation as views and the consumed items as clicks. %the user's recommended itesm (views) first and click vectors of two geographical units.
%
% To give you some intuition, imagine you make a guess of the positions in a two dimensional space of two points A and B as (a,b) and (c,d) respectively. From the positions, we can compute how far apart you think they are using, for example, Euclidean distance and call this GuessDistance. Then, the true positions are revealed to be (e,f) and (g,h) respectively, and let's call the true distance between them TrueDistance. The ViewDistance is your GuessDistance and your ClickDistance is the TrueDistance. The difference in this scores is what I called PullPush. If it is zero, the GuessDistance and the TrueDistance are the same and if they are negative, the true distance was larger than the guess distance.
%A way to think about the PullPush score is as follows. The reference frame is the distance between the two click vectors, the distance between the actual vectors. The distance between the view vectors is compared agans this reference frame.
\begin{equation}\label{eq:view}
\mathit{ViewDistance} = d(\mathit{View}_{city1}, \mathit{View}_{city2})
\end{equation}
\begin{equation}\label{eq:click}
\mathit{ClickDistance} = d(\mathit{Click}_{city1}, \mathit{Click}_{city2})
\end{equation}
The $\mathit{ViewDistance}$ is the distance that the system maintains between the two users. If two users are served exactly the same content, the system is effectively treating them as identical in terms of their need for content. The more the content served to the two users differs, the more different the system considers them to be in terms of their information consumption needs. This difference or similarity in the content served is the result of personalization, which is usually implemented at the level of the individual user.
However, from the served content, a user has the freedom to choose what to read. For example, a certain content platform may serve content about Obama $\mathit{1000}$ times and content about Donald $\mathit{100}$ times. If the content about Obama has been clicked only 100 times while the content about Donald has also been clicked 100 times, the user is showing a preference for more content about Donald, despite the system assuming that more content about Obama is of interest. There is also the option for the user not to click on an item at all.
The $\mathit{ClickDistance}$ is the measure of how different the users are in terms of the content they choose to consume given the constraints of what they are served by the system. The smaller the distance, the more similar the users are in terms of what they choose to consume.
\begin{equation}\label{pullpush}
\mathit{PullPush} = \mathit{ViewDistance} - \mathit{ClickDistance}
\end{equation}
Using both $\mathit{ViewDistance}$ and $\mathit{ClickDistance}$, we define a metric which we call PullPush (Equation \ref{pullpush}) as the difference between the $\mathit{ViewDistance}$ and the $\mathit{ClickDistance}$. To obtain the aggregate PullPush score for a recommender system, we compute the average PullPush score over all pairs of users, as in Equation \ref{pullpushaggr}.
\begin{equation}\label{pullpushaggr}
\mathit{PullPush}_{aggr} = \frac{1}{|\mathit{pairs}|} \sum_{i \in \mathit{pairs}} \mathit{PullPush}_{i}
\end{equation}
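The following is a minimal sketch, not a reference implementation, of Equations \ref{pullpush} and \ref{pullpushaggr}. It assumes that per-user view and click vectors are available in dictionaries keyed by user (or geographic unit) and that an arbitrary distance function $d$, such as the JSD defined below, is supplied:
\begin{verbatim}
import itertools

# Sketch of the pairwise and aggregate PullPush scores.  `views` and `clicks`
# map each user (or geographic unit) to a vector over the same entities;
# `dist` is any distance function, e.g. the Jensen-Shannon distance.
def pullpush(view1, view2, click1, click2, dist):
    return dist(view1, view2) - dist(click1, click2)

def aggregate_pullpush(views, clicks, dist):
    pairs = list(itertools.combinations(sorted(views), 2))
    scores = [pullpush(views[a], views[b], clicks[a], clicks[b], dist)
              for a, b in pairs]
    return sum(scores) / len(scores)   # averaged over all pairs
\end{verbatim}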
\subsection{Selection of a Distance Metric}
An advantage of the proposed method is that it is independent of any specific distance metric. One can use different distance metrics depending on one's objectives and preferences.
In our case, we use the Jensen-Shannon Divergence (JSD), a distance metric based on the KL-divergence. JSD is defined in Equation \ref{eq:jsd} and KL in Equation \ref{eq:kl}. Before applying the JSD to the views and clicks, we first turn the view and click vectors into the conditional probabilities $P(\mathit{View}|\mathit{State})$ and $P(\mathit{Click}|\mathit{State})$.
\begin{equation}\label{eq:jsd}
JSD(X,Y) = \sqrt{\frac{1}{2} KL(X, \frac{(X+Y)}{2}) + \frac{1}{2} KL(Y, \frac{(X+Y)}{2})}
\end{equation}
\begin{equation}\label{eq:kl}
KL(X,Y)=\sum\limits_{i} x_{i}\ln{\frac{x_{i}}{y_{i}}}
\end{equation}
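A minimal sketch of this computation follows, assuming the view and click vectors are raw counts over the same ordered list of entities (the function names are ours, not part of any particular library):
\begin{verbatim}
import math

# Sketch of the Jensen-Shannon distance used as d(.,.).  Raw view/click counts
# are first normalized into probability distributions (the P(View|State) and
# P(Click|State) step described above).
def normalize(counts):
    total = sum(counts)
    return [c / total for c in counts]

def kl(x, y):
    # Terms with x_i == 0 contribute nothing; whenever x_i > 0 the mixture
    # component y_i below is also > 0, so the log is well defined.
    return sum(xi * math.log(xi / yi) for xi, yi in zip(x, y) if xi > 0)

def jsd(counts_a, counts_b):
    a, b = normalize(counts_a), normalize(counts_b)
    m = [(ai + bi) / 2 for ai, bi in zip(a, b)]
    return math.sqrt(0.5 * kl(a, m) + 0.5 * kl(b, m))
\end{verbatim}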
\subsection{Selection of Users}
% Items are ephemeral. Entities are relatively perennial. Traditional philosophy claims that an object that has no permanence is not a reliable object for study. Also given the fact that over customization or under-customization can only be studied in a comparative sense over a longer time, it makes sense that we focus on objects that have longer life expectancy. Items do not provide us that, but entities can. It is for these reasons that we use entities here as our features rather than the items.
We have chosen to measure personalization at the geographical level. This has two advantages. First, we do not suffer from data sparsity. Second, it shows that personalization can be measured at levels other than the oft-used user level. As personalization is usually done at the user level, measuring it at the geographical level can expose the commonality, and therefore the potential for personalization, that exists outside the user level.
\subsection{Interpreting the Method}
%The biggest challange in a recommendation is that we can know if a recommendation is interesting or not only after the fact - after it has been recommened. Only then can we know on the basis of the user's reaction to the recommendation.
The PullPush score is a measure of the tendency of two geographical units to drift away from, or to move closer to, each other relative to how the current personalization treats them. By definition, this measure is relative in the sense that it can only be measured against the current personalization; it is not possible to measure it in an absolute sense. If the PullPush score is $\mathit{0}$, then the personalization used in the system maintains the right amount of distance: the personalized recommendations are capturing the user preferences, and as a result the difference between the $\mathit{ViewDistance}$ and the $\mathit{ClickDistance}$ is $\mathit{0}$. This is the score that good personalization should strive for.
If the PullPush score is negative, then the system is overloading the users with content that they are not interested in, which is reflected in the fact that the distance between the click vectors is greater than the distance between the view vectors. It shows that the users want to be treated in a way that captures their interests better than the current personalization does. In other words, the system treats the users more similarly than the users want; the users want to drift away from how the system's personalization treats them. The larger the negative score, the greater the need for personalization.
If the PullPush score is positive, then the users want to move closer to each other than the system's personalization allows; the system is over-personalizing. A positive score is an indication that the filter bubble might be a serious issue. A larger positive score between two users indicates a larger gap between how strongly the system separates their content and how much their consumption actually differs.
Recommendation is, at bottom, a dynamic system. As such, the choice of click history as the representation of user interest is problematic, and matters are made worse by the fact that the clicks are affected by the recommendations. This raises the question of the extent to which users can deviate from the recommended items; in other words, can the click vectors be similar enough to each other for the PullPush measure to be positive? In absolute terms, they cannot: when one uses, for example, the Euclidean distance on raw counts, every component of a click vector is at most the corresponding component of the view vector. To avoid this problem we normalize the vectors by the sum of all views and the sum of all clicks, respectively.
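Concretely, writing $v$ and $c$ for the raw view and click count vectors of a user, the normalization referred to above can be written as
\[
\tilde{v}_{i} = \frac{v_{i}}{\sum_{j} v_{j}}, \qquad \tilde{c}_{i} = \frac{c_{i}}{\sum_{j} c_{j}}.
\]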
The next question is whether normalization eliminates this tendency of the PullPush measure to be negative. We argue that it does. Table \ref{synth} shows synthetic view and click vectors; using those vectors, we obtain a positive PullPush score, which indicates that a positive score is possible in practice.
\begin{table}
\caption{Synthetic view and click vectors showing that a positive PullPush score is possible.}
\begin{tabular}{|l|l|l|l|l|}
\hline
\multirow{2}{*}{Entities} &
\multicolumn{2}{c|}{City1} &
\multicolumn{2}{c|}{City2}\\
\cline{2-5}
& View & Click & View & Click \\
\hline
Entity1 & 20 & 5 &40&112 \\
\hline
Entity2 & 15 & 7 &30&20 \\
\hline
Entity3 & 10 & 8 &20&18 \\
\hline
Entity4 & 5 & 1 &10&10 \\
\hline
\end{tabular}
\label{synth}
\end{table}
%Given the dependence of the clicks on the recommendations, the only thing we can measure is how much the clicks deviate from what the recommendations are. This means that the we can only measure the clicks relative to the recommendations.
% If one uses them as, then one is purely studying the separation story. However, if one treats them as probabilities, one is incorporating the number of times an item must be recommended to a user.
In the work on the potential for personalization, the authors investigated the potential for personalization as the difference between optimizing for an individual and optimizing for a group. In doing so, they assumed that user interests are either known from explicit relevance judgments or that clicks are the true representation of interest. In a dynamic, operational personalization system, explicit relevance judgments are hard to come by, and they do not address the question of how the system is doing in its current state of personalization.
Using clicks as the true representation of user interest is reasonable, but the potential for personalization does not address the coupling of recommendations and clicks, that is, the bias that the recommendations induce in the clicks. Its assumption was that no personalization was involved; in our case, the presence of recommendations changes the dynamics completely.
The PullPush score can be used as a measure of the potential for personalization. When the score is positive, it can be read as the size of the potential for personalization, and when the score is negative, as the potential for depersonalization. The work on the potential for personalization used DCG as a metric. Our reasons for using a distance metric instead are: 1) first and foremost, we view personalization as the ability to maintain the correct difference between the personalized recommendations of users; 2) we do not have ranking information in our logs; 3) we do not have explicit ground truth; and 4) ground truth in a dynamic system is problematic to say the least.
\subsection{Comparisons to Other Measures}
One might wonder how the proposed measure differs from other known measures such as CTR. CTR overlaps with some aspects of ranking and of the proposed measure. It overlaps with ranking measures in the sense that a higher CTR is indicative of a better ranking. It can also relate to the separation component in the sense that a higher CTR might indicate a good recommendation, which indirectly depends on a good separation between different information preferences. CTR is, however, by design not suited to be a measure of the separation component: it is not designed to take the recommendations and clicks of two users, but works only with the clicks and views of a single user. One can compare the CTRs of two users, but in that case what one measures is the pattern of the ratios, not the personalization.
There are, moreover, aspects that CTR does not capture even when applied to a single user. For example, a system might achieve a good CTR simply by recommending very few items. CTR also cannot distinguish items that are recommended a large number of times and consumed a large number of times from items that are recommended a few times and consumed a few times: as a plain ratio, it is not well suited to capturing that difference.
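As a small illustration with made-up counts (not data from our logs), two cities can have the same overall CTR while being served, and clicking on, entirely different entities:
\begin{verbatim}
# Made-up counts: both cities have an overall CTR of 0.1, yet they are served
# and click on different entities, so CTR alone says nothing about separation.
city1 = {"views": {"EntityA": 1000, "EntityB": 10},
         "clicks": {"EntityA": 100,  "EntityB": 1}}
city2 = {"views": {"EntityA": 10,   "EntityB": 1000},
         "clicks": {"EntityA": 1,    "EntityB": 100}}

for name, city in [("city1", city1), ("city2", city2)]:
    ctr = sum(city["clicks"].values()) / sum(city["views"].values())
    print(name, "overall CTR:", round(ctr, 3))   # 0.1 for both cities
\end{verbatim}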
% A good way to demonstrate that is to look at the dataset in Table \ref{input} which shows the view and click vectors for two geographical units (cities, in our specific case) and the entities they consume. A view is a recommendation, the item that a user from a certain geographical unit is exposed to when they visit a website. A click is the view that was actually clicked by a user. In the table, the views and clicks are aggregated by entities over all users from the city in question. Note that the CTR in the table is the same for both cities, illustrating that the CTR metric can hide the difference in the entities that these cities are served and actually read (click).
% The success of the recommender system can be measured by how similar the two vectors are to each other. A perfect recommender system is one where the view and the click vectors are identical. We can quantify the relationship between the view and the click vectors in many different ways. One measure is to use Click-Through RATE (CTR). However, a CTR measure can misleadingly indicate that the click vector is the same as the view vector even if they are different.
%
%
% \begin{table}
% \begin{tabular}{|l|l|l|l|l|l|}
% \hline
% \multirow{2}{*}{Entities} &
% \multicolumn{2}{c|}{City1} &
% \multicolumn{2}{c}{City2} &
% \multirow{2}{*}{CTR} \\
% & View & Click & View & Click \\
% \hline
% Entity1 & 10000 & 1000 &10&1 &0.1\\
% \hline
% Entity2 & 5000 & 500 &100&10&0.1 \\
% \hline
% Entity3 & 1000 & 100 &500&50&0.1 \\
% \hline
% Entity4 & 500 & 50 &1000&100&0.1 \\
% \hline
% Entity5 & 100 & 10 &5000&500&0.1 \\
% \hline
% Entity6 & 10 & 1 &100&10 &0.1\\
% \hline
%
%
% \end{tabular}
% \label{input}
% \end{table}
%
%
% This can happen, for example, when the ratio is the same, but the actual values are different such as 10/100 and 100/1000. In personalization, this difference matters because this shows that if I am interested in Item and and another is is interested in Item, even if the ration is the same, it does not mean we are interested in the same items. CTR is agnostic on the difference at the level of the dimensions of the vector. For example, it would not differentiate between a geographic region which have been served 100 items about Obama and consumed
% 10, 200 times on Donald and consumed 10. If we exchanged the quantities on Obama and Donald, the CTR score would remain the same. Another problem with CTR is that is that it does not take into account what is not served (withheld).
%
%