\section{Introduction}
Personalized recommendation is ubiquitous on the internet. Recommender systems, e-commerce sites, and social media platforms use personalization to provide users with content and products tailored to their preferences \cite{xiao2007commerce}. Major search engines such as Google and Bing have also implemented personalization of search results \cite{hannak2013measuring}. Personalized recommendation is a core part of content consumption and of companies' revenue. For example, in 2012, Netflix reported that 75\% of what users watched came from recommendations, and Amazon reported that 35\% of its sales came from recommendations \cite{pu2011user}. Beyond sales, personalized recommendations have been found to influence users more than recommendations from experts and peers do \cite{senecal2004influence}. They have also been found to lower users' decision effort and increase their decision quality \cite{senecal2004influence}.
The phenomenon of personalized recommendation in online information provision and consumption has drawn criticism in some quarters. Personalized recommendation is feared to create filter bubbles and echo chambers. A filter bubble is a situation where users are fed information consistent with their beliefs and views and insulated from content that does not accord with them. This, critics argue, is a danger to the functioning of democracy, in which an individual's exposure to opposing points of view is imperative. The filter bubble is particularly feared because it is believed to create and maintain ideological segregation, thus balkanizing society along interest lines.
% On the other hand, given the explosion of online content and the resulting information overload for consumers, personalized recommendation is viewed as alleviating the burden of information overload and helping users make better decisions about what to consume and what not to.
\subsection{Criticism}
The filter bubble is more likely to correlate with the availability of more opinions and content; individuals may choose to consume content that accords with their beliefs \cite{flaxman2016filter}. In controlled environments, individuals choose content from outlets that are aligned with their held beliefs \cite{garrett2009echo, iyengar2009red, munson2010presenting}. Some empirical studies have shown that personalized recommendation can cause filter bubbles. For example, a study of filter bubble effects on social media and search engines shows that both media create filter bubbles, with social media creating the greater one.
From an absolutist perspective, any recommender system can be considered as creating a filter bubble since, by definition, a recommender system presents users with different sets of items. Any variation between users in terms of the items served is then a cause of a filter bubble, and the filter bubble is the degree of difference imposed on users by personalized recommendation.
The more feared aspect of the filter bubble is, however, the claim that it is being used as a tool to insulate users from content that disagrees with their beliefs and views. Creating ideological islands is considered harmful to society in general and to the functioning of democracy in particular. Democracy is premised on the presence of opposing ideas and on individuals' tolerance of opposing views and beliefs. By creating ideological islands, recommender systems can foster intolerance of opposing views, breed extremism, and endanger democracy. Ideological balkanization of society along interest lines erodes tolerance and coexistence and widens extremism and intolerance. This allegation has serious ramifications for society.
Another study \cite{nguyen2014exploring} examined the effect of recommender systems on the diversity of recommended and consumed items. The study was done on the MovieLens dataset\footnote{http://grouplens.org/datasets/movielens/}. The authors separated recommendation-takers from recommendation-ignorers and examined the content diversity in the two groups. Two movies are compared via the Euclidean distance between their attribute vectors (tag genome data), and the distance between two groups is measured as the average of the pairwise distances between the groups' movies.
The finding is that recommender systems indeed create a condition where recommended items and items rated by users become narrower (less diverse) over time. However, when comparing recommendation-takers and non-takers, the study found that takers had consumed more diverse items than non-takers. The study was conducted on a recommendation system that uses item-to-item collaborative filtering, and its conclusions should be taken with that caveat.
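As an illustration, the diversity measure described above can be sketched as the average pairwise Euclidean distance between item attribute vectors such as the tag genome (the function name and the toy vectors are our own, not taken from the cited study):

```python
import numpy as np

def content_diversity(vectors):
    """Average pairwise Euclidean distance between item attribute
    vectors (e.g. tag-genome vectors). Higher values mean a more
    diverse set of consumed or recommended items."""
    vectors = np.asarray(vectors, dtype=float)
    n = len(vectors)
    # all unordered pairs (i, j) with i < j
    dists = [np.linalg.norm(vectors[i] - vectors[j])
             for i in range(n) for j in range(i + 1, n)]
    return sum(dists) / len(dists)

# A set of identical items has zero diversity:
print(content_diversity([[1, 2], [1, 2], [1, 2]]))  # 0.0
# Two items at Euclidean distance 5:
print(content_diversity([[0, 0], [3, 4]]))  # 5.0
```

Comparing this score for the takers' and the ignorers' consumed items, per time window, reproduces the shape of the group comparison in the study.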
The cases for the filter bubble are all based, to varying degrees, on the sense that recommendations to users should not vary. This, however, is a big problem when seen from another perspective, namely the users' self-interest. Users have clearly shown their individual preferences, and thus their differences.
\subsection{Approval}
With the explosion of online content and the abundance of product choices, there is a real problem of information overload, which recommender systems attempt to address. It is, therefore, evident that recommender systems are solving user problems and helping users make choices.
After all, there are studies showing that even if users are presented with a mix of opposing and consistent views, they end up reading the content that is consistent with their views. Cognitive dissonance explains this behaviour as the avoidance of the cognitive cost associated with consuming information that is inconsistent with one's views. If users choose content consistent with their views even when provided with opposing content, and if there is a real benefit in helping users make better decisions, then, one can arguably ask, why not help users achieve that with personalized recommendation?
There are, however, factors that influence these self-imposed bubbles. For example, topic involvement and the presence of threat can influence users' selective exposure to information. This, therefore, offers the possibility of influencing users' attitudes and opinions. Additionally, there is a profit motive for companies to keep their users engaged and thus to employ personalized recommendation.
\subsection{Review of Measures}
There are, so far, some attempts to measure personalization in online information systems that apply personalized recommendation. One study attempted to quantify personalization in Google Search \cite{hannak2013measuring}. It recruited 200 Amazon Mechanical Turk users with Gmail accounts to perform search tasks on Google Search. The study also used newly created accounts, which, having no history, are considered unsuitable for personalization, to generate baseline search results. The first page of the returned results was compared against the baseline results and also against each other, using Jaccard similarity and edit distance as metrics. The study reports that about 11.7\% of the results showed variation due to personalization. %The study also investigated the factors that caused variation in personalization and reports that being logged in and geography were the factors.
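For concreteness, the two metrics can be sketched as follows: Jaccard similarity treats two result pages as sets and ignores ranking, while edit (Levenshtein) distance counts the insertions, deletions, and substitutions needed to turn one ranked list into the other. This is a minimal sketch under our own simplifications; it does not reproduce the cited study's exact preprocessing.

```python
def jaccard(a, b):
    """Jaccard similarity between two result lists, order ignored."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def edit_distance(a, b):
    """Levenshtein distance between two ranked result lists:
    the number of insertions, deletions, or substitutions
    needed to transform list a into list b."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i            # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j            # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[m][n]

# Same results, one swapped item: identical sets but reordered ranks.
print(jaccard(["a", "b", "c"], ["a", "b", "d"]))        # 0.5
print(edit_distance(["a", "b", "c"], ["a", "c", "b"]))  # 2
```

Using both metrics together, as the study does, separates changes in *which* results appear (Jaccard) from changes in *how they are ordered* (edit distance).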
The work on the potential for personalization \cite{teevan2010potential} investigated how much improvement could be obtained if search engines were to personalize their search results, as opposed to providing the same results to everyone, or to a group. It used three datasets: explicit relevance judgments, behavioral relevance judgments (clicks on items), and content-based relevance judgments. The work showed that there was a large potential for personalization under all three kinds of relevance judgments.
The work used DCG (discounted cumulative gain) to measure the quality of a search result ranking. The best ranking for a user is one that orders the items as the user ranked them (explicit judgments), clicked them (behavioral judgments), or by their similarity to previously consumed content (content-based judgments). The ideal normalized DCG score for a perfect system is thus $1$. When we attempt the best single ranking for two or more people (a group), the DCG score will be less than $1$, as the members have different ideal rankings. The difference between this group score and the individual score is what they called the potential for personalization. The research reports that there was potential to improve the ranking by 70\%; content-based relevance judgments show the highest potential, followed by explicit judgments and then click-based judgments.
The paper concludes that behavior-based relevance judgments can serve as a good proxy for explicit judgments in operational search engines.
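The gap between individual and group nDCG can be sketched as follows. This is a simplified illustration under our own assumptions: we construct the shared group ranking by total relevance across users, which is one reasonable choice and not necessarily the exact procedure of \cite{teevan2010potential}.

```python
import math

def dcg(gains):
    """Discounted cumulative gain of relevance gains in rank order."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def potential_for_personalization(judgments):
    """`judgments` maps each user to a dict {item: relevance}.
    Returns 1 minus the mean nDCG of the single best shared ranking,
    i.e. how much per-user ideal rankings would improve on it."""
    items = sorted(set().union(*judgments.values()))
    # Shared ranking: order items by total relevance over all users.
    order = sorted(items,
                   key=lambda it: -sum(j.get(it, 0)
                                       for j in judgments.values()))
    ndcgs = []
    for j in judgments.values():
        ideal = dcg(sorted(j.values(), reverse=True))   # per-user ideal
        group = dcg([j.get(it, 0) for it in order])     # shared ranking
        ndcgs.append(group / ideal if ideal > 0 else 1.0)
    return 1.0 - sum(ndcgs) / len(ndcgs)

# Identical users: the shared ranking is ideal for both, potential ~ 0.
print(potential_for_personalization(
    {"u1": {"a": 3, "b": 1}, "u2": {"a": 3, "b": 1}}))
# Opposing preferences: no shared ranking suits both, potential > 0.
print(potential_for_personalization(
    {"u1": {"a": 1, "b": 0}, "u2": {"a": 0, "b": 1}}))
```

The more users' ideal rankings diverge, the larger the returned value, which is the intuition behind the reported 70\% figure.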
The behavioral measure used in the potential for personalization is the closest to the PullPush measure we propose here. It is, however, different in that it lacks the dynamism the PullPush score embodies. The behavioral measure says the system should have ranked the items according to how the user consumed them, but it does not factor in that, had the results been presented in that order, the click behaviour would itself have been different. As such, it offers nothing in the way of suggesting a direction of action.
There is evidence for both diversity and filter bubbles in personalized recommender systems. Both diversity and the filter bubble are, however, measured by how different the items served are. It is one thing to measure the difference between served items; it is quite another to satisfy users and relieve them of information overload. As much as we want to prevent the widening of ideological differences, we also want to help overcome information overload. There is also evidence that even if users are given a mix of opposing and agreeing content, they still choose the content that is consistent with their views. If that is the case, then why waste users' time by giving them content they will not consume?
Is it possible to find the optimum personalization level where users are provided with the items they would like to read, but without locking them in a bubble? How can that be done in a system that already employs a recommender system? We propose a user-centric litmus test to see whether a recommender system comes anywhere close to this balance. The premises for this measure are the following.
1) The filter bubble is a real possibility.
2) User interest is an important factor in information provision.
3) It is much more natural to deploy a recommender system and study its effects.
Once a personalized recommender system is deployed, a reality is created in which users, or any segment of users, are exposed to personalized sets of items. It is not possible to go back and rerun the personalized recommendation under the same conditions of users, time, and items. Given this reality, is there any insight one can gain into how the personalized recommendation is doing with respect to amplifying or dampening the filter bubble?
\subsection{Motivating the New Measure (PullPush)}
The proliferation of recommender systems is a response to the ever-increasing amount of available information: they are the supposed solution to information overload. Recommendation is the selection of useful information, from a vast amount of available information, for the user to consume. Recommendation can be implemented in many different ways; for example, it can recommend popular items or the most recent items. The main operationalization of recommendation is, however, personalization. For the recommended items to be relevant, the user's preferences must be modeled from the user's history and the recommended items selected on the basis of the modeled preferences; that is, they must be personalized and tailored to the interests of the user.
A number of approaches can be applied to online content provision. No (personalized) filtering at all leads to information overload, or necessitates a random selection of content. Full editorial curation leads to a limited set of content with a specific point of view. Individual personalization is more likely to lead to increased content engagement, user satisfaction, retention and revenue for wider audiences. For the user, it means less cognitive overload, increased efficiency and satisfaction.
However, on the extreme end, this could arguably lead to filter bubbles.
A filter bubble is not necessarily a direct problem from a user-engagement standpoint; content may still be consumed as it fits the individual user's interests. It can, however, be a problem from the point of view of the user and of society. User interests can evolve and expand over time, and over-personalization can become a problem for users when they miss relevant information just because the algorithm decides it is not relevant for them.
The filter bubble is an interesting problem from the point of view of society as a whole. It could be argued that it is in the interest of the common good that people are exposed to diverse views. Exposure to different views can, it is believed, increase tolerance, social integration, and stability. This would mean it is in the interest of society for individuals to be exposed to different views and for the effect of a potential filter bubble to be reduced.
%One can argue that the filter bubble is the cost that the user pays for the benefits of efficiency, reduced cognitive load, and better satisfaction.
We can debate whether the concept of filter bubbles is right or wrong in theory, but recommendation is a fact of life today. The question is whether we can strike a balance between under- and over-personalization. Whichever direction this balance should tip in, it would be in everybody's interest to be able to quantify the level of personalization in a recommender system.
In this study we propose a novel method for quantifying personalization, by comparing the response of different users to personalized recommendations.
Once a personalized recommender system is deployed, a new and unrepeatable reality is created: a reality in which users react to the items they are provided with, and in which it is not possible to know what the myriad other possible sets of recommendations would have entailed in terms of clicks. To this extent, the reactions of users in terms of clicks are not absolute but relative to the recommendations they are provided with. Clicks in a recommender system are thus entirely situational: had the recommendations been other sets of items, the clicks would also have been different. Given this situational aspect of recommendation and reaction, what can the recommendations and clicks tell us about personalized recommendation? Specifically, can we infer anything with regard to amplifying or dampening the filter bubble effect?
Once this situational reality is created, how can one say something about the system's tendency to amplify or dampen the filter bubble? We argue that one can, by comparing a segment of users against each other. We go further and claim that the only true measure of a personalized recommender system's ability to amplify or dampen the filter bubble is a comparative one, for personalized recommendation is fundamentally premised on the assumption that there are differences in user preferences. We carry this assumption into devising a metric for quantifying the tendency of a recommender system to dampen or amplify filter bubbles.
%We call the method PullPush to indicate whether users want to be kept further apart or brought closer together than the current level of personalization maintains.
% The proposed method sees personalization fundamentally as the separation of items according to users' information preferences. There are two perspectives from which to measure the personalization level in a recommender system. One is the perspective of no change, where overlap between the information items recommended to different users is desirable. The second perspective is about (automatically) optimizing for user engagement, capturing user preferences in such a way that the content is tailored towards engagement. In this study, we approach the quantification of personalization by assessing 'good' (or 'perfect') personalization from each of these perspectives.
The contributions of our work are: 1) we refine the conceptualization of personalization, 2) we propose a method for quantifying personalization using the response of users to personalized recommendations, and 3) we show how the method can be applied and used to suggest improvements (from different perspectives) to a system that does personalized recommendation. The rest of the paper is organized as follows. In Section \ref{rel}, we review related literature, and in Section \ref{mot} we present the background and motivation for the work. In Section \ref{pro} we discuss our proposed method, followed by a discussion of datasets, results, and analysis in Section \ref{result}. Finally, we finish with a discussion and conclusion in Section \ref{conc}.