HCDA/sigir2016repo Changeset - 4d3caee8827e · Centrum Wiskunde & Informatica (CWI)


Gebrekirstos Gebremeskel - 9 years ago 2016-04-20 21:43:21
destinycome@gmail.com

update

2 files changed with 130 insertions and 90 deletions:

main.tex

ref.bib


main.tex


@@ @@ -150,68 +150,61 @@ @@
 \maketitle
 \begin{abstract}
 In a setting where  recommendations are provided to users when they are viewing particular items (base items), what are the  factors that contribute to clicks on recommendations?  We examine whether a click on a recommendation is a function of the base item, the recommended item, or of both.  More specifically, we examine the items from which clicks happen and what type of items get clicked. Are some base items more likely to cause the user to click  on recommendations, and are some  recommendations more likely to be clicked?   We attempt to explain the factors that trigger clicks on recommendations at the levels of categories of items, transitions between the categories.
 In a setting where  recommendations are provided to a user when the user is viewing a particular item, what are the  factors that contribute to clicks on the recommendations?  We examine whether a click on a recommendation is a function of the item the user is reading at the time of the recommendation, or of the recommended item itself.  More specifically, we investigate whether  some items  are more  likely to cause the user to click  on recommendations, and  some  recommendations are more likely to be clicked.  Our investigation of the factors is done at  the level of the categories of items and the transitions between the categories.  We find that the categories of items play a big role in clicks on recommendations, and that they hold  potential for a category-level personalization.
 \end{abstract}
 \section{Introduction}
  In a  study that investigated the relationship between the number of times  items are viewed and the number of times clicks happened from those items  in several online publishers  \cite{said2013month}, it was reported that traditional news portals providing news and opinions on politics and current events are more likely to generate clicks on recommendations than special interest portals such as sports,  gardening, and auto mechanic forums.  Another study \cite{esiyok2014users}, using a similar dataset, investigated the impressions and clicks at the level of the category  of items of one of the traditional news portals - Tagesspiegel (a popular German national news portal).  The finding was that there is a relationship between what the user is currently reading and what they  read next. They  reported that the categories local and sports enjoyed the most loyal readers, that is, that a user reading local items will more likely keep reading items of the same category.  % recommendations that were made to the different websites raising a queation as to whether the clicks on recommendations were because of nature of the online publishers or the recommendation items.this study, we focus on one traditional news portal, tagespiegel and examine it to find out factors that trigger recommendations on clicks or lack thereof. %wether some categories are more likely to recieve clicks on recommendations. We also even go further and look at what type of items are more likely to trigger more clicks than others.
  While both  studies are very related and relevant to our interest in factors that contribute to clicks, they  did not investigate the relationship between the base items, the recommended items and the resulting clicks or lack thereof. In a recommendation setting where recommendation items are provided to  users on the items that the user is currently  viewing (henceforth referred to as base items),   what are the factors that trigger users to click on recommendations?  Are the clicks a function of the base items or of the recommended items? Do some base items and some recommended items cause users to click on recommendations more than others, and if they do what explains this difference?
  For recommender systems to deliver good recommendations, they need to incorporate different factors into the determination of recommendation items. This means understanding several factors that influence the recommender system's success. These factors can be broadly categorized into non-content and content factors. Non-content factors, include, among others, the user's current context, social media annotations and recommendation by friends.  Social media annotations, recommendation by branded companies and friend annotations and recommendations  \cite{kulkarni2013all} increase both user's  consumption and satisfaction.   Content and non-content factors are equally important  in influencing the user's decision to read news items \cite{jancsary2010towards}. There is also a study that shows that  geographic relevance affects the consumption and recommendation of news items \cite{gebremeskel2015role}.
   Content factors are those factors of the items that are modeled by key-words and named entities \cite{gabrilovich2004newsjunkie},  and topics and categories \cite{li2011scene}.
 One study that investigated responses to recommendations across different online publishers found that traditional news portals providing news and opinions on politics and current events are more likely to generate clicks on recommendations than special interest portals such as sports, gardening, and auto mechanic forums \cite{said2013month}. Another study \cite{esiyok2014users} investigated the reading transitions of users at the level of item categories on one traditional news portal, Tagesspiegel (a popular German news portal), and reported that there is a relationship between what a user is currently reading and what they read next. A particularly interesting finding of that study was that the local and sports categories had the most loyal readers, that is, a user reading an item in one of these categories is more likely to keep reading items of the same category.
 In this study we examine these factors using the categories of the base items and the recommended items.
 %that might trigger clicks on recommendations  from several angles. One angle  is from  the categories of the base items the user is currently reading. More specifically,
 Are some categories   more likely to  cause the user to click on recommendations?
 %Similarly, we examine the categories of the recommended items and investigate whether some are more likely to trigger clicks on themselves upon recommendation.
 We also investigate how  the categories of the base items and the categories of the recommendation items are related in the way they trigger clicks. %Are some categories  more likely to trigger clicks on some categories? For example, is political category more likely to trigger clicks on political categories, or another category such as local category?
 %We also go down to the item level and look at the relationships of the base items and the recommendation items with respect to how likely they are to trigger clicks.  More specifically,
 We examine whether those base items that are more likely to trigger clicks on recommendations are the same as  the recommended items that are more likely to receive  clicks.
  While both  studies \cite{said2013month,esiyok2014users} are very related and relevant to our interest in factors that contribute to clicks, they  did not investigate the transitions between the base items and the recommended items. In a recommendation setting where recommendation items are provided to  a user on the item that the user is currently  viewing (henceforth referred to as base item),   what are the factors that trigger users to click on the recommendations?  Are the clicks a function of the base items or of the recommended items? Do some base items and some recommended items cause users to click on recommendations more than others, and if they do,  what explains this difference?  In this study we investigate these factors using the categories of the base items and the recommended items, and the transitions between them.
 The study contributes to the understanding of factors that influence recommendation systems. The insights % help 1) to understand the influence of the categories of the base-itmes and recommended-itmes in the clicks on recommendations, and  2) to  point recommender systems target those items that generate clicks and to ignore those that do not.
 The study contributes to the understanding of factors that influence recommendation systems. The insights gained from examining these factors from different angles help 1) to understand what aspects of the base item cause the user to click on a recommendation, 2) to understand what aspects of the recommended items make the user click on those recommendations, and 3) to target those items that generate clicks and to ignore those that do not.
 \begin{table*}
 % are more likely to trigger clicks, and what recommendation items are more likely to be clicked. To accomplish this task, we focus on items of categories that trigger more clicks. We identify items that triggered more clicks and items that caused less clicks. We also examine the relationship of the items that triggered clicks and the recommendations that are clicked and not clicked.
+%
+%
+%
 % We also examine  the relationship of the contenet of the base items and the items that are clicked to glean any relationship. For this, we employ contenet similarity measures between the base item and those items that are clicked from the base item. %A third angle is to look at the relationship between the  content of the base item and the items that are clicked from it. %This is interesting because sometimes it is not clear whether there i a direct relationship between the content similarity and behavioral factors.
 % we can also look at  whether users actually clicked on those items which have some geographical relevance in the sense that the items are about the geographical region that they come from too.
+%
+%
+%
 \caption{A sample of the dataset. \label{tab:sample}}
 \centering
   \begin{tabular}{|l|l|l|l|l|l|l|}
 \hline
            Base Item  &  Base Item Category & Recommendation& Recommendation Category &View&Click & CTR \\
            \hline
 229397219 &   Berlin &   229495114   &  Berlin &  17   &  1  & 5.88\\
 230306628 &    politics & 230291175&     wissen &   14 &    1 & 7.14\\
 40485126 &      Berlin &  225589114  &   politics  & 2    & 0  & 0.00\\
    \hline
   \end{tabular}
 \end{table*}
 \section{Dataset}
 We used a dataset of user-item interactions on Tagesspiegel, a real online German news portal. The dataset was collected from 15-04-2015 to 04-07-2015. Items in Tagesspiegel are manually placed by the journalists under categories. For our study, we investigated $\mathit{9}$ categories: \textbf{politics (politik)}, \textbf{business (wirtschaft)}, \textbf{sports (sport)}, \textbf{culture (kultur)}, \textbf{world (weltspiegel)}, \textbf{opinion (meinung)}, \textbf{media (medien)}, \textbf{education (wissen)} and the local category \textbf{berlin}.
 The dataset is aggregated from the logs of the  recommender systems that we used during our participation in the CLEF NewsREEL 2015 challenge \cite{kille2015overview}. This challenge offered   participants the opportunity to plug their recommendation algorithms to  Plista\footnote{http://orp.plista.com/documentation} and provide recommendations to  real users visiting online publishers. Plista is a recommendation  framework that connects recommendation providers such as ourselves and recommendation service requester such as online news portals. Participation in the challenge enabled us to collect information of user-item interaction such as impressions (a user viewing an item), updates (appearance of new item, or change of  content of existing item) and clicks (a user clicking on recommendation item).
 %We used a dataset of user-item interactions on Tagesspiegel, an online German news portal.
 We aggregated user-item interaction data from the logs of the recommender systems that we used during our participation in the CLEF NewsREEL 2015 challenge \cite{kille2015overview}. This challenge offered participants the opportunity to plug their recommendation algorithms into Plista\footnote{http://orp.plista.com/documentation} and provide recommendations to real users visiting online publishers. Plista is a recommendation framework that connects recommendation providers such as ourselves with recommendation service requesters such as online news portals. Participation in the challenge enabled us to collect information on user-item interactions such as impressions (viewing of items by users) and clicks (a user clicking on recommended items).
 The three  recommendation algorithms that we used are two instances of \textbf{Recency}, and one instance of  \textbf{RecencyRandom}. The Recency algorithm keeps the most recently viewed or updated items and recommends the top  $\mathit{k}$  most recent items every time a recommendation request is made. The RecencyRandom recommender keeps the most recent $\mathit{100}$ items at any time and recommends, randomly, the requested number of items every time a  recommendation request is made.
- Unfortunately, click information provided by the Plista platform does not include whether the click on recommendation received  is in response to our recommendations or on someone other participants' recommendations. Since we know the user and the base item for which we recommended and the recommended items, we considered a click notification on one of our recommended items as a click on our recommendation,  if that click happened with in $\mathit{5}$ minutes  from the time of our recommendation.  From the combined collected dataset,  we extracted the base item, the category of the base item, the recommended item,  the category of the recommended item, the number of times a recommendation item has been recommended to a base item (view) and the number of times that the recommended item has been clicked from the base item. From  the views and clicks, we compute   click-through-rate (CTR) as the percentage of  views that are clicked.   A sample of the dataset is presented in Table \ref{tab:sample}.
+For this analysis, we focused on user-item interactions on Tagesspiegel, one of the biggest German online news portals. The interaction dataset was collected from 15 April 2015 to 4 July 2015. Items in Tagesspiegel are manually placed by the journalists under categories. For our study, we investigated $\mathit{9}$ categories: \textbf{politics (politik)}, \textbf{business (wirtschaft)}, \textbf{sports (sport)}, \textbf{culture (kultur)}, \textbf{world (weltspiegel)}, \textbf{opinion (meinung)}, \textbf{media (medien)}, \textbf{education (wissen)} and the local category \textbf{berlin}.
 \begin{table*}
 \caption{A sample of the dataset. \label{tab:sample}}
 \centering
   \begin{tabular}{|l|l|l|l|l|l|l|}
 \hline
            Base Item  &  Recommendation & Base Item Category & Recommendation Category &View&Click & CTR \\
            \hline
 229397219 &229495114   &   Berlin &     Berlin &  17   &  1  & 5.88\\
 230306628 &230291175&     politics &     wissen &   14 &    1 & 7.14\\
 40485126 & 225589114  &      Berlin &    politics  & 2    & 0  & 0.00\\
    \hline
   \end{tabular}
 \end{table*}
  Click information provided by the Plista platform does not directly show whether a click notification is in response to our recommendations or to other participants' recommendations. Since we know the user and the base item for which we recommended, as well as the recommended items, we considered a click notification on one of our recommended items as a click on our recommendation if that click happened within $\mathit{5}$ minutes of our recommendation. From the combined collected dataset, we extracted the base item, the category of the base item, the recommended item, the category of the recommended item, the number of times a recommendation item has been recommended to a base item (view) and the number of times that the recommended item has been clicked from the base item. From the views and clicks, we compute click-through-rate (CTR) as the percentage of views that are clicked. A sample of the extracted dataset is presented in Table \ref{tab:sample}.
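 Stated as a formula, for a base-item and recommended-item pair with $v$ views and $c$ clicks (the symbols $v$ and $c$ are shorthand introduced here for illustration and are not used elsewhere in the paper), the definition above amounts to
 \[
   \mathrm{CTR} = 100 \times \frac{c}{v}.
 \]
 For example, the first row of Table \ref{tab:sample} has $v = 17$ and $c = 1$, which gives $100 \times 1/17 \approx 5.88$.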
 %Plista is a company that provides a recommendation platform where recommendation providers are linked with online publishers in need of recommendation sertvice.
 \section{Results and Analysis}
-Our dataset consists of a total of $\mathit{288979}$ base-item \\recommendation-item pairs. To see the relationship between views and clicks, we first  sorted the dataset according to views and then  normalized the \textbf{view} and \textbf{click} counts by the total number of views and the total number of clicks, respectively. We then selected the top $\mathit{1000}$ pairs and plotted the views  and the clicks. The reason for normalization is to be able to plot them together for easy comparison. %The selection of only $\mathit{1000}$ pairs is because the more items we use, the more difficult is to see .
+Our dataset consists of a total of $\mathit{288979}$ base-item \\recommendation-item pairs. To see the relationship between \textbf{views} and \textbf{clicks}, we first  sorted the dataset according to \textbf{views} and then  normalized the \textbf{view} and \textbf{click} counts by the total number of views and the total number of clicks, respectively. We then selected the top $\mathit{1000}$ pairs and plotted the views  and the clicks. The reason for normalization is to be able to plot them together for easy comparison. %The selection of only $\mathit{1000}$ pairs is because the more items we use, the more difficult is to see .
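 One way to write this normalization, assuming pair $i$ has $v_i$ views and $c_i$ clicks (notation introduced here for illustration) and assuming the totals are taken over the full dataset rather than over the plotted $\mathit{1000}$ pairs, is
 \[
   \tilde{v}_i = \frac{v_i}{\sum_j v_j}, \qquad \tilde{c}_i = \frac{c_i}{\sum_j c_j}.
 \]
 Under that assumption each normalized series sums to one over the dataset, which puts views and clicks on a comparable scale for a shared axis.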
-Figure \ref{fig:view_click} shows the plot of views and clicks for the $\mathit{1000}$ pairs. The blue plot  is for views and is smooth since the data was sorted by views. The red plot is for the corresponding clicks on recommendations. We observe that the clicks do not follow the views, an indication  that t clicks do not correspond with the number  of times that a recommendation items is recommended to a base item. This observation  is the primary reason we set out to investigate, the relation between base items and recommended items and their attractiveness to the user, to begin with.  The ragged click plot shows that some items are more likely to trigger clicks on recommendations than others.  What can possibly explain this observation?  What causes these difference in CTR scores for the various items?
+Figure \ref{fig:view_click} shows the plot of views and clicks for the $\mathit{1000}$ pairs. The blue plot is for views and is smooth since the data was sorted by views. The red plot is for the corresponding clicks on recommendations. We observe that the clicks do not follow the views, an indication that the number of clicks does not correspond to the number of times that a recommendation item is recommended to a base item. This discrepancy between views and clicks is the primary reason we set out on this investigation. The ragged click plot shows that some items are more likely to trigger clicks on recommendations than others. What can possibly explain this observation? What causes these differences in CTR scores for the various items?
 \begin{figure} [t]
 \centering
-\includegraphics[scale=0.5]{img/tage_view_click1000-crop.pdf}
+\includegraphics[width=7cm, height=5cm]{img/tage_view_click1000-crop.pdf}
-\caption{Plots of views (blue) and clicks (red). The Plots are generated by first sorting by  views.  The difference in between the view and click plots suggests that some items are more likely to trigger clicks on recommendation than others. \label{fig:view_click}}
 \caption{Plots of views (blue) and clicks (red). The plots are generated by first sorting by views. The difference between the view and click plots suggests that some items are more likely to trigger clicks on recommendations than others. \label{fig:view_click}}
 \end{figure}
+%
+%
-\subsection{Categories of Base and Recommendation Items}
+\subsection{Item Categories}
-To start to explain the difference between the view plot and the click plot observed in \ref{fig:view_click},  we aggregated  views and clicks by the $\mathit{9}$ categories of   items  that the items are placed under in the Tagesspiegel website. The aggregation gives us two results: view and click counts of the  categories as base and as recommendation.   With the categories, we attempt to answer two questions: 1)  is there a relationship between the category of the base item and the likelihood of triggering  a click on recommendation?, and  2) is there a relationship between the category of the recommended item and the likelihood of triggering a click upon its recommendation?    Tables \ref{tab:base} and \ref{tab:reco} present  the views, clicks and CTR scores. The results are sorted by CTR scores.
+To start to explain the difference between the view plot and the click plot observed in Figure \ref{fig:view_click}, we aggregated views and clicks by the $\mathit{9}$ categories under which items are placed on the Tagesspiegel website. The aggregation gives us two results: view counts and click counts of the base-item categories and of the recommended-item categories. With the categories, we attempt to answer two questions: 1) are there differences between the base categories in triggering clicks on recommendations, and 2) are there differences between the recommendation categories in triggering clicks upon their recommendation? Tables \ref{tab:base} and \ref{tab:reco} present the views, clicks and CTR scores. The results are sorted by CTR scores.
-We observe  a difference between the base categories and the recommendation categories with respect to the likelihood of triggering clicks.  In the base categories,   items of  \textbf{politics} are more likely to trigger clicks  than  other categories,  followed by \textbf{opinion} and \textbf{world}. Special categories such as \textbf{culture} and and \textbf{education} are the least likely to trigger clicks on recommendations. This is consistent with the previous findings that reported special interest portals generate  less clicks on recommendations than traditional portals of providing news, opinions and current events.
+We observe a difference between the base categories and the recommendation categories with respect to the likelihood of triggering clicks. Among the base categories, \textbf{Politics} is more likely to trigger clicks than the other categories, followed by \textbf{Media} and \textbf{World}. Special categories such as \textbf{Culture} and \textbf{Education} are the least likely to trigger clicks on recommendations. This is consistent with the previous finding that special interest portals generate fewer clicks on recommendations than traditional portals providing news, opinions and current events.
 \begin{table*}
-\caption{The views, clicks, and CTR of the categories . Table \ref{tab:base} is for the categories in base and \ref{tab:reco} is for the categories in recommendation. The CTR scores are generally higher in recommendation, and the ranking of the categories in terms of the CTR scores are different in base and in recommendation. }
+\caption{The views, clicks, and CTR scores of the categories. Table \ref{tab:base} is for the base categories and Table \ref{tab:reco} is for the recommendation categories. The CTR scores are generally higher in recommendation, and the ranking of the categories in terms of CTR scores is different in base and in recommendation. }
 \parbox{.45\linewidth}{
 \centering
 \begin{tabular}{|l|l|l|l|l|}
 \hline
-           category  &  Views & Clicks & CTR (\%)\\
+           Category  &  Views & Clicks & CTR (\%)\\
            \hline
 politics&73197&178&0.24\\
 media&22426&50&0.22\\
 weltspiegel&37413&77&0.21\\
 wirtschaft&30045&61&0.2\\
 sport&29812&58&0.19\\
 berlin&123595&129&0.1\\
 meinung&4611&3&0.07\\
 kultur&21840&11&0.05\\
 wissen&13500&4&0.03\\
 Politik (Politics)&73197&178&0.24\\
 Medien (Media)&22426&50&0.22\\
 Weltspiegel (World)&37413&77&0.21\\
 Wirtschaft (Business)&30045&61&0.2\\
 Sport (Sports)&29812&58&0.19\\
 Berlin&123595&129&0.1\\
 Meinung (Opinion)&4611&3&0.07\\
 Kultur (Culture)&21840&11&0.05\\
 Wissen (Education)&13500&4&0.03\\
@@ @@ -296,15 +289,15 @@ wissen&13500&4&0.03\\ @@
            category  &  Views & Clicks & CTR (\%)\\
            \hline
 medien&22147&68&0.31\\
 politik&68230&170&0.25\\
 berlin&123559&188&0.15\\
 weltspiegel&37535&58&0.15\\
 sport&28160&36&0.13\\
 meinung&4925&5&0.1\\
 kultur&23278&21&0.09\\
 wissen&15650&10&0.06\\
 wirtschaft&32955&15&0.05\\
 Medien (Media)&22147&68&0.31\\
 Politik (Politics)&68230&170&0.25\\
 Berlin&123559&188&0.15\\
 Weltspiegel (World)&37535&58&0.15\\
 Sport (Sports)&28160&36&0.13\\
 Meinung (Opinion)&4925&5&0.1\\
 Kultur (Culture)&23278&21&0.09\\
 Wissen (Education)&15650&10&0.06\\
 Wirtschaft (Business)&32955&15&0.05\\
    \hline
@@ @@ -315,10 +308,10 @@ wirtschaft&32955&15&0.05\\ @@
 \end{table*}
-On the recommendation side, however, it is  \textbf{media} that is the more likely to incur  clicks upon recommendation, followed by  \textbf{politics} and the local category (\textbf{berlin}. The two least performing categories are \textbf{business} and \textbf{education},  similar to the least performing  categories in base. So, overall, it seems that the likelihood of triggering clicks by the categories shows a difference when they are in base or in recommendation.  In general, the  categories have higher CTR scores in recommendation than  in   base.
+On the recommendation side, however, it is the category \textbf{Media} that is most likely to incur clicks upon recommendation, followed by \textbf{Politics} and the local category (\textbf{Berlin}). The two least performing categories are \textbf{Business} and \textbf{Education}, similar to the least performing categories in base. So, overall, it seems that the likelihood of a category triggering clicks differs depending on whether it is in base or in recommendation. In general, the categories have higher CTR scores in recommendation than in base.
 To gain further insight, we looked at the CTRs of transitions from base category to recommendation category. The aim of this is to find out whether some base categories are more likely to trigger clicks on some recommendation categories. The results are presented in Table \ref{heatmap}.
 Some interesting observations can be seen in the category-to-category transitions. While the highest transition CTRs for the base categories of  \textbf{berlin} and \textbf{politics} are to \textbf{media}, for \textbf{business}, it is to \textbf{opinion}, for \textbf{sport} it is to \textbf{sport}. The highest transition CTR for \textbf{Culture} is to the local category,  \textbf{berlin}, and for \textbf{world} it is to \textbf{politics} followed by to \textbf{berlin}.  \textbf{Media}  is the one that is more likely to trigger clicks  upon recommendation. The local category \textbf{berlin} is the one that is more likely to trigger clicks on diverse recommendation categories.
 To gain further insight, we looked at the CTRs of the transitions from base categories to recommendation categories. The aim of this is to find out whether some base categories are more likely to trigger clicks on some recommendation categories. The results are presented in Table \ref{heatmap}.
 Some interesting observations can be seen in the category-to-category transitions. The highest transition CTRs for the base categories \textbf{Berlin} and \textbf{Politics} are to \textbf{Media}, for \textbf{Business} it is to \textbf{Opinion}, and for \textbf{Sports} it is to \textbf{Sports}. The highest transition CTR for \textbf{Culture} is to the local category (\textbf{Berlin}), and for \textbf{World} it is to \textbf{Media}, followed by \textbf{Politics} and \textbf{Berlin}. \textbf{Media} is the most likely to trigger clicks upon recommendation. The local category \textbf{Berlin} is the one that is most likely to trigger clicks on diverse recommendation categories.
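 A transition CTR of this kind can be written as follows, assuming that views and clicks are pooled over all pairs whose base item belongs to category $a$ and whose recommended item belongs to category $b$ (the notation and the pooling are assumptions made here for illustration; the text does not spell out the aggregation):
 \[
   \mathrm{CTR}(a \rightarrow b) = 100 \times \frac{\sum_{(i,j):\, \mathrm{cat}(i)=a,\ \mathrm{cat}(j)=b} c_{ij}}{\sum_{(i,j):\, \mathrm{cat}(i)=a,\ \mathrm{cat}(j)=b} v_{ij}},
 \]
 where $v_{ij}$ and $c_{ij}$ are the views and clicks of recommended item $j$ shown on base item $i$. Pooling is at least consistent with the per-category scores reported for the base categories, for example $100 \times 178/73197 \approx 0.24$ for Politics.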
 \begin{table*}
 \centering
 \hline
-&Berlin&politik&wirtschaft&sport&kultur&weltspiegel&meinung&medien&wissen\\
+&Berlin&Politik&Wirtschaft&Sport&Kultur&Weltspiegel&Meinung&Medien&Wissen\\
 \hline
 berlin&0.14&0.08&0.06&0.05&0.06&0.12&0.12&0.16&0.06\\
 politik&0.2&0.39&0.06&0.12&0.04&0.3&0&0.73&0.1\\
 wirtschaft&0.15&0.4&0.07&0.13&0.36&0.13&0.46&0.21&0\\
 sport&0.14&0.27&0&0.68&0.05&0.18&0&0.27&0.07\\
 kultur&0.11&0&0&0.06&0.07&0&0&0.07&0\\
 weltspiegel&0.24&0.27&0.06&0.13&0.17&0.13&0&0.4&0.18\\
 meinung&0.06&0&0&0&0&0&1.85&0.32&0\\
 medien&0.1&0.85&0&0.06&0&0.08&0&0.16&0\\
 wissen&0.02&0&0&0&0.11&0.15&0&0&0\\
 Berlin&0.14&0.08&0.06&0.05&0.06&0.12&0.12&0.16&0.06\\
 Politik&0.2&\textbf{0.39}&0.06&0.12&0.04&0.3&0&\textbf{0.73}&0.1\\
 Wirtschaft&0.15&\textbf{0.4}&0.07&0.13&0.36&0.13&\textbf{0.46}&0.21&0\\
 Sport&0.14&0.27&0&\textbf{0.68}&0.05&0.18&0&0.27&0.07\\
 Kultur&0.11&0&0&0.06&0.07&0&0&0.07&0\\
 Weltspiegel&0.24&0.27&0.06&0.13&0.17&0.13&0&\textbf{0.4}&0.18\\
 Meinung&0.06&0&0&0&0&0&\textbf{0.85}&0.32&0\\
 Medien&0.1&\textbf{0.85}&0&0.06&0&0.08&0&0.16&0\\
 Wissen&0.02&0&0&0&0.11&0.15&0&0&0\\
@@ @@ -373,14 +366,15 @@ wissen&0.02&0&0&0&0.11&0.15&0&0&0\\ @@
 % Question for myself: Is it maybe possible to compute the category CTR's? Like a hitmap of the CTRs where the recommendations are subsidvided to their categories and a CTR is computed? I think so. We can also go durther and look at the contenet similarities. Further, we can look at what type of items trigger more clicks by selecting some items which generated more clicks and analyzing them.
 At the item level, we  investigated whether  %re is a relationship, in triggering clicks on recommendations, between the base items and the recommended items. More specifically, are
-the base items that are more likely to trigger recommendation are also the ones that are more likely to be clicked  upon recommendations. To accomplish this, we first computed the CTRs separately  for base items and recommendation items and then intersected these to find the items that are in both. It is important to state here that we have more items in our recommendations than in our base items. This is  because we are only requested to provide recommendations to some items via the Plista platform, while we could choose from  all items  for  recommendation. Our dataset collected over two month comprises   $\mathit{55708}$  recommended items and $\mathit{18967}$  base items. The intersection resulted in $\mathit{15221}$ items for which we computed CTRs  in base and in recommendation.
+the base items that are more likely to trigger clicks on recommendations are also the ones that are more likely to be clicked upon recommendation. To accomplish this, we first computed the CTRs for base items and recommended items separately, creating two datasets of item-CTR scores. We then intersected the two datasets by item to find the items that occur both as base items and as recommended items. It is important to state here that we have more recommended items than base items. This is because we are only requested to provide recommendations for some items via the Plista platform, while we could choose from all items for recommendation. Our dataset, collected over two months, comprises $\mathit{55708}$ recommended items and $\mathit{18967}$ base items. The intersection resulted in $\mathit{15221}$ items.
 To better visualize the results, we present  two plots. In Figure \ref{fig:view_click_base}, we present  plots generated by sorting the results by base CTR. The blue plot is for base CTR and red plot is for recommendation CTR.  What we observe here is that  the base items that are more likely to trigger clicks on recommendations are mostly also  the items that are more likely to trigger clicks upon their recommendations.There are, however many  items that are more likely to trigger clicks upon their recommendation, but they do not do so as base items. To visualize this better, we also sorted the results by recommendation CTR, and we obtained the plots in Figure \ref{fig:view_click_reco}. We observe here the base items (the blue line) that are more likely to trigger clicks on recommendation are a subset of  the recommendation items that are more likely to trigger clicks upon their recommendation. So from the overlap in the plots, we can conclude that for most of the items their ability to trigger clicks on recommendation as base items is indicative of their attractiveness as recommendation items. It seems, however, not the case that the ability to incur clicks upon recommendation is indicative of the ability to trigger clicks as a base item.    %The discrepancy we observe might have to do with the fact that we had a limited access to  base items while we have a full access to the items for recommendation.
 To better visualize the results, we present two plots. In Figure \ref{fig:view_click_base}, we present plots generated by sorting the results by base CTR. The blue plot is for base CTR and the red plot is for recommendation CTR. What we observe here is that the base items that are more likely to trigger clicks on recommendations are mostly also the items that are more likely to be clicked when recommended (bottom left of the plot). There are, however, many items that are likely to be clicked when recommended but that do not trigger clicks as base items (bottom right of the plot). To see this from the other angle, we also sorted the results by recommendation CTR, plotted them again, and obtained the plots in Figure \ref{fig:view_click_reco}. We observe here that the base items (the blue line) that are more likely to trigger clicks on recommendations are a subset of the recommendation items that are more likely to be clicked when recommended. So from the overlap in the plots, we can conclude that for most of the items their ability to trigger clicks on
 recommendations as base items is indicative of their attractiveness as recommendation items. It does not seem to be the case, however, that the ability to attract clicks upon recommendation is indicative of the ability to trigger clicks as a base item.    %The discrepancy we observe might have to do with the fact that we had a limited access to  base items while we have a full access to the items for recommendation.
  \begin{figure} [t]
 \centering
-\includegraphics[scale=0.45]{img/base_reco_ctr_sorted_by_base-crop.pdf}
+\includegraphics[width=7cm, height=5cm]{img/base_reco_ctr_sorted_by_base-crop.pdf}
 \caption{CTRs of base items (blue) and of recommended items (red) generated by first sorting by base CTR. The high-scoring recommendation items do not follow the high-scoring base items. \label{fig:view_click_base}}
 \end{figure}
  \begin{figure} [t]
 \centering
-\includegraphics[scale=0.45]{img/base_reco_ctr_sorted_by_reco-crop.pdf}
+\includegraphics[width=7cm, height=5cm]{img/base_reco_ctr_sorted_by_reco-crop.pdf}
 \caption{CTRs of base items (blue) and of recommended items (red) generated by first sorting by recommendation CTR. The high-scoring base items are mostly a subset of the high-scoring recommendation items. \label{fig:view_click_reco}}
 \end{figure}
 \section{Discussion and Conclusion}
 In this study, we attempted to explain the factors that trigger  clicks on recommendations. We specifically investigated whether  clicks on recommendations are a function of the base items, or a function of the recommended items. We attempted to explain that by looking at the categories of items and the transitions between the categories. We found that indeed the category of the items explains some of the discrepancy between the likelihoods of triggering clicks both as base items and recommendation items in the sense that some base categories and some recommendation categories are more likely to trigger clicks than others.
 There is, however, a difference between the categories in their likelihood to trigger clicks as base category and as recommendation category. As base category, the politics category is the most likely to trigger clicks on recommendations, followed by media. As recommendation categories, however, it is media followed by politics that attract the most clicks. The results suggest that a click on a recommendation is a function of both the base item and the recommended item. This is indicated by the fact that some categories are less or more likely to generate clicks on recommendations, whether as base or as recommendation. This suggests that leveraging category information holds potential for improving the performance of a recommender system. The results also show that the performance of the categories as base and as recommendation is not exactly aligned. This non-alignment was also observed at the item level in that there were many items that were more likely to trigger clicks as recommendation, but not as base.
 The investigation of the transitions between categories suggests that recommendation can be improved by recommending some categories to those categories where they are more likely to get clicked. For example, we observe that it is more likely to receive clicks if we recommend media items to the  politics category and to the local category (berlin). Similarly, we observe that recommending sports items to sports items is much more likely to trigger clicks. %These results suggest that there is a way to improve recommender system by leveraging category information of items.
 In this study, we attempted to explain the factors that trigger  clicks on recommendations. We specifically investigated whether  clicks on recommendations are a function of the base-items, or  of the recommended-items. We attempted to explain that by looking at the categories of items and the transitions between them. We found that indeed the category of the items explains some of the discrepancy between the likelihoods of triggering clicks both as base items and recommendation items in the sense that some base categories and some recommendation categories are more likely to trigger clicks than others.
 % There is, however,  a difference between the categories in their likelihood to trigger clicks as  base category and as recommendation category. As base category, the politics category is the most likely to trigger clicks on recommendations followed by media. In recommendations, however, it is the media followed by politics that trigger clicks upon their recommendation.
 The results suggest that a click on a recommendation is a function of both the base item and the recommended item. This is indicated by the fact that some categories are less or more likely to generate clicks on recommendations, whether as base or as recommendation. This suggests that leveraging category information holds potential for improving the performance of a recommender system. The results also show that the performance of the categories as base and as recommendation is not exactly aligned. This non-alignment was also observed at the item level in that there were many items that were more likely to trigger clicks as recommendation, but not as base.
 We hope that this work can contribute to the understanding of factors affecting recommender systems. We have shown that category-level information can take us a long way in explaining clicks on recommendations. Item-level information also showed that there is a relationship between base items that are more likely to trigger clicks and those recommendation items that are more likely to be clicked when recommended. This all suggests that leveraging information at both the category and item levels might hold potential for improving recommender systems. As future work, we would like to investigate the factors that lead to clicks on recommendations using a larger dataset and at the content level of the items.
 The investigation of the transitions between categories suggests that recommendation can be improved by recommending categories on those base categories where they are more likely to get clicked. For example, we observe that media items are more likely to receive clicks when they are recommended on items of the politics category and of the local category (berlin). Similarly, we observe that recommending sports items on sports items is much more likely to trigger clicks than recommending other categories. %These results suggest that there is a way to improve recommender system by leveraging category information of items.
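One way such category-level transitions could be used is sketched below. This is a hypothetical illustration under stated assumptions, not the system used in the study: the event format, the helper names `transition_ctr` and `best_reco_category`, and the toy values are all made up for the example and do not reproduce the reported results.

```python
from collections import defaultdict


def transition_ctr(events):
    """Map (base_category, reco_category) to clicks / impressions."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for base_cat, reco_cat, was_clicked in events:
        shown[(base_cat, reco_cat)] += 1
        if was_clicked:
            clicked[(base_cat, reco_cat)] += 1
    return {pair: clicked[pair] / n for pair, n in shown.items()}


def best_reco_category(ctr, base_cat):
    """Return the recommendation category with the highest CTR for base_cat."""
    candidates = {reco: v for (base, reco), v in ctr.items() if base == base_cat}
    return max(candidates, key=candidates.get) if candidates else None


# Hypothetical toy events: (base_category, recommended_category, was_clicked).
events = [
    ("politics", "media", True),
    ("politics", "media", True),
    ("politics", "sports", False),
    ("sports", "sports", True),
    ("sports", "sports", True),
    ("sports", "media", False),
    ("sports", "politics", False),
]

ctr = transition_ctr(events)
choice_for_politics = best_reco_category(ctr, "politics")
choice_for_sports = best_reco_category(ctr, "sports")
choice_for_unseen = best_reco_category(ctr, "berlin")  # no events, so None
```

A real system would need smoothing for transitions with few impressions; the sketch ignores this for brevity.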
  We have shown that category-level information can take us a long way in explaining clicks on recommendations.
 % Item level information also showed that there is a relationship between base items that are more likely to trigger clicks and those recommendation items that are more likely to trigger clicks upon their recommendation.
 This all suggests that leveraging information at both the category and item levels might hold potential for improving recommender systems. We hope that this work can contribute to the understanding of factors affecting recommender systems. As future work, we would like to investigate the factors that lead to clicks on recommendations using a larger dataset and at the content level of the items.
 % An idea, maybe show the variance of the categories in terms of their CTR? Another thing we can do is to explore the high-achieving base items and the high-achieving recommended items and see if they are somehow the same items. We also do a similar thing with low-achieving base items and recommended items. If this holds, then clearly it indicates that a big factor is not the current context, but just the nature of the items themselves, both in the base items and in the recommended items. This is going to hold, as it already shows in the groups. But we can also zoom in on the politics items and see if that holds too. Another thing we can consider is to find base items and recommended items with big variance and study them with a view to finding the causes in terms of categories and also in terms of content. The variance of a recommendation item tells us that recommending it on some items makes sense, but on others it does not. This can also be studied at a particular group's

ref.bib

@@ @@ -30,3 +30,48 @@ @@
   author={Kille, Benjamin and Lommatzsch, Andreas and Turrin, Roberto and Ser{\'e}ny, Andr{\'a}s and Larson, Martha and Brodt, Torben and Seiler, Jonas and Hopfgartner, Frank},
   year={2015}
+}
 @inproceedings{gabrilovich2004newsjunkie,
   title={Newsjunkie: providing personalized newsfeeds via analysis of information novelty},
   author={Gabrilovich, Evgeniy and Dumais, Susan and Horvitz, Eric},
   booktitle={Proceedings of the 13th international conference on World Wide Web},
   pages={482--490},
   year={2004},
   organization={ACM}
+}
 @inproceedings{jancsary2010towards,
   title={Towards context-aware personalization and a broad perspective on the semantics of news articles},
   author={Jancsary, Jeremy and Neubarth, Friedrich and Trost, Harald},
   booktitle={Proceedings of the fourth ACM conference on Recommender systems},
   pages={289--292},
   year={2010},
   organization={ACM}
+}
 @inproceedings{li2011scene,
   title={Scene: a scalable two-stage personalized news recommendation system},
   author={Li, Lei and Wang, Dingding and Li, Tao and Knox, Daniel and Padmanabhan, Balaji},
   booktitle={Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval},
   pages={125--134},
   year={2011},
   organization={ACM}
+}
 @inproceedings{kulkarni2013all,
   title={All the news that's fit to read: a study of social annotations for news reading},
   author={Kulkarni, Chinmay and Chi, Ed},
   booktitle={Proceedings of the SIGCHI Conference on Human Factors in Computing Systems},
   pages={2407--2416},
   year={2013},
   organization={ACM}
+}
 @inproceedings{gebremeskel2015role,
   title={The Role of Geographic Information in News Consumption},
   author={Gebremeskel, Gebrekirstos G and de Vries, Arjen P},
   booktitle={Proceedings of the 24th International Conference on World Wide Web Companion},
   pages={755--760},
   year={2015}
+}
