% THIS IS SIGPROC-SP.TEX - VERSION 3.1
% WORKS WITH V3.2SP OF ACM_PROC_ARTICLE-SP.CLS
% APRIL 2009
%
% It is an example file showing how to use the 'acm_proc_article-sp.cls' V3.2SP
% LaTeX2e document class file for Conference Proceedings submissions.
% ----------------------------------------------------------------------------------------------------------------
% This .tex file (and associated .cls V3.2SP) *DOES NOT* produce:
% 1) The Permission Statement
% 2) The Conference (location) Info information
% 3) The Copyright Line with ACM data
% 4) Page numbering
% ---------------------------------------------------------------------------------------------------------------
% It is an example which *does* use the .bib file (from which the .bbl file
% is produced).
% REMEMBER HOWEVER: After having produced the .bbl file,
% % and prior to final submission,
% you need to 'insert' your .bbl file into your source .tex file so as to provide
% ONE 'self-contained' source file.
%
% Questions regarding SIGS should be sent to
% Adrienne Griscti ---> griscti@acm.org
%
% Questions/suggestions regarding the guidelines, .tex and .cls files, etc. to
% Gerald Murray ---> murray@hq.acm.org
%
% For tracking purposes - this is V3.1SP - APRIL 2009
\documentclass{acm_proc_article-sp}
\usepackage{graphicx}
\usepackage{subcaption}
\usepackage{booktabs}
\usepackage{color, colortbl}
\usepackage[utf8]{inputenc}
\usepackage{multirow}
\usepackage[usenames,dvipsnames]{xcolor}
\begin{document}
\title{Towards Explaining Clicks on Recommendations}
% You need the command \numberofauthors to handle the 'placement
% and alignment' of the authors beneath the title.
%
% For aesthetic reasons, we recommend 'three authors at a time'
% i.e. three 'name/affiliation blocks' be placed beneath the title.
%
% NOTE: You are NOT restricted in how many 'rows' of
% name/affiliations may appear. We just ask that you restrict
% the number of 'columns' to three.
%
% Because of the available 'opening page real-estate'
% we ask you to refrain from putting more than six authors
% (two rows with three columns) beneath the article title.
% More than six makes the first-page appear very cluttered indeed.
%
% Use the \alignauthor commands to handle the names
% and affiliations for an 'aesthetic maximum' of six authors.
% Add names, affiliations, addresses for
% the seventh etc. author(s) as the argument for the
% \additionalauthors command.
% These 'additional authors' will be output/set for you
% without further effort on your part as the last section in
% the body of your article BEFORE References or any Appendices.
% \numberofauthors{8} % in this sample file, there are a *total*
% of EIGHT authors. SIX appear on the 'first-page' (for formatting
% reasons) and the remaining two appear in the \additionalauthors section.
%
\author{
% You can go ahead and credit any number of authors here,
% e.g. one 'row of three' or two rows (consisting of one row of three
% and a second row of one, two or three).
%
% The command \alignauthor (no curly braces needed) should
% precede each author name, affiliation/snail-mail address and
% e-mail address. Additionally, tag each line of
% affiliation/address with \affaddr, and tag the
% e-mail address with \email.
%
% 1st. author
% \alignauthor
% Ben Trovato\titlenote{Dr.~Trovato insisted his name be first.}\\
% \affaddr{Institute for Clarity in Documentation}\\
% \affaddr{1932 Wallamaloo Lane}\\
% \affaddr{Wallamaloo, New Zealand}\\
% \email{trovato@corporation.com}
% % 2nd. author
% \alignauthor
% G.K.M. Tobin\titlenote{The secretary disavows
% any knowledge of this author's actions.}\\
% \affaddr{Institute for Clarity in Documentation}\\
% \affaddr{P.O. Box 1212}\\
% \affaddr{Dublin, Ohio 43017-6221}\\
% \email{webmaster@marysville-ohio.com}
% % 3rd. author
% \alignauthor Lars Th{\o}rv{\a}ld\titlenote{This author is the
% one who did all the really hard work.}\\
% \affaddr{The Th{\o}rv{\a}ld Group}\\
% \affaddr{1 Th{\o}rv{\a}ld Circle}\\
% \affaddr{Hekla, Iceland}\\
% \email{larst@affiliation.org}
% \and % use '\and' if you need 'another row' of author names
% % 4th. author
% \alignauthor Lawrence P. Leipuner\\
% \affaddr{Brookhaven Laboratories}\\
% \affaddr{Brookhaven National Lab}\\
% \affaddr{P.O. Box 5000}\\
% \email{lleipuner@researchlabs.org}
% % 5th. author
% \alignauthor Sean Fogarty\\
% \affaddr{NASA Ames Research Center}\\
% \affaddr{Moffett Field}\\
% \affaddr{California 94035}\\
% \email{fogartys@amesres.org}
% % 6th. author
% \alignauthor Charles Palmer\\
% \affaddr{Palmer Research Laboratories}\\
% \affaddr{8600 Datapoint Drive}\\
% \affaddr{San Antonio, Texas 78229}\\
% \email{cpalmer@prl.com}
}
% There's nothing stopping you putting the seventh, eighth, etc.
% author on the opening page (as the 'third row') but we ask,
% for aesthetic reasons that you place these 'additional authors'
% in the \additional authors block, viz.
% \additionalauthors{Additional authors: John Smith (The Th{\o}rv{\a}ld Group,
% email: {\texttt{jsmith@affiliation.org}}) and Julius P.~Kumquat
% (The Kumquat Consortium, email: {\texttt{jpkumquat@consortium.net}}).}
% \date{30 July 1999}
% Just remember to make sure that the TOTAL number of authors
% is the number that will appear on the first page PLUS the
% number that will appear in the \additionalauthors section.
\maketitle
\begin{abstract}
In a setting where recommendations are provided to a user while the user is viewing a particular item, what factors contribute to clicks on the recommendations? We examine whether a click on a recommendation is a function of the item the user is reading at the time of the recommendation, or of the recommended item itself. More specifically, we investigate whether some items are more likely to cause the user to click on recommendations, and whether some recommendations are more likely to be clicked. We investigate these factors at the level of item categories and the transitions between categories. We find that the categories of items play a big role in clicks on recommendations, and that they hold potential for category-level personalization.
\end{abstract}
\section{Introduction}
For recommender systems to deliver good recommendations, they need to incorporate different factors into the selection of recommendation items. This requires understanding the factors that influence a recommender system's success. These factors can be broadly categorized into non-content and content factors. Non-content factors include, among others, the user's current context, social media annotations, and recommendations by friends. Social media annotations, recommendations by branded companies, and annotations and recommendations by friends \cite{kulkarni2013all} increase both user consumption and satisfaction. Content and non-content factors are equally important in influencing the user's decision to read news items \cite{jancsary2010towards}. Another study shows that geographic relevance affects the consumption and recommendation of news items \cite{gebremeskel2015role}.
Content factors are properties of the items themselves, modeled by keywords and named entities \cite{gabrilovich2004newsjunkie}, or by topics and categories \cite{li2011scene}.
One study that investigated responses to recommendations across different online publishers found that traditional news portals providing news and opinions on politics and current events are more likely to generate clicks on recommendations than special-interest portals such as sports, gardening, and auto mechanic forums \cite{said2013month}. Another study \cite{esiyok2014users} investigated the reading transitions of users at the category level on one of these traditional news portals, Tagesspiegel (a popular German news portal), and reported a relationship between what the user is currently reading and what they read next. An especially interesting finding of this study was that the local and sports categories had the most loyal readers; that is, a user reading an item in one of these categories is more likely to keep reading items of the same category.
While both studies \cite{said2013month,esiyok2014users} are closely related and relevant to our interest in the factors that contribute to clicks, they did not investigate the transitions between the base items and the recommended items. In a recommendation setting where recommendations are provided to a user alongside the item that the user is currently viewing (henceforth referred to as the base item), what are the factors that trigger users to click on the recommendations? Are the clicks a function of the base items or of the recommended items? Do some base items and some recommended items cause users to click on recommendations more than others, and if so, what explains this difference? In this study we investigate these factors using the categories of the base items and the recommended items, and the transitions between them.
The study contributes to the understanding of factors that influence recommender systems. The insights help 1) to understand the influence of the categories of the base items and the recommended items on clicks on recommendations, and 2) to point recommender systems toward those items that generate clicks and away from those that do not.
\begin{table*}
\caption{A sample of the dataset. \label{tab:sample}}
\centering
\begin{tabular}{|l|l|l|l|l|l|l|}
\hline
Base Item & Base Item Category & Recommendation & Recommendation Category & Views & Clicks & CTR (\%) \\
\hline
229397219 & Berlin & 229495114 & Berlin & 17 & 1 & 5.88\\
230306628 & politics & 230291175& wissen & 14 & 1 & 7.14\\
40485126 & Berlin & 225589114 & politics & 2 & 0 & 0.00\\
\hline
\end{tabular}
\end{table*}
\section{Dataset}
%We used a dataset of user-item interactions on Tagesspiegel, an online German news portal.
We aggregated user-item interaction data from the logs of the recommender systems that we operated during our participation in the CLEF NewsREEL 2015 challenge \cite{kille2015overview}. This challenge offered participants the opportunity to plug their recommendation algorithms into Plista\footnote{http://orp.plista.com/documentation} and provide recommendations to real users visiting online publishers. Plista is a recommendation framework that connects recommendation providers, such as ourselves, with recommendation service requesters, such as online news portals. Participation in the challenge enabled us to collect user-item interaction information such as impressions (users viewing items) and clicks (users clicking on recommended items).
We used three recommendation algorithms: two instances of \textbf{Recency} and one instance of \textbf{RecencyRandom}. The Recency algorithm keeps track of the most recently viewed or updated items and recommends the top $\mathit{k}$ most recent items every time a recommendation request is made. The RecencyRandom recommender keeps the $\mathit{100}$ most recent items at any time and, on each recommendation request, recommends a random sample of the requested number of items from this pool.
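To make the behavior of the two algorithms concrete, the following is a minimal sketch in Python; this is not our actual implementation, and the class and variable names are illustrative:
\begin{verbatim}
import random
from collections import OrderedDict

class Recency:
    """Keep items ordered by most recent view/update;
    recommend the k most recent items on request."""
    def __init__(self):
        self.items = OrderedDict()  # item_id -> None, most recent last

    def update(self, item_id):
        # Re-inserting moves the item to the most recent position.
        self.items.pop(item_id, None)
        self.items[item_id] = None

    def recommend(self, k):
        return list(self.items)[-k:][::-1]  # most recent first

class RecencyRandom(Recency):
    """Keep only the 100 most recent items; recommend a random
    sample of the requested size from that pool."""
    POOL = 100

    def update(self, item_id):
        super().update(item_id)
        while len(self.items) > self.POOL:
            self.items.popitem(last=False)  # drop the oldest item

    def recommend(self, k):
        pool = list(self.items)
        return random.sample(pool, min(k, len(pool)))
\end{verbatim}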
For this analysis, we focused on user-item interactions on Tagesspiegel, one of the biggest German online news portals. The interaction dataset was collected from 15 April 2015 to 4 July 2015. Items in Tagesspiegel are manually assigned to categories by journalists. For our study, we investigated $\mathit{9}$ categories: \textbf{politics (politik)}, \textbf{business (wirtschaft)}, \textbf{sports (sport)}, \textbf{culture (kultur)}, \textbf{world (weltspiegel)}, \textbf{opinion (meinung)}, \textbf{media (medien)}, \textbf{education (wissen)}, and the local category \textbf{berlin}.
Click notifications provided by the Plista platform do not directly indicate whether a click is in response to our recommendations or to some other participant's recommendations. Since we know the user, the base item for which we recommended, and the recommended items, we considered a click notification on one of our recommended items to be a click on our recommendation if that click happened within $\mathit{5}$ minutes of the time of our recommendation. From the combined collected dataset, we extracted the base item, the category of the base item, the recommended item, the category of the recommended item, the number of times a recommendation item has been recommended alongside a base item (views), and the number of times the recommended item has been clicked from that base item (clicks). From the views and clicks, we compute the click-through rate (CTR) as the percentage of views that result in a click. A sample of the extracted dataset is presented in Table \ref{tab:sample}.
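The attribution rule and the CTR computation can be sketched as follows; the log record structure (dictionaries with \texttt{user}, \texttt{item}, and \texttt{time} fields) is hypothetical:
\begin{verbatim}
from datetime import timedelta

WINDOW = timedelta(minutes=5)

def attribute_clicks(our_recs, click_notifications):
    """Count a click as ours if the same user clicked an item
    we recommended, within 5 minutes of our recommendation."""
    ours = []
    for click in click_notifications:
        for rec in our_recs:
            if (rec['user'] == click['user']
                    and rec['item'] == click['item']
                    and timedelta(0) <= click['time'] - rec['time'] <= WINDOW):
                ours.append(click)
                break
    return ours

def ctr(views, clicks):
    """CTR as the percentage of views that resulted in a click."""
    return 100.0 * clicks / views if views else 0.0
\end{verbatim}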
%Plista is a company that provides a recommendation platform where recommendation providers are linked with online publishers in need of recommendation sertvice.
% It is not easy to get the exact number of times a recommendation item is recommended to a certain base item since the logs did not include wWe assume that the number of times a base item has been viewed as the number of times recommendations were shown. We assume this to be a fair assumption as recommendation were sought each time a an item was viewed by a user. % Although each time an item is viewed, more than one item (usually 5 items) are shown to the user as recommendations, we just count the number of clicks that have happened from those items
% regardless of which items are clicked.
\section{Results and Analysis}
Our dataset consists of a total of $\mathit{288979}$ base-item/recommendation-item pairs. To see the relationship between \textbf{views} and \textbf{clicks}, we first sorted the dataset by \textbf{views} and then normalized the \textbf{view} and \textbf{click} counts by the total number of views and the total number of clicks, respectively. We then selected the top $\mathit{1000}$ pairs and plotted the views and the clicks. The normalization allows us to plot both quantities on a single scale for easy comparison. %The selection of only $\mathit{1000}$ pairs is because the more items we use, the more difficult is to see .
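The following is a minimal sketch of this step, assuming the pair-level data has been loaded into a hypothetical pandas DataFrame \texttt{pairs} with one row per base-item/recommendation-item pair and columns \texttt{views} and \texttt{clicks}; the same DataFrame is assumed in the later sketches:
\begin{verbatim}
import pandas as pd
import matplotlib.pyplot as plt

# Sort by views and keep the 1000 most viewed pairs.
top = pairs.sort_values('views', ascending=False).head(1000)

# Normalize by the totals so both curves share one scale.
views_norm = top['views'] / pairs['views'].sum()
clicks_norm = top['clicks'] / pairs['clicks'].sum()

plt.plot(views_norm.values, color='blue', label='views')
plt.plot(clicks_norm.values, color='red', label='clicks')
plt.legend()
plt.show()
\end{verbatim}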
Figure \ref{fig:view_click} shows the plot of views and clicks for these $\mathit{1000}$ pairs. The blue curve shows the views and is smooth because the data was sorted by views. The red curve shows the corresponding clicks on recommendations. We observe that the clicks do not follow the views: the number of clicks does not correspond to the number of times a recommendation item is recommended alongside a base item. This discrepancy between views and clicks is the primary observation that we set out to explain. The jagged click curve shows that some items are more likely to trigger clicks on recommendations than others. What can explain this observation? What causes these differences in CTR scores across items?
\begin{figure} [t]
\centering
\includegraphics[width=7cm, height=5cm]{img/tage_view_click1000-crop.pdf}
\caption{Plots of views (blue) and clicks (red), generated by first sorting by views. The difference between the view and click plots suggests that some items are more likely to trigger clicks on recommendations than others. \label{fig:view_click}}
\end{figure}
%
% \begin{figure} [t]
% \centering
% \includegraphics[scale=0.5]{img/tage_view100.pdf}
%
% \label{fig:view100}
% \caption{Plot of the most viewed 100 items}
% \end{figure}
% \begin{figure} [t]
% \centering
% \includegraphics[scale=0.5]{img/tage_click100.pdf}
%
% \label{fig:click100}
% \caption{Plot of the clicks triggered from the 100 most viewed items}
% \end{figure}
%
%
\subsection{Item Categories}
To begin explaining the difference between the view plot and the click plot observed in Figure \ref{fig:view_click}, we aggregated views and clicks by the $\mathit{9}$ categories under which items are placed on the Tagesspiegel website. The aggregation gives us two results: view and click counts for the base-item categories and for the recommended-item categories. With the categories, we attempt to answer two questions: 1) are there differences between the base categories in triggering clicks on recommendations, and 2) are there differences between the recommendation categories in triggering clicks upon their recommendation? Tables \ref{tab:base} and \ref{tab:reco} present the views, clicks, and CTR scores, sorted by CTR.
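The category-level aggregation is a straightforward group-by over the pair-level data. A sketch, reusing the hypothetical \texttt{pairs} DataFrame from above and assuming illustrative column names \texttt{base\_category} and \texttt{reco\_category}:
\begin{verbatim}
def category_ctr(pairs, category_column):
    """Aggregate views and clicks per category and derive the
    CTR; category_column selects the base or recommendation
    role of the category."""
    agg = pairs.groupby(category_column)[['views', 'clicks']].sum()
    agg['ctr'] = 100.0 * agg['clicks'] / agg['views']
    return agg.sort_values('ctr', ascending=False)

base_table = category_ctr(pairs, 'base_category')  # Table 2(a)
reco_table = category_ctr(pairs, 'reco_category')  # Table 2(b)
\end{verbatim}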
We observe a difference between the base categories and the recommendation categories with respect to the likelihood of triggering clicks. Among the base categories, \textbf{Politics} is the most likely to trigger clicks, followed by \textbf{Media} and \textbf{World}. Special-interest categories such as \textbf{Culture} and \textbf{Education} are the least likely to trigger clicks on recommendations. This is consistent with previous findings that special-interest portals generate fewer clicks on recommendations than traditional portals providing news, opinions, and current events.
\begin{table*}
\caption{The views, clicks, and CTR scores of the categories. Table \ref{tab:base} is for the base categories and Table \ref{tab:reco} is for the recommendation categories. The CTR scores are generally higher in recommendation, and the ranking of the categories by CTR score differs between base and recommendation.}
\parbox{.45\linewidth}{
\centering
\begin{tabular}{|l|l|l|l|}
\hline
Category & Views & Clicks & CTR (\%)\\
\hline
Politik (Politics)&73197&178&0.24\\
Medien (Media)&22426&50&0.22\\
Weltspiegel (World)&37413&77&0.21\\
Wirtschaft (Business)&30045&61&0.2\\
Sport (Sports)&29812&58&0.19\\
Berlin&123595&129&0.1\\
Meinung (Opinion)&4611&3&0.07\\
Kultur (Culture)&21840&11&0.05\\
Wissen (Education)&13500&4&0.03\\
\hline
\end{tabular}
\subcaption{Base Category \label{tab:base}}
}
\hfill
\parbox{.45\linewidth}{
\centering
\begin{tabular}{|l|l|l|l|}
\hline
Category & Views & Clicks & CTR (\%)\\
\hline
Medien (Media)&22147&68&0.31\\
Politik (Politics)&68230&170&0.25\\
Berlin&123559&188&0.15\\
Weltspiegel (World)&37535&58&0.15\\
Sport (Sports)&28160&36&0.13\\
Meinung (Opinion)&4925&5&0.1\\
Kultur (Culture)&23278&21&0.09\\
Wissen (Education)&15650&10&0.06\\
Wirtschaft (Business)&32955&15&0.05\\
\hline
\end{tabular}
\subcaption{Recommendation Category \label{tab:reco}}
}
\end{table*}
On the recommendation side, however, it is the category \textbf{Media} that is most likely to incur clicks upon recommendation, followed by \textbf{Politics} and the local category (\textbf{Berlin}). The two worst-performing categories are \textbf{Business} and \textbf{Education}, similar to the worst-performing base categories. Overall, the likelihood that a category triggers clicks differs depending on whether it appears as base or as recommendation. In general, the categories have higher CTR scores in recommendation than in base.
To gain further insight, we looked at the CTRs of the transitions from base categories to recommendation categories. The aim is to find out whether some base categories are more likely to trigger clicks on some recommendation categories. The results are presented in Table \ref{heatmap}.
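A sketch of how such a transition matrix can be derived, again using the hypothetical \texttt{pairs} DataFrame:
\begin{verbatim}
# Sum views and clicks for every (base category, reco category)
# combination, then derive the transition CTR.
trans = pairs.groupby(['base_category', 'reco_category'])[
    ['views', 'clicks']].sum()
trans['ctr'] = 100.0 * trans['clicks'] / trans['views']

# Reshape into the base-by-recommendation matrix of Table 3.
matrix = trans['ctr'].unstack('reco_category')
\end{verbatim}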
Some interesting observations emerge from the category-to-category transitions. The highest transition CTRs for the base categories \textbf{Berlin} and \textbf{Politics} are to \textbf{Media}; for \textbf{Business} it is to \textbf{Opinion}, and for \textbf{Sports} it is to \textbf{Sports}. The highest transition CTR for \textbf{Culture} is to the local category (\textbf{Berlin}), and for \textbf{World} it is to \textbf{Politics}, followed by \textbf{Berlin}. \textbf{Media} is the category most likely to trigger clicks upon recommendation. The local category \textbf{Berlin} is the one most likely to trigger clicks on a diverse set of recommendation categories.
\begin{table*}
\centering
\caption{Transition CTR scores from base categories to recommendation categories. The row categories represent the categories of base items and the column categories represent the recommendation categories. \label{heatmap}}
\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|}
\hline
&Berlin&Politik&Wirtschaft&Sport&Kultur&Weltspiegel&Meinung&Medien&Wissen\\
\hline
Berlin&0.14&0.08&0.06&0.05&0.06&0.12&0.12&0.16&0.06\\
Politik&0.2&\textbf{0.39}&0.06&0.12&0.04&0.3&0&\textbf{0.73}&0.1\\
Wirtschaft&0.15&\textbf{0.4}&0.07&0.13&0.36&0.13&\textbf{0.46}&0.21&0\\
Sport&0.14&0.27&0&\textbf{0.68}&0.05&0.18&0&0.27&0.07\\
Kultur&0.11&0&0&0.06&0.07&0&0&0.07&0\\
Weltspiegel&0.24&0.27&0.06&0.13&0.17&0.13&0&\textbf{0.4}&0.18\\
Meinung&0.06&0&0&0&0&0&\textbf{0.85}&0.32&0\\
Medien&0.1&\textbf{0.85}&0&0.06&0&0.08&0&0.16&0\\
Wissen&0.02&0&0&0&0.11&0.15&0&0&0\\
\hline
\end{tabular}
\end{table*}
% For example if we look at the category of politics , we see that the CTR from politics to politics is the highest than from politics to any other category. We also observe that the CTR from local category Berlin to politics is higher than from the local category Berlin to any other category including to itself. A little surprising result is the high CTR from media to politics.
% The way we extracted our recommendations and clciks is a little uncertan. In the Plista setting, when click results are reported to users, they are not known whose recommendations are being clicked. So while we know our recommendation, we do not know for sure how much of the click notifications that we recieve belong to our recommendations. To extract our clciks, we introduced a time frame of 5 minutes. That is if the click notification happens in with in a range of time, in our case 5 minutes, we consider the clcik is on our recommendations. We consider the click information is a bit inflated for users might not stay for more than 5 minutes. While the actual CTR might be a bit inflated as a result of the inflated number of clicks, we consider the relative scores as indicative of the true difference.
% To find out therelationship between base item recommendation pairs that resulted in high CTR scoores, we selected some item-recommendations pairs. To avoid selecting item-recommendation pairs that have very low views and clicks which is usually the type of combination that results in high CTR scores, we first sort our data according to views, and according to clicks. Using cutt off values, we repeat the intersection until we find the items that have both the highest view and the hight clicks. Using this approach we selected 12 item-recommendation pairs and out of them we selected the 5 pairs that have the highest score. These pairs are presented in Table \ref{}
\subsection{Item-level Base and Recommendation CTRs}
% We look at the two types of item-level CTR's:the base item CTRs and the recommendation CTRs. The base item CTR measures how likely the base item is to trigger clicks on recommendation. We assume that part if clicking on recommendations is a function of the item the user is reading. this is corroborated by the category-level CTr's that we looked at above in thesense that some categories do not generate clicks. even if the item are from clickable categories. The recommendation CTR's ameasures how likely the item is to recieve a click when recomened to a user regardless of the category of the base item. But, should we not be concerned about the base item?
% We plan to extract a sample of base items with recommended and clicked items and separate them into clicked and rejected recommendations. We then compare the contenet of the clicked items with the contenet of the base item. We also do the same with the rejected items and see if there is any similarities/differences bertween these two categories. The sepration of clicked and rejected items and comparing them to the base item is similar to the sepration of recommended moviews into viwed and ignored in \cite{nguyen2014exploring}.
%
% On the same dataset, there has been a study on the transition probababilities of users on the categories This study was on genral reading. In this study 1) we repeat the same study on a dataset from a different time and 2) we analyze results in terms of similarity of content with the base items.
%
%
% Question for myself: Is it maybe possible to compute the category CTR's? Like a hitmap of the CTRs where the recommendations are subsidvided to their categories and a CTR is computed? I think so. We can also go durther and look at the contenet similarities. Further, we can look at what type of items trigger more clicks by selecting some items which generated more clicks and analyzing them.
At the item level, we investigated whether the base items that are more likely to trigger clicks on recommendations are also the ones that are more likely to be clicked upon recommendation. To accomplish this, we first computed the CTRs for base items and for recommendation items separately, creating two datasets of item--CTR scores. We then intersected the two datasets by item to find the items that occur both as base items and as recommended items. It is important to note that we have more recommended items than base items: we are only asked to provide recommendations for some items via the Plista platform, while we could choose from all items when recommending. Our collected dataset comprises $\mathit{55708}$ recommended items and $\mathit{18967}$ base items; the intersection resulted in $\mathit{15221}$ items.
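A sketch of the item-level computation and intersection, with illustrative column names \texttt{base\_item} and \texttt{reco\_item}:
\begin{verbatim}
def item_ctr(pairs, item_column):
    """CTR per item, aggregating over all pairs in which the
    item appears under the given role."""
    agg = pairs.groupby(item_column)[['views', 'clicks']].sum()
    return 100.0 * agg['clicks'] / agg['views']

base_ctr = item_ctr(pairs, 'base_item')  # 18967 items
reco_ctr = item_ctr(pairs, 'reco_item')  # 55708 items

# Keep only items occurring in both roles (15221 items).
both = base_ctr.index.intersection(reco_ctr.index)
base_ctr, reco_ctr = base_ctr[both], reco_ctr[both]
\end{verbatim}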
To better visualize the results, we present two plots. Figure \ref{fig:view_click_base} shows the CTRs sorted by base CTR; the blue plot is the base CTR and the red plot is the recommendation CTR. We observe that the base items that are more likely to trigger clicks on recommendations are mostly also the items that are more likely to be clicked upon their recommendation (bottom left of the plot). There are, however, many items that are likely to be clicked upon their recommendation but do not trigger clicks as base items (bottom right of the plot). To see this from the other angle, we also sorted the results by recommendation CTR and plotted them again, obtaining Figure \ref{fig:view_click_reco}. Here we observe that the base items (the blue line) that are more likely to trigger clicks on recommendations are a subset of the recommendation items that are more likely to be clicked upon their recommendation. From the overlap in the plots, we conclude that for most items, their ability to trigger clicks on recommendations as base items is indicative of their attractiveness as recommendation items. The converse, however, does not seem to hold: the ability to incur clicks upon recommendation is not indicative of the ability to trigger clicks as a base item. %The discrepancy we observe might have to do with the fact that we had a limited access to base items while we have a full access to the items for recommendation.
\begin{figure} [t]
\centering
\includegraphics[width=7cm, height=5cm]{img/base_reco_ctr_sorted_by_base-crop.pdf}
\caption{CTRs of base items (blue) and of recommended items (red) generated by first sorting by base CTR. The high-scoring recommendation items do not follow the high-scoring base items. \label{fig:view_click_base}}
\end{figure}
\begin{figure} [t]
\centering
\includegraphics[width=7cm, height=5cm]{img/base_reco_ctr_sorted_by_reco-crop.pdf}
\caption{CTRs of base items (blue) and of recommended items (red) generated by first sorting by recommendation CTR. The high-scoring base items are mostly a subset of the high-scoring recommendation items. \label{fig:view_click_reco}}
\end{figure}
\section{Discussion and Conclusion}
In this study, we investigated the factors that trigger clicks on recommendations. We specifically examined whether clicks on recommendations are a function of the base items or of the recommended items, using the categories of items and the transitions between them. We found that the category of an item indeed explains some of the difference in the likelihood of triggering clicks, both as base items and as recommendation items, in the sense that some base categories and some recommendation categories are more likely to trigger clicks than others.
% There is, however, a difference between the categories in their likelihood to trigger clicks as base category and as recommendation category. As base category, the politics category is the most likely to trigger clicks on recommendations followed by media. In recommendations, however, it is the media followed by politics that trigger clicks upon their recommendation.
The results suggest that a click on a recommendation is a function of both the base item and the recommended item. This is indicated by the fact that some categories are more or less likely to generate clicks whether they appear as base or as recommendation. This suggests that leveraging category information holds potential for improving the performance of a recommender system. The results also show that the performance of the categories as base and as recommendation is not exactly aligned. This non-alignment was also observed at the item level: many items were more likely to trigger clicks as recommendations, but not as base items.
The investigation of the transitions between categories suggests that recommendation can be improved by recommending items from some categories alongside the base categories where they are most likely to be clicked. For example, we observe that recommending media items on items of the politics category or the local category (Berlin) is more likely to receive clicks. Similarly, recommending sports items on sports items is much more likely to trigger clicks than recommending items of other categories. %These results suggest that there is a way to improve recommender system by leveraging category information of items.
We have shown that category-level information can take us a long way in explaining clicks on recommendations.
% Item level information also showed that there is a relationship between base items that are more likely to trigger clicks and those recommendation items that are more likely to trigger clicks upon their recommendation.
All of this suggests that leveraging information at both the category and item levels holds potential for improving recommender systems. We hope that this work contributes to the understanding of factors affecting recommender systems. As future work, we would like to investigate the factors that lead to clicks on recommendations using a larger dataset and at the level of the items' content.
% An idea, maybe show the variance of the categories in terms of their CTR? Another thing we can do is to explore the high achieving base itemsand the high achiving recommended itesm and see if they are some how the same items. We also do similar thing with lowe achving base item and recommended items. Is this holds, then clearly it indicates that a big factor is not about the current context, but just the nature of the items themselves, both ion the base items, and in the recommended items. This is going to gold , as it already shows in the groups. But, we can also zoom in on the politics items and see if that holds too. Another thing we can consider is find base items and recommended items with big variance and study them with the view to finding the causes in terms of categories and also in terms of contenet. The variance of a recommendation item tells us information that is it is recommended to some values it makes sense, but if to others, it does not. This can also be studied at a particlat group's
\bibliographystyle{abbrv}
\bibliography{ref}
\end{document}
|