From 153537134253e564e7e8a9cf99ead6679e875dbb 2014-06-12 05:24:27
From: Gebrekirstos Gebremeskel <destinycome@gmail.com>
Date: 2014-06-12 05:24:27
Subject: [PATCH] Merge branch 'master' of https://scm.cwi.nl/IA/cikm-paper

---

diff --git a/mypaper-final.tex b/mypaper-final.tex
index 13077f056ee452137910c91969c62f7564d2a96f..83b176020f513891dc269b588939ffdf315feaa3 100644
--- a/mypaper-final.tex
+++ b/mypaper-final.tex
@@ -225,7 +225,7 @@ raw and cleaned. The raw and cleansed versions are 6.45TB and 4.5TB
 respectively,  after xz-compression and GPG encryption. The raw data
 is a  dump of  raw HTML pages. The cleansed version is the raw data
 after its HTML tags are stripped off and only English documents
-identified with Chromium Compact Language Detector
+identified with Chromium Compact Language Detector%
 \footnote{\url{https://code.google.com/p/chromium-compact-language-detector/}}
 are included.  The stream corpus is organized in hourly folders each
 of which contains many  chunk files. Each chunk file contains between
@@ -233,8 +233,12 @@ hundreds and hundreds of thousands of serialized  thrift objects. One
 thrift object is one document. A document could be a blog article, a
 news article, or a social media post (including tweet).  The stream
 corpus comes from three sources: TREC KBA 2012 (social, news and
-linking) \footnote{\url{http://trec-kba.org/kba-stream-corpus-2012.shtml}},
-arxiv\footnote{\url{http://arxiv.org/}}, and
+linking)%
+\footnote{\url{http://trec-kba.org/kba-stream-corpus-2012.shtml}%
+},
+arxiv%
+\footnote{\url{http://arxiv.org/}%
+}, and
 spinn3r\footnote{\url{http://spinn3r.com/}}.
 Table \ref{tab:streams} shows the sources, the number of hourly
 directories, and the number of chunk files.