Changeset - 72c469e5cb90
Arjen de Vries (arjen) - 11 years ago 2014-06-12 03:39:03
arjen.de.vries@cwi.nl
conflict traces removed
1 file changed with 0 insertions and 5 deletions:
mypaper-final.tex
 
@@ -107,55 +107,50 @@
 
\begin{abstract}
 
 
Cumulative citation recommendation refers to the problem faced by
 
knowledge base curators, who need to continuously screen the media for
 
updates regarding the knowledge base entries they manage. Automatic
 
system support for this entity-centric information processing problem
 
requires complex pipe\-lines involving both natural language
 
processing and information retrieval components. The pipeline
 
encountered in a variety of systems that approach this problem
 
involves four stages: filtering, classification, ranking (or scoring),
 
and evaluation. Filtering is only an initial step that reduces the
 
web-scale corpus of news and other relevant information sources that
 
may contain entity mentions into a working set of documents that should
 
be more manageable for the subsequent stages.
 
Nevertheless, this step has a large impact on the maximum recall that
 
can be attained. Therefore, in this study, we focus on just
 
this filtering stage and conduct an in-depth analysis of the main design
 
decisions here: how to cleanse the noisy text obtained online,
 
the methods to create entity profiles, the
 
types of entities of interest, document type, and the grade of
 
relevance of the document-entity pair under consideration.
 
We analyze how these factors (and the design choices made in their
 
corresponding system components) affect filtering performance.
 
We identify and characterize the relevant documents that do not pass
 
the filtering stage by examining their contents. This way, we
 
estimate a practical upper-bound of recall for entity-centric stream
 
filtering.  
 
 
\end{abstract}
 
% A category with the (minimum) three required fields
 
\category{H.4}{Information Filtering}{Miscellaneous}
 
 
%A category including the fourth, optional field follows...
 
%\category{D.2.8}{Software Engineering}{Metrics}[complexity measures, performance measures]
 
 
\terms{Theory}
 
 
\keywords{Information Filtering; Cumulative Citation Recommendation; knowledge maintenance; Stream Filtering;  emerging entities} % NOT required for Proceedings
 
 
\section{Introduction}
 
In 2012, the Text REtrieval Conference (TREC) introduced the Knowledge Base Acceleration (KBA) track to help Knowledge Base (KB) curators. The track addresses a critical need of KB curators: given KB (Wikipedia or Twitter) entities, filter a stream for relevant documents, rank the retrieved documents, and recommend them to the KB curators. The track is crucial and timely because the number of entities in a KB on the one hand, and the huge amount of new information content on the Web on the other, make manual KB maintenance challenging. TREC KBA's main task, Cumulative Citation Recommendation (CCR), aims at filtering a stream to identify citation-worthy documents, ranking them, and recommending them to KB curators.
 
  
 
   
 
 Filtering is a crucial step in CCR for selecting a potentially
 
 relevant set of working documents for subsequent steps of the
 
 pipeline out of a big collection of stream documents. The TREC
 
 Filtering track defines filtering as a ``system that sifts through
 
 stream of incoming information to find documents that are relevant to
 
 a set of user needs represented by profiles''
 
 \cite{robertson2002trec}. 