Changeset - b3d9215866be
[Not reviewed]
Arjen de Vries (arjen) - 11 years ago 2014-06-12 02:08:49
arjen.de.vries@cwi.nl
a few more minor abstract improvements
1 file changed with 8 insertions and 7 deletions:
0 comments (0 inline, 0 general)
mypaper-final.tex
 
@@ -102,35 +102,36 @@
 
% Just remember to make sure that the TOTAL number of authors
 
% is the number that will appear on the first page PLUS the
 
% number that will appear in the \additionalauthors section.
 
 
\maketitle
 
\begin{abstract}
 
 
Cumulative citation recommendation refers to the problem faced by
 
knowledge base curators, who need to continuously screen the media for
 
updates regarding the knowledge base entries they manage. Automatic
 
system support for this entity-centric information processing problem
 
requires complex pipe\-lines involving both natural language
 
processing and information retrieval components. The default pipeline
 
processing and information retrieval components. The pipeline
 
encountered in a variety of systems that approach this problem
 
involves four stages: filtering, classification, ranking (or scoring),
 
and evaluation. Filtering is an initial step that reduces the
 
and evaluation. Filtering is only an initial step that reduces the
 
web-scale corpus of news and other relevant information sources that
 
may contain entity mentions into a working set of documents that should
 
be more manageable for the subsequent stages.
 
This step has a large impact on the recall that can be achieved.
 
Keeping the subsequent steps constant, we therefore zoom in on the
 
filtering stage, and conduct an in-depth analysis of the main design
 
decisions here:
 
cleansing noisy web data, the methods to create entity profiles, the
 
Nevertheless, this step has a large impact on the recall that can be
 
maximally attained! Therefore, in this study, we focus on just
 
this filtering stage and conduct an in-depth analysis of the main design
 
decisions here: how to cleanse the noisy text obtained online,
 
the methods to create entity profiles, the
 
types of entities of interest, document type, and the grade of
 
relevance of the document-entity pair under consideration.
 
We analyze how these factors (and the design choices made in their
 
corresponding system components) affect filtering performance.
 
We identify and characterize the relevant documents that do not pass
 
the filtering stage by examining their contents. This way, we
 
estimate a practical upper bound on recall for entity-centric stream
 
filtering.  
 
 
\end{abstract}
 
% A category with the (minimum) three required fields
 
\category{H.4}{Information Filtering}{Miscellaneous}