Changeset - b3d9215866be
[Not reviewed]
Arjen de Vries (arjen) - 11 years ago 2014-06-12 02:08:49
arjen.de.vries@cwi.nl
a few more minor abstract improvements
1 file changed with 8 insertions and 7 deletions:
0 comments (0 inline, 0 general)
mypaper-final.tex
 
@@ -111,17 +111,18 @@ knowledge base curators, who need to continuously screen the media for
 updates regarding the knowledge base entries they manage. Automatic
 system support for this entity-centric information processing problem
 requires complex pipe\-lines involving both natural language
-processing and information retrieval components. The default pipeline
+processing and information retrieval components. The pipeline
 encountered in a variety of systems that approach this problem
 involves four stages: filtering, classification, ranking (or scoring),
-and evaluation. Filtering is an initial step, that reduces the
+and evaluation. Filtering is only an initial step, that reduces the
 web-scale corpus of news and other relevant information sources that
 may contain entity mentions into a working set of documents that should
 be more manageable for the subsequent stages.
-This step has a large impact on the recall that can be achieved.
-Keeping the subsequent steps constant, we therefore zoom in into the
-filtering stage, and conduct an in-depth analysis of the main design
-decisions here:
-cleansing noisy web data, the methods to create entity profiles, the
+Nevertheless, this step has a large impact on the recall that can be
+maximally attained! Therefore, in this study, we have focused on just
+this filtering stage and conduct an in-depth analysis of the main design
+decisions here: how to cleans the noisy text obtained online,
+the methods to create entity profiles, the
 types of entities of interest, document type, and the grade of
 relevance of the document-entity pair under consideration.
 We analyze how these factors (and the design choices made in their