From b3d9215866be132c769193871183fe32be39d156 2014-06-12 02:08:49
From: Arjen P. de Vries
Date: 2014-06-12 02:08:49
Subject: [PATCH] a few more minor abstract improvements

---
diff --git a/mypaper-final.tex b/mypaper-final.tex
index 0e2e11ce283a60bfb16292f3423cd05fd6b5de30..2eba700899bd42dc857617d114790236d49c9fe3 100644
--- a/mypaper-final.tex
+++ b/mypaper-final.tex
@@ -111,17 +111,18 @@ knowledge base curators, who need to continuously screen the media for
 updates regarding the knowledge base entries they manage. Automatic
 system support for this entity-centric information processing problem
 requires complex pipe\-lines involving both natural language
-processing and information retrieval components. The default pipeline
+processing and information retrieval components. The pipeline
+encountered in a variety of systems that approach this problem
 involves four stages: filtering, classification, ranking (or scoring),
-and evaluation. Filtering is an initial step, that reduces the
+and evaluation. Filtering is only an initial step, that reduces the
 web-scale corpus of news and other relevant information sources that
 may contain entity mentions into a working set of documents that should
 be more manageable for the subsequent stages.
-This step has a large impact on the recall that can be achieved.
-Keeping the subsequent steps constant, we therefore zoom in into the
-filtering stage, and conduct an in-depth analysis of the main design
-decisions here:
-cleansing noisy web data, the methods to create entity profiles, the
+Nevertheless, this step has a large impact on the recall that can be
+maximally attained! Therefore, in this study, we have focused on just
+this filtering stage and conduct an in-depth analysis of the main design
+decisions here: how to cleans the noisy text obtained online,
+the methods to create entity profiles, the
 types of entities of interest, document type, and the grade of
 relevance of the document-entity pair under consideration.
We analyze how these factors (and the design choices made in their