From 60fbfbab0287ab72519987bdcba3adb5a0aa93c8 2014-06-12 02:03:44
From: Arjen P. de Vries
Date: 2014-06-12 02:03:44
Subject: [PATCH] abstract new

---
diff --git a/mypaper-final.tex b/mypaper-final.tex
index 7ac4935589efe246eca243819f6b949f2e79cbd3..0e2e11ce283a60bfb16292f3423cd05fd6b5de30 100644
--- a/mypaper-final.tex
+++ b/mypaper-final.tex
@@ -106,9 +106,30 @@
 \maketitle
 
 \begin{abstract}
-
-Entity-centric information processing requires complex pipelines involving both natural language processing and information retrieval components. In entity-centric stream filtering and ranking, the pipeline involves four important stages: filtering, classification, ranking(scoring) and evaluation. Filtering is an important step that creates a manageable working set of documents from a web-scale corpus for the next stages. It thus determines the performance of the overall system. Keeping the subsequent steps constant, we zoom in on the filtering stage and conduct an in-depth analysis of the main components of cleansing, entity profiles, relevance levels, category of documents and entity types with a view to understanding the factors and choices that affect filtering performance. The study demonstrates the most effective entity profiling, identifies those relevant documents that defy filtering and conducts manual examination into their contents. The paper classifies the ways unfilterable documents
-are mentioned in text and estimates the practical upper-bound of recall in entity-based filtering.
+Cumulative citation recommendation refers to the problem faced by
+knowledge base curators, who need to continuously screen the media for
+updates regarding the knowledge base entries they manage. Automatic
+system support for this entity-centric information processing problem
+requires complex pipe\-lines involving both natural language
+processing and information retrieval components. The default pipeline
+involves four stages: filtering, classification, ranking (or scoring),
+and evaluation. Filtering is an initial step that reduces the
+web-scale corpus of news and other relevant information sources that
+may contain entity mentions into a working set of documents that should
+be more manageable for the subsequent stages.
+This step has a large impact on the recall that can be achieved.
+Keeping the subsequent steps constant, we therefore zoom in on the
+filtering stage, and conduct an in-depth analysis of the main design
+decisions here:
+cleansing noisy web data, the methods to create entity profiles, the
+types of entities of interest, document type, and the grade of
+relevance of the document-entity pair under consideration.
+We analyze how these factors (and the design choices made in their
+corresponding system components) affect filtering performance.
+We identify and characterize the relevant documents that do not pass
+the filtering stage by examining their contents. This way, we
+estimate a practical upper bound of recall for entity-centric stream
+filtering.
 \end{abstract}
 
 % A category with the (minimum) three required fields