HCDA/cikm-paper Changeset - 16cf454c73fa · Centrum Wiskunde & Informatica (CWI)

Gebrekirstos Gebremeskel - 11 years ago 2014-06-11 20:00:40
destinycome@gmail.com

Merge branch 'master' of https://scm.cwi.nl/IA/cikm-paper

1 file changed with 23 insertions and 3 deletions:

mypaper-final.tex

@@ -102,9 +102,29 @@
 \maketitle
 \begin{abstract}
 Entity-centric information processing requires complex pipelines involving both natural language processing and information retrieval components. In entity-centric stream filtering and ranking, the pipeline involves four important stages: filtering, classification, ranking (scoring) and evaluation. Filtering is an important step that creates a manageable working set of documents from a web-scale corpus for the next stages. It thus determines the performance of the overall system. Keeping the subsequent steps constant, we zoom in on the filtering stage and conduct an in-depth analysis of the main components of cleansing, entity profiles, relevance levels, category of documents and entity types with a view to understanding the factors and choices that affect filtering performance. The study demonstrates the most effective entity profiling, identifies those relevant documents that defy filtering and conducts manual examination into their contents. The paper classifies the ways unfilterable documents
 are mentioned in text and estimates the practical upper-bound of recall in entity-based filtering.
 Entity-centric information processing requires complex pipelines
 involving both natural language processing and information retrieval
 components. In entity-centric stream filtering and ranking, the
 pipeline involves four stages: filtering, classification,
 ranking (scoring) and evaluation. Filtering is the initial step: it
 extracts a working set of documents from the web-scale corpus, yielding
 a smaller collection that is more manageable in the
 subsequent stages of the pipeline. This filtering step therefore
 determines the maximum attainable performance of the overall system.
 This paper investigates the filtering stage in isolation, in the context
 of a cumulative citation recommendation problem. We conduct an
 in-depth analysis of the main factors that determine filtering
 effectiveness: cleansing noisy web data, methods to create entity
 profiles, the types of entity of interest, document category, and the
 relevance level of the entity-document pair under consideration.
 We analyze how these factors (and the design choices made in their
 corresponding system components) affect filtering performance.
 We identify and characterize the relevant documents that do not pass the
 filtering stage, and conduct a manual examination of their
 contents. The paper classifies the ways unfilterable documents
 are mentioned in text and estimates the practical upper bound of
 recall in entity-based filtering.
 \end{abstract}
 % A category with the (minimum) three required fields
