HCDA/cikm-paper Changeset - 60fbfbab0287 · Centrum Wiskunde & Informatica (CWI)

Changeset - 60fbfbab0287


Arjen de Vries (arjen) - 11 years ago 2014-06-12 02:03:44
arjen.de.vries@cwi.nl

abstract new

1 file changed with 24 insertions and 3 deletions:

mypaper-final.tex


mypaper-final.tex

@@ -106,9 +106,30 @@
 \maketitle
 \begin{abstract}
 Entity-centric information processing requires complex pipelines involving both natural language processing and information retrieval components. In entity-centric stream filtering and ranking, the pipeline involves four  important stages: filtering, classification, ranking(scoring)  and evaluation. Filtering is an important step  that creates a manageable working set of documents  from a  web-scale corpus for the next stages.  It thus  determines the performance of the overall system.  Keeping the subsequent steps constant, we  zoom in on the filtering stage and conduct an in-depth analysis of the  main components of cleansing, entity profiles, relevance levels, category of documents and entity types with a view to understanding  the factors and choices that affect filtering performance. The study demonstrates the most  effective entity profiling,  identifies those relevant documents that defy filtering and conducts manual examination into their contents. The paper classifies the ways unfilterable documents
 are mentioned in text and estimates the practical upper-bound of recall in  entity-based filtering.
 Cumulative citation recommendation refers to the problem faced by
 knowledge base curators, who need to continuously screen the media for
 updates regarding the knowledge base entries they manage. Automatic
 system support for this entity-centric information processing problem
 requires complex pipe\-lines involving both natural language
 processing and information retrieval components. The default pipeline
 involves four stages: filtering, classification, ranking (or scoring),
 and evaluation. Filtering is an initial step that reduces the
 web-scale corpus of news and other relevant information sources that
 may contain entity mentions into a working set of documents that should
 be more manageable for the subsequent stages.
 This step has a large impact on the recall that can be achieved.
 Keeping the subsequent steps constant, we therefore zoom in on the
 filtering stage, and conduct an in-depth analysis of the main design
 decisions here:
 cleansing noisy web data, the methods to create entity profiles, the
 types of entities of interest, document type, and the grade of
 relevance of the document-entity pair under consideration.
 We analyze how these factors (and the design choices made in their
 corresponding system components) affect filtering performance.
 We identify and characterize the relevant documents that do not pass
 the filtering stage by examining their contents. This way, we
 estimate a practical upper bound of recall for entity-centric stream
 filtering.
 \end{abstract}
 % A category with the (minimum) three required fields
