Changeset - d9b84600c510
Arjen de Vries (arjen) - 11 years ago 2014-06-12 05:48:34
arjen.de.vries@cwi.nl
conclusions "done"
1 file changed with 7 insertions and 4 deletions:
0 comments (0 inline, 0 general)
mypaper-final.tex
 
@@ -1110,57 +1110,60 @@ We observed that there are vital-relevant documents that we miss from raw only,
 
\paragraph*{head - organization} A document that talks about an organization of which the entity is the head can be vital for the entity. Jasper\_Schneider is the USDA Rural Development state director for North Dakota, and an article about problems of primary health centers in North Dakota is judged vital for him.
 
\paragraph*{World Knowledge} Some judgments are impossible to make without world knowledge. For example, ``refreshments, treats, gift shop specials, `bountiful, fresh and fabulous holiday decor,' a demonstration of simple ways to create unique holiday arrangements for any home; free and open to the public'' is judged relevant to Hjemkomst\_Center. This is a social media post, and unless one knows the person posting it, nothing in the text reveals the connection to the entity. Similarly, ``learn about the gray wolf's hunting and feeding behaviors and watch the wolves have their evening meal of a full deer carcass; \$15 for members, \$20 for nonmembers'' is judged vital to Red\_River\_Zoo.
 
\paragraph*{No document content} A small number of documents were found to have no content.
 
\paragraph*{Disagreement} For a few remaining documents, the authors disagree with the assessors as to why these are vital to the entity.
 
 
 
 
\section{Conclusions} \label{sec:conc}
 
In this paper, we examined the filtering stage of entity-centric
stream filtering and ranking by holding the later stages fixed. In
particular, we studied the cleansing step, different techniques to
construct entity profiles, and the effects of entity type (Wikipedia
or Twitter) and document category (news, social, or other). We attempted to address
the following research questions: 1) does cleansing affect filtering
and subsequent performance? 2) what is the most effective way of
entity profiling? 3) is filtering different for Wikipedia and Twitter
entities? 4) are some types of documents easily filterable and others
not? 5) does a gain in recall at the filtering step translate to a gain in
max-F at the end of the pipeline? and 6) what are the
circumstances under which vital documents cannot be retrieved?
 
 
Cleansing may remove (parts of) the contents of documents, making
them irretrievable. However, because of the introduction of false
positives, the recall gained by filtering the raw corpus instead of the
cleansed one, or by developing richer entity profiles, does not necessarily translate into overall
performance gains. The conclusion is mixed in the
sense that cleansing improved recall on vital
documents and Wikipedia entities, but at the same time reduced
recall on Twitter entities and on the relevant category of the
relevance ranking. Vital and relevant documents show a difference in
retrieval performance, where vital documents appear to be easier to filter than
relevant ones. (Note that in the context of the CCR task, the vital documents are
the most important.) The bottom line is that improving the recall of the filtering step
has shown that current entity-oriented
retrieval approaches need to improve at classifying and ranking the ``new''
documents that make it into the working set.
 
 
 
Despite an exhaustive attempt to identify as many vital-relevant
documents as possible, we observe that there are still documents that
we miss. While some can clearly be retrieved by modifying the
filtering procedure, some relevant and even vital documents can be
considered irretrievable. The circumstances under
which this happens are varied. A few documents have no content, or it is
unclear why they have been judged vital. However, the main
circumstances under which vital documents
can defy filtering include: outgoing link mentions,
venue - event, entity - related entity, organization - main area of
operation, entity - group, artist - artist's work, party - politician,
and world knowledge.
 
 
 
%ACKNOWLEDGMENTS are optional
 
%\section{Acknowledgments}
 
 
%
 
% The following two commands are all you need in the
 
% initial runs of your .tex file to
 
% produce the bibliography for the citations in your paper.
 
\bibliographystyle{abbrv}