Changeset - d9b84600c510
Arjen de Vries (arjen) 2014-06-12 05:48:34
arjen.de.vries@cwi.nl
conclusions "done"
1 file changed with 7 insertions and 4 deletions:
mypaper-final.tex
 
@@ -1122,33 +1122,36 @@ construct entity profiles, and the effects of entity type (Wikipedia
 
or Twitter) and document category (news, social, or other). We attempted to address
the following research questions: 1) does cleansing affect filtering
and subsequent performance? 2) what is the most effective way of
entity profiling? 3) is filtering different for Wikipedia and Twitter
entities? 4) are some types of documents easily filterable and others
not? 5) does a gain in recall at the filtering step translate to a gain in
max-F at the end of the pipeline? and 6) what are the
circumstances under which vital documents cannot be retrieved?
 
 
Cleansing may remove (parts of) the contents of documents, making
them irretrievable. However, because of the introduction of false
positives, gaining recall by filtering the raw corpus instead of the
cleansed one, as well as developing richer entity profiles, does not necessarily translate to overall
performance gains. The conclusion is mixed in the
sense that cleansing has helped to improve recall on vital
documents and Wikipedia entities, but at the same time has reduced
recall on Twitter entities and on the relevant category of the
relevance ranking. Vital and relevant documents differ in
retrieval performance: vital documents appear to be easier to filter than
relevant ones. (Notice that in the context of the CCR task, the vital documents are
most important.) The bottom line is that improving the filtering step
with respect to recall has shown that current entity-oriented
retrieval approaches need to be improved to better classify and rank the ``new''
documents that make it into the working set.
 
 
 
Despite an exhaustive attempt to identify as many vital-relevant
documents as possible, we observe that there are still documents that
we miss. While some could clearly be retrieved by modifying the
filtering procedure, other relevant and even vital documents can be
considered irretrievable. This happens under a variety of
circumstances. A few documents have no content, or it is
unclear why they were judged vital. However, the main
circumstances under which vital documents
can defy filtering include: outgoing link mentions,
venue--event, entity--related entity, organization--main area of