HCDA/cikm-paper Changeset - a07620ff7ab1 · Centrum Wiskunde & Informatica (CWI)

Arjen de Vries (arjen) - 11 years ago 2014-06-12 06:03:38
arjen.de.vries@cwi.nl

more in discussion

1 file changed with 25 insertions and 27 deletions:

mypaper-final.tex

 of the raw corpus brings in documents that can not be retrieved from
 the cleansed corpus. This is true for all entity profiles and for all
 entity types. The  recall difference between the cleansed and raw
-ranges from  6.8\% t 26.2\%. These increases, in actual
+ranges from  6.8\% to 26.2\%. These increases, in actual
 document-entity pairs,  is in thousands. We believe this is a
 substantial increase. However, the recall increases do not always
-translate to improved F-score in overall performance.  In the vital
+translate to improved max-F on the overall system performance.  In the vital
 relevance ranking for both Wikipedia and aggregate entities, the
 cleansed version performs better than the raw version.  In Twitter
 entities, the raw corpus achieves better except in the case of all
-name-variant, though the difference is negligible.  However, for
+name-variant, though the difference is negligible. However, for
 vital-relevant, the raw corpus performs  better across all entity
-profiles and entity types except in partial canonical names of
+profiles and entity types except for the case of partial canonical names of
 Wikipedia entities.
-The use of different profiles also shows a big difference in
-recall. While in Wikipedia the use of canonical
-partial achieves better than name-variant, there is a steady increase
-in recall from canonical to canonical partial, to name-variant, and
-to name-variant partial. This pattern is also observed across the
-document categories.  However, here too, the relationship between
-the gain in recall as we move from less richer profile to a more
-richer profile and overall performance as measured by F-score  is not
-linear.
+The use of different entity profiles can have a large effect on
+recall. While in the case of Wikipedia entities the use of canonical
+partial achieves better recall than using name-variants, there seems a
+steady increase in recall from canonical to canonical partial, to
+name-variant, and to name-variant partial, a pattern that is observed
+across the document categories.  However, here too, the relationship between
+the gain in recall as we move from less richer profile to a
+richer profile and the overall CCR performance as measured by max-F is
+not simply positive.
 %%%%%%%%%%%%
 In vital ranking, across all entity profiles and types of corpus,
-Wikipedia's canonical partial  achieves better performance than any
+Wikipedia's canonical partial representation achieves better performance than any
 other Wikipedia entity profiles. In vital-relevant documents too,
 Wikipedia's canonical partial achieves the best result. In the raw
 corpus, it achieves a little less than name-variant partial. For
-highest F-score across all entity profiles and types of corpus.
-There are 3 interesting observations:
-) cleansing impacts Twitter
-entities and relevant documents.  This  is validated by the
-observation that recall  gains in Twitter entities and the relevant
+Cleansing impacts Twitter
+entities and relevant documents.  This is validated by the
+observation that recall gains in Twitter entities and the relevant
 categories in the raw corpus also translate into overall performance
-gains. This observation implies that cleansing removes relevant and
-social documents than it does vital and news. That it removes relevant
-documents more than vital can be explained by the fact that cleansing
+gains. This observation implies that cleansing removes more relevant and
+social documents than it does vital and news, which may be
+explained by the fact that cleansing
 removes the related links and adverts which may contain a mention of
 the entities. One example we saw was the the cleansing removed an
-image with a text of an entity name which was actually relevant. And
-that it removes social documents can be explained by the fact that
-most of the missing of the missing  docuemnts from cleansed are
-social. And all the docuemnts that are missing from raw corpus
-social. So in both cases social seem to suffer from text
+image with a text of an entity name which was actually relevant. The
+removal of predominantly social documents can be explained by the fact that
+all of the missing documents from the raw corpus and the majority of
+the missing documents from the cleansed corpus belong to
+the social category. In both cases, especially the social channel seems to suffer from text
 transformation and cleasing processes.
 %%%% NEEDS WORK:
