Number of records: 1

Comparison of Selected Methods for Document Clustering

  1.
    0356107 - UIVT-O 2011 RIV DE eng C - Conference paper (international conf.)
    Ševčík, R. - Řezanková, H. - Húsek, Dušan
    Comparison of Selected Methods for Document Clustering.
    Advances in Intelligent Web Mastering - 3. Berlin: Springer, 2011 - (Mugellini, E.; Szczepaniak, P.; Pettenati, M.; Sokhn, M.), pp. 101-110. Advances in Intelligent and Soft Computing, 86. ISBN 978-3-642-18028-6. ISSN 1867-5662.
    [AWIC 2011. Atlantic Web Intelligence Conference /7./. Fribourg (CH), 26.01.2011-28.01.2011]
    CEP grants: GA ČR GAP202/10/0262; GA ČR GA205/09/1079
    Research plan: CEZ:AV0Z10300504
    Keywords: web clustering * cluster analysis * textual documents * web content classification * newsgroups analysis * vector model
    RIV subject code: IN - Informatics

    This paper compares 17 cluster analysis techniques proposed for document clustering in terms of internal and external clustering quality measures and computing time. Fifteen of them are combinations of three basic methods (direct, repeated bisection, and agglomerative) with five clustering criterion functions for solution assessment (two intra-cluster, one inter-cluster, and two complex ones), all implemented in the CLUTO software package. The remaining two apply the agglomerative method with single linkage and complete linkage as the criterion function. The methods were compared on the 20 Newsgroups collection, with e-mail messages represented as binary vectors. The experiments showed that the direct method gives the best results in terms of entropy and purity, while the repeated bisection (divisive) method was the fastest in terms of computing time.
    Permanent link: http://hdl.handle.net/11104/0194720
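
The abstract ranks the methods by entropy and purity, two external quality measures that compare cluster assignments against known class labels. The following is a minimal sketch of how these measures are commonly computed in the document clustering literature; the record does not give the formulas, so the normalization of entropy by log(q) (with q the number of classes) and the function name `entropy_and_purity` are assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def entropy_and_purity(cluster_ids, class_labels):
    """Return (entropy, purity) of a clustering against reference classes.

    Assumed definitions (not taken from the catalog record):
      entropy = sum_r (n_r / n) * ( -1/log(q) * sum_i p_ri * log(p_ri) )
      purity  = sum_r (1 / n) * max_i n_ri
    where n_ri is the number of documents of class i in cluster r,
    p_ri = n_ri / n_r, and q is the number of distinct classes.
    """
    if len(cluster_ids) != len(class_labels) or len(cluster_ids) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(cluster_ids)
    q = len(set(class_labels))

    # Count class occurrences inside every cluster.
    per_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, class_labels):
        per_cluster[cluster][label] += 1

    entropy = 0.0
    purity = 0.0
    for counts in per_cluster.values():
        n_r = sum(counts.values())
        if q > 1:
            # Normalized entropy of the class distribution in this cluster.
            e_r = -sum((k / n_r) * math.log(k / n_r)
                       for k in counts.values()) / math.log(q)
        else:
            e_r = 0.0  # a single class leaves nothing to confuse
        entropy += (n_r / n) * e_r
        purity += max(counts.values()) / n
    return entropy, purity
```

Lower entropy and higher purity indicate clusters that align more closely with the reference classes; a clustering that reproduces the classes exactly yields entropy 0 and purity 1 under these definitions.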