Number of records: 1

Comparison of Selected Methods for Document Clustering

  1.
    0356107 - ÚI 2011 RIV DE eng C - Conference Paper (international conference)
    Ševčík, R. - Řezanková, H. - Húsek, Dušan
    Comparison of Selected Methods for Document Clustering.
    Advances in Intelligent Web Mastering - 3. Berlin: Springer, 2011 - (Mugellini, E.; Szczepaniak, P.; Pettenati, M.; Sokhn, M.), pp. 101-110. Advances in Intelligent and Soft Computing, 86. ISBN 978-3-642-18028-6. ISSN 1867-5662.
    [AWIC 2011. Atlantic Web Intelligence Conference /7./. Fribourg (CH), 26.01.2011-28.01.2011]
    R&D Projects: GA ČR GAP202/10/0262; GA ČR GA205/09/1079
    Institutional research plan: CEZ:AV0Z10300504
    Keywords : web clustering * cluster analysis * textual documents * web content classification * newsgroups analysis * vector model
    Subject RIV: IN - Informatics, Computer Science

    This paper compares 17 cluster analysis techniques proposed for document clustering in terms of internal and external clustering quality measures and computing time. Fifteen of them are combinations of three basic methods (direct, repeated bisection, and agglomerative) with five clustering criterion functions for solution assessment (two intra-cluster, one inter-cluster, and two complex ones), all implemented in the CLUTO software package. The remaining two apply single linkage and complete linkage as the criterion function in the agglomerative method. The methods were compared on the 20 Newsgroups collection, using a binary vector representation of the e-mail messages. The experiments showed that the direct method gives the best results in terms of entropy and purity, while the repeated bisection (divisive) method was the fastest.
    Permanent Link: http://hdl.handle.net/11104/0194720
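
The abstract judges clustering quality by entropy and purity, two external measures that compare cluster assignments with known class labels (here, the newsgroup of each message). The record does not give the formulas, so the sketch below assumes the commonly used definitions: per-cluster entropy normalized by the logarithm of the number of classes, per-cluster purity as the share of the dominant class, and both averaged over clusters weighted by cluster size. The function name `entropy_and_purity` and its arguments are hypothetical and are not taken from the paper or from CLUTO.

```python
import math
from collections import Counter, defaultdict


def entropy_and_purity(cluster_ids, class_labels):
    """Return (entropy, purity) of a clustering against known class labels.

    Assumed definitions (not stated in the record itself):
      entropy of cluster r = -(1 / log q) * sum_i p_i * log p_i
      purity  of cluster r = max_i p_i
    where p_i is the fraction of cluster r belonging to class i and q is the
    number of classes. Both are averaged over clusters, weighted by size.
    Lower entropy and higher purity indicate a better clustering.
    """
    n = len(cluster_ids)
    q = len(set(class_labels))

    # Count how many documents of each class fall into each cluster.
    clusters = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, class_labels):
        clusters[cluster][label] += 1

    entropy = 0.0
    purity = 0.0
    for counts in clusters.values():
        n_r = sum(counts.values())
        if q > 1:
            e_r = -sum((k / n_r) * math.log(k / n_r)
                       for k in counts.values()) / math.log(q)
        else:
            e_r = 0.0  # a single class leaves nothing to mix
        entropy += (n_r / n) * e_r
        purity += (n_r / n) * (max(counts.values()) / n_r)
    return entropy, purity
```

With these definitions, a clustering that reproduces the classes exactly has entropy 0 and purity 1, while a clustering that splits two equally sized classes evenly across two clusters has entropy 1 and purity 0.5.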

     
     