Number of the records: 1  

Interpreting and Clustering Outliers with Sapling Random Forests

  1.
    0432410 - ÚI 2015 RIV CZ eng C - Conference Paper (international conference)
    Kopp, Martin - Pevný, T. - Holeňa, Martin
    Interpreting and Clustering Outliers with Sapling Random Forests.
    ITAT 2014. Information Technologies - Applications and Theory. Part II. Prague: Institute of Computer Science AS CR, 2014 - (Kůrková, V.; Bajer, L.; Peška, L.; Vojtáš, R.; Holeňa, M.; Nehéz, M.), pp. 61-67. ISBN 978-80-87136-19-5.
    [ITAT 2014. European Conference on Information Technologies - Applications and Theory /14./. Demänovská dolina (SK), 25.09.2014-29.09.2014]
    R&D Projects: GA ČR GA13-17187S
    Grant - others:GA ČR(CZ) GPP103/12/P514
    Institutional support: RVO:67985807
    Keywords : anomaly detection * anomaly interpretation * clustering * decision trees * feature selection * random forest
    Subject RIV: IN - Informatics, Computer Science

    The main objective of outlier detection is to find samples that deviate considerably from the majority. Such outliers, often referred to as anomalies, are increasingly important because they help to uncover interesting events within data. Consequently, a considerable number of statistical and data mining techniques for identifying anomalies have been proposed in the last few years, but only a few works even mention why a sample was labelled as an anomaly. Therefore, we propose a method based on specifically trained decision trees, called sapling random forest. Our method is able to interpret the output of an arbitrary anomaly detector. The explanation is given either as a subset of features in which the sample deviates most, or as conjunctions of atomic conditions, which can be viewed as antecedents of logical rules easily understandable by humans. To simplify the investigation of suspicious samples even further, we propose two methods of clustering anomalies into groups. Such clusters can be investigated at once, saving time and human effort. The feasibility of our approach is demonstrated on several synthetic datasets and one real-world dataset.
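The idea in the abstract of explaining an already-flagged anomaly with small trees can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes that each "sapling" is reduced to a single split (a depth-1 tree) trained to separate the anomaly from a random subset of normal samples, and that the votes of many such splits are aggregated into ranked atomic conditions. The names `best_split` and `explain` and the parameters `n_trees` and `subset_size` are hypothetical.

```python
import random
from collections import Counter


def best_split(anomaly, normals):
    """Find the single condition on one feature that the anomaly satisfies
    and that excludes as many of the given normal samples as possible.
    Returns (feature, op, threshold), or (None, None, None) if none exists."""
    best = (0, None, None, None)  # (normals separated, feature, op, threshold)
    for f, a in enumerate(anomaly):
        below = [x[f] for x in normals if x[f] < a]
        above = [x[f] for x in normals if x[f] > a]
        if len(below) > best[0]:
            # anomaly lies above these normals: condition "x[f] > threshold"
            best = (len(below), f, ">", (a + max(below)) / 2)
        if len(above) > best[0]:
            # anomaly lies below these normals: condition "x[f] < threshold"
            best = (len(above), f, "<", (a + min(above)) / 2)
    return best[1:]


def explain(anomaly, normals, n_trees=50, subset_size=20, seed=0):
    """Train n_trees one-split 'saplings' (an assumed simplification), each on
    a random subset of normal samples, and aggregate their conditions.
    Returns a list of (feature, op, mean threshold, votes), most voted first."""
    rng = random.Random(seed)
    votes = Counter()
    thresholds = {}
    for _ in range(n_trees):
        subset = rng.sample(normals, min(subset_size, len(normals)))
        f, op, t = best_split(anomaly, subset)
        if f is None:
            continue  # anomaly is indistinguishable from this subset
        votes[(f, op)] += 1
        thresholds.setdefault((f, op), []).append(t)
    return [
        (f, op, sum(thresholds[(f, op)]) / len(thresholds[(f, op)]), n)
        for (f, op), n in votes.most_common()
    ]


# Usage on synthetic data: normal samples are standard Gaussian in 3 features,
# and the examined sample deviates strongly in feature 1 only.
data_rng = random.Random(1)
normal_samples = [tuple(data_rng.gauss(0, 1) for _ in range(3)) for _ in range(200)]
for feature, op, threshold, n_votes in explain((0.0, 8.0, 0.0), normal_samples):
    print(f"x[{feature}] {op} {threshold:.2f}  (votes: {n_votes})")
```

The most voted features play the role of the feature subset in which the sample deviates most, and each printed line is one atomic condition. Deeper saplings would yield conjunctions of such conditions along a root-to-leaf path, which this sketch does not attempt.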
    Permanent Link: http://hdl.handle.net/11104/0236773

     
    File: 0432410.pdf; Download/Size: 26156.4 KB; Version: Publisher’s postprint; Access: open-access
     