Number of the records: 1  

Avoiding overfit by restricted model search in tree-based EEG classification

  1.
    0390576 - ÚI 2013 RIV NL eng C - Conference Paper (international conference)
    Klaschka, Jan
    Avoiding overfit by restricted model search in tree-based EEG classification.
    Proceedings of the 58th World Statistics Congress 2011. The Hague: International Statistical Institute, 2012, pp. 5077-5082. ISBN 978-90-73592-33-9.
    [ISI 2011. Session of the International Statistical Institute /58./. Dublin (IE), 21.08.2011-26.08.2011]
    R&D Projects: GA MŠMT ME 949
    Institutional research plan: CEZ:AV0Z10300504
    Keywords : model search * electroencephalography * classification trees and forests * random forests
    Subject RIV: BB - Applied Statistics, Operational Research
    http://2011.isiproceedings.org/papers/950644.pdf

    This work follows up on previous studies in which EEG frequency spectra of a group of experimental subjects were analyzed in order to find sufficiently accurate classifiers discriminating somnolence (sleepiness) from other brain states. The classifiers considered were complex models whose building blocks were classification forests constructed by the Random Forests method. Since EEG signals are highly individual, it is necessary to tailor a separate classifier for each subject. Earlier studies have shown, however, that a model (classification forest) based exclusively on the subject's own data may be improved when combined (through a weighted average of votes for classes) with a model derived from the data of a well-selected subset S of other subjects. Different strategies for searching for a proper set S were experimentally compared and a “winning” strategy was chosen. The starting point of the present study was a strange and undesirable behavior of the best models from the previous research stage, observed when the size of the forests (number of trees) was varied (originally, the default of 500 trees per forest was used): some of the models deteriorated with growing forest size. Such phenomena often result from overfit due to a too extensive model search, but the tendency to overfit is not, as a rule, a property of the components of the models, i.e. of the forests. This has given rise to the hypothesis that combining bigger forests is more prone to overfit than combining smaller ones. If so, the bigger the forests being combined, the more restricted the model search should be (i.e. the smaller the number of candidate models). The hypothesis is supported by a computational experiment whose results are reported: after restrictions were applied to the size of set S (and to the values of some numerical parameters of the models, too), the trend of model deterioration with growing forest size vanished or, at least, was considerably attenuated.
    Permanent Link: http://hdl.handle.net/11104/0219441
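
    The combination scheme mentioned in the abstract (a weighted average of class votes from the subject's own forest and from a forest built on a subset S of other subjects) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the paper: the functions combine_votes, predict and count_candidate_subsets, the weight parameter, and the class labels. The exhaustive subset count is an assumption used only to show how capping the size of S shrinks the number of candidate models; the abstract does not describe the actual search strategy.

```python
from math import comb


def combine_votes(own_votes, pooled_votes, weight):
    """Weighted average of per-class vote fractions from two forests.

    own_votes / pooled_votes: dict mapping class label -> fraction of trees
    voting for that class. weight is the share given to the subject's own
    forest (an illustrative parameter, not the paper's notation).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    classes = set(own_votes) | set(pooled_votes)
    return {
        c: weight * own_votes.get(c, 0.0)
        + (1.0 - weight) * pooled_votes.get(c, 0.0)
        for c in classes
    }


def predict(own_votes, pooled_votes, weight):
    """Return the class with the largest combined vote share."""
    combined = combine_votes(own_votes, pooled_votes, weight)
    # Sorting the labels first makes tie-breaking deterministic.
    return max(sorted(combined), key=combined.get)


def count_candidate_subsets(n_other_subjects, max_size):
    """Number of non-empty subsets S with at most max_size members.

    Assumes exhaustive enumeration purely for illustration.
    """
    upper = min(max_size, n_other_subjects)
    return sum(comb(n_other_subjects, k) for k in range(1, upper + 1))


if __name__ == "__main__":
    own = {"somnolence": 0.4, "other": 0.6}
    pooled = {"somnolence": 0.8, "other": 0.2}
    print(predict(own, pooled, weight=0.5))
    print(count_candidate_subsets(5, 2), count_candidate_subsets(5, 5))
```

    In this toy setting, limiting S to at most two of five other subjects leaves fewer candidate subsets than an unrestricted search, which is the kind of restriction the abstract reports as attenuating the deterioration.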

     
    File           Download / Size   Commentary   Version                 Access
    a0390576.pdf   0167.1 KB                      Publisher's postprint   require
    0390576.pdf    1688.5 KB                      Author's preprint       open-access
     
