Avoiding overfit by restricted model search in tree-based EEG classification

Klaschka, Jan

0390576 - ÚI 2013 RIV NL eng C - Conference Paper (international conference)
Klaschka, Jan
Avoiding overfit by restricted model search in tree-based EEG classification.
Proceedings of the 58th World Statistics Congress 2011. The Hague: International Statistical Institute, 2012, pp. 5077-5082. ISBN 978-90-73592-33-9.
[ISI 2011. Session of the International Statistical Institute /58./. Dublin (IE), 21.08.2011-26.08.2011]
R&D Projects: GA MŠMT ME 949
Institutional research plan: CEZ:AV0Z10300504
Keywords : model search * electroencephalography * classification trees and forests * random forests
Subject RIV: BB - Applied Statistics, Operational Research
http://2011.isiproceedings.org/papers/950644.pdf

This work follows up on previous studies in which EEG frequency spectra of a group of experimental subjects were analyzed in order to find sufficiently accurate classifiers discriminating somnolence (sleepiness) from other brain states. The classifiers considered were complex models whose building blocks were classification forests constructed by the Random Forests method. Since EEG signals are highly individual, a separate classifier has to be tailored to each subject. Earlier studies have shown, however, that a model (classification forest) based exclusively on the subject's own data may be improved when combined (through a weighted average of votes for classes) with a model derived from the data of a well-selected subset S of other subjects. Different strategies of searching for a suitable set S were experimentally compared and a “winning” strategy was chosen. The starting point of the present study was a strange and undesirable behavior of the best models from the previous research stage, observed when the size of the forests (number of trees) was varied (originally, the default of 500 trees per forest was used): some of the models deteriorated with growing forest size. Such phenomena often result from overfitting due to an overly extensive model search, but the tendency to overfit is not, as a rule, a property of the components of the models, i.e. of the forests. This gave rise to the hypothesis that combining bigger forests is more prone to overfitting than combining smaller ones. If so, the bigger the forests being combined, the more the model search should be restricted (i.e. the smaller the number of candidate models should be kept). The hypothesis is supported by a computational experiment whose results are reported: after restrictions were applied to the size of set S (and to the values of some numerical parameters of the models, too), the trend of model deterioration with growing forest size vanished or, at least, was considerably attenuated.
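The two mechanisms named in the abstract, combining forests through a weighted average of class votes and restricting the size of the candidate set S, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the single weight parameter, the example vote fractions, and the exhaustive enumeration of subsets (the paper compares search strategies that are not specified in this record).

```python
from itertools import combinations


def combine_votes(own_votes, pooled_votes, weight):
    """Weighted average of per-class vote fractions from two forests.

    own_votes, pooled_votes: dicts mapping a class label to the fraction
    of trees voting for it (hypothetical representation).
    weight: hypothetical weight in [0, 1] given to the subject's own forest.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    classes = set(own_votes) | set(pooled_votes)
    return {
        c: weight * own_votes.get(c, 0.0)
        + (1.0 - weight) * pooled_votes.get(c, 0.0)
        for c in classes
    }


def predict(combined_votes):
    """Return the class with the largest combined vote share."""
    return max(combined_votes, key=combined_votes.get)


def candidate_subsets(other_subjects, max_size):
    """Yield every non-empty candidate set S with at most max_size subjects.

    Lowering max_size shrinks the number of candidate models, which is
    one way to restrict the model search.
    """
    for k in range(1, max_size + 1):
        yield from combinations(other_subjects, k)


# Illustrative vote fractions (made up, not data from the study).
own = {"somnolence": 0.40, "other": 0.60}
pooled = {"somnolence": 0.70, "other": 0.30}
combined = combine_votes(own, pooled, weight=0.5)
print(predict(combined))

# Number of candidate sets S for 5 other subjects, restricted vs. unrestricted.
others = ["s1", "s2", "s3", "s4", "s5"]
print(len(list(candidate_subsets(others, max_size=2))))
print(len(list(candidate_subsets(others, max_size=len(others)))))
```

In this sketch the restriction is a simple cap on the size of S; the abstract also mentions restricting some numerical parameters of the models, which could be handled analogously by limiting the grid of weights that is tried.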
Permanent Link: http://hdl.handle.net/11104/0219441
File	Download	Size	Commentary	Version	Access
a0390576.pdf	0	167.1 KB		Publisher’s postprint	require
0390576.pdf	1	688.5 KB		Author's preprint	open-access
