Abstract
Predicting the performance of machine learning models can speed up automated machine learning procedures, and it can also be incorporated into model recommendation algorithms. We propose a meta-learning framework that uses information about previous runs of machine learning workflows on benchmark tasks. We extract features describing the workflows and meta-data about the tasks, and combine them to train a regressor for performance prediction. In this way, we obtain a performance prediction for a workflow without training it on the task, using only feature extraction and inference via the regressor. The approach is tested on the OpenML-CC18 Curated Classification benchmark, where the regressor estimates the 75th percentile of the area under the ROC curve (AUC) of the classifiers. We obtained consistent predictions with an \(R^2\) score of 0.8 on previously unseen data.
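The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the feature names (`has_scaler`, `has_ensemble`, scaled task size, class balance) are hypothetical, and a simple k-nearest-neighbour regressor stands in for whatever regressor the paper actually uses. The sketch only shows the general shape of the idea: concatenate workflow features with task meta-data, use the 75th percentile of AUC over previous runs as the target, fit a meta-regressor, and score it with \(R^2\) on held-out rows.

```python
import math
import random
import statistics


def percentile_75(values):
    """Third quartile of a list of AUC scores from previous runs."""
    return statistics.quantiles(values, n=4, method="inclusive")[2]


def meta_row(workflow_features, task_features):
    """Concatenate workflow features and task meta-data into one input row."""
    return list(workflow_features) + list(task_features)


def knn_predict(train_X, train_y, x, k=5):
    """Stand-in meta-regressor: mean target of the k nearest training rows."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], x))[:k]
    return sum(target for _, target in nearest) / len(nearest)


def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic (workflow, task) pairs; every feature and coefficient here is
# an assumption made for illustration, not a value taken from the paper.
rng = random.Random(0)
X, y = [], []
for _ in range(300):
    workflow = [rng.randint(0, 1), rng.randint(0, 1)]  # has_scaler, has_ensemble
    task = [rng.random(), rng.random()]                # scaled size, class balance
    expected_auc = 0.55 + 0.05 * workflow[0] + 0.10 * workflow[1] + 0.25 * task[1]
    # Simulate ten previous runs of this workflow on this task, clamped to [0, 1].
    runs = [min(1.0, max(0.0, rng.gauss(expected_auc, 0.02))) for _ in range(10)]
    X.append(meta_row(workflow, task))
    y.append(percentile_75(runs))

# Hold out the last 60 pairs; note this splits over pairs, not over whole tasks.
train_X, test_X = X[:240], X[240:]
train_y, test_y = y[:240], y[240:]

preds = [knn_predict(train_X, train_y, row) for row in test_X]
r2 = r2_score(test_y, preds)
print(f"R^2 on held-out pairs: {r2:.3f}")
```

At prediction time no classifier is trained on the task: the only work is building the feature row and querying the meta-regressor, which is the source of the speed-up the abstract describes.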
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Neruda, R., Figueroa-García, J.C. (2023). Feature Selection for Performance Estimation of Machine Learning Workflows. In: Rocha, Á., Ferrás, C., Ibarra, W. (eds) Information Technology and Systems. ICITS 2023. Lecture Notes in Networks and Systems, vol 691. Springer, Cham. https://doi.org/10.1007/978-3-031-33258-6_33
DOI: https://doi.org/10.1007/978-3-031-33258-6_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33257-9
Online ISBN: 978-3-031-33258-6
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)