Data processing pipeline for cardiogenic shock prediction using machine learning

Jajcay, Nikola; Bezák, B.; Segev, A.; Matetzky, S.; Janková, J.; Spartalis, M.; El Tahlawi, M.; Guerra, F.; Friebel, J.; Thevathasan, T.; Berta, I.; Pölzl, L.; Nägele, F.; Pogran, E.; Cader, F. A.; Jarakovic, M.; Gollmann-Tepeköylü, C.; Kollárová, M.; Petríková, K.; Tica, O.; Krychtiuk, K. A.; Tavazzi, G.; Skurk, C.; Huber, K.; Böhm, A.

DOI: https://dx.doi.org/10.3389/fcvm.2023.1132680

0571168 - ÚI 2024 RIV CH eng J - Journal Article
Jajcay, Nikola - Bezák, B. - Segev, A. - Matetzky, S. - Janková, J. - Spartalis, M. - El Tahlawi, M. - Guerra, F. - Friebel, J. - Thevathasan, T. - Berta, I. - Pölzl, L. - Nägele, F. - Pogran, E. - Cader, F. A. - Jarakovic, M. - Gollmann-Tepeköylü, C. - Kollárová, M. - Petríková, K. - Tica, O. - Krychtiuk, K. A. - Tavazzi, G. - Skurk, C. - Huber, K. - Böhm, A.
Data processing pipeline for cardiogenic shock prediction using machine learning.
Frontiers in Cardiovascular Medicine. Vol. 10, 23 March 2023 (2023), article no. 1132680. E-ISSN 2297-055X
Institutional support: RVO:67985807
Keywords: classification, machine learning, missing data imputation, processing pipeline, prediction model, cardiogenic shock
OECD category: Cardiac and Cardiovascular systems
Impact factor: 3.6, year: 2022
Method of publishing: Open access

INTRODUCTION: Recent advances in machine learning provide new possibilities to process and analyse observational patient data to predict patient outcomes. In this paper, we introduce a data processing pipeline for cardiogenic shock (CS) prediction from the MIMIC-III database of intensive cardiac care unit patients with acute coronary syndrome. The ability to identify high-risk patients could allow pre-emptive measures to be taken and thus prevent the development of CS.

METHODS: We focus mainly on the imputation of missing data: we build an imputation pipeline and compare the performance of several multivariate imputation algorithms, including k-nearest neighbours, two singular value decomposition (SVD)-based methods, and Multiple Imputation by Chained Equations (MICE). After imputation, we select the final subjects and variables from the imputed dataset and showcase the performance of a gradient-boosted framework with a tree-based classifier for CS prediction.

RESULTS: Thanks to data cleaning and imputation, we achieved good classification performance (cross-validated mean area under the curve of 0.805) without hyperparameter optimization.

CONCLUSION: We believe our pre-processing pipeline could also prove helpful for other classification and regression experiments.
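The imputation comparison described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' code. It assumes a purely numeric feature matrix with `NaN` marking missing values, implements a simple k-nearest-neighbours imputer and one generic iterative low-rank SVD imputer (the abstract does not say which two SVD-based variants were used), and scores each method by hiding a random fraction of the observed entries and measuring the root-mean-square error on them. The function names `mean_impute`, `knn_impute`, `svd_impute` and `masked_rmse`, as well as the rank, neighbour count, masking fraction and demo data, are hypothetical choices made for illustration.

```python
import numpy as np


def mean_impute(X):
    """Baseline: replace each NaN with its column (feature) mean."""
    X = np.asarray(X, dtype=float)
    return np.where(np.isnan(X), np.nanmean(X, axis=0), X)


def knn_impute(X, k=5):
    """Replace each NaN with the mean of that feature over the k nearest rows.

    Distances are root-mean-square differences over the features that both
    rows have observed; only rows that observe the missing feature are donors.
    """
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    filled = X.copy()
    for i in np.where(mask.any(axis=1))[0]:
        dists = np.full(X.shape[0], np.inf)
        for j in range(X.shape[0]):
            shared = ~mask[i] & ~mask[j]
            if j != i and shared.any():
                dists[j] = np.sqrt(np.mean((X[i, shared] - X[j, shared]) ** 2))
        for f in np.where(mask[i])[0]:
            donors = np.where(~mask[:, f] & np.isfinite(dists))[0]
            if donors.size == 0:
                filled[i, f] = np.nanmean(X[:, f])  # fall back to the column mean
            else:
                nearest = donors[np.argsort(dists[donors])[:k]]
                filled[i, f] = X[nearest, f].mean()
    return filled


def svd_impute(X, rank=2, n_iter=100, tol=1e-6):
    """Iterative low-rank SVD imputation (a generic 'hard impute' variant).

    Start from column means, then repeatedly overwrite only the missing
    entries with a rank-`rank` SVD reconstruction until they stop changing.
    """
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    filled = np.where(mask, np.nanmean(X, axis=0), X)
    if not mask.any():
        return filled
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        previous = filled[mask]
        filled[mask] = low_rank[mask]  # observed entries are never modified
        if np.sqrt(np.mean((filled[mask] - previous) ** 2)) < tol:
            break
    return filled


def masked_rmse(X, impute, frac=0.1, seed=0):
    """Hide a random fraction of the *observed* entries, impute, and return
    the RMSE between imputed and true values on the hidden entries only."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    hide = (rng.random(X.shape) < frac) & ~np.isnan(X)
    X_missing = X.copy()
    X_missing[hide] = np.nan
    X_hat = impute(X_missing)
    return float(np.sqrt(np.mean((X_hat[hide] - X[hide]) ** 2)))


# Hypothetical demo data: 200 "patients", 6 correlated features of rank ~2.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 6))
X_demo += 0.05 * rng.normal(size=X_demo.shape)
for name, fn in [("mean", mean_impute), ("kNN", knn_impute), ("SVD", svd_impute)]:
    print(f"{name:>4}: RMSE on hidden entries = {masked_rmse(X_demo, fn):.3f}")
```

Hiding observed values provides a ground truth to score against, which the genuinely missing entries cannot. In practice one would repeat the masking with several seeds and standardise the features first, since both kNN distances and the SVD are sensitive to feature scale.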
Permanent Link: https://hdl.handle.net/11104/0342448
