Number of the records: 1

A Sparse Pair-preserving Centroid-based Supervised Learning Method for High-dimensional Biomedical Data or Images

SYSNO ASEP	0524330
Document Type	J - Journal Article
R&D Document Type	Journal Article
Subsidiary J	Article in WOS
Title	A Sparse Pair-preserving Centroid-based Supervised Learning Method for High-dimensional Biomedical Data or Images
Author(s)	Kalina, Jan (UIVT-O) [RID, SAI, ORCID]; Matonoha, Ctirad (UIVT-O) [RID, SAI]
Number of authors	2
Source Title	Biocybernetics and Biomedical Engineering. Elsevier. ISSN 0208-5216. Vol. 40, No. 2 (2020), pp. 774-786
Number of pages	13 pp.
Publication form	Print - P
Language	eng - English
Country	PL - Poland
Keywords	supervised learning ; high-dimensional data ; robustness ; sparsity ; nonlinear optimization
Subject RIV	IN - Informatics, Computer Science
OECD category	Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
R&D Projects	GA19-05704S GA ČR - Czech Science Foundation (CSF)
Method of publishing	Limited access
Institutional support	UIVT-O - RVO:67985807
UT WOS	000547542400014
EID SCOPUS	85084491501
DOI	10.1016/j.bbe.2020.03.008
Annotation	In biomedical applications that compare two groups (e.g. patients and controls in matched case-control studies), learning a classification rule from high-dimensional data often requires dimensionality reduction. This paper considers a centroid-based classification method for paired data that simultaneously performs supervised variable selection respecting the matched-pairs design. We propose an algorithm for optimizing the centroid (prototype, template). A subsequent optimization of the weights for the centroid ensures sparsity, robustness to outliers, and a clear interpretation of the contribution of individual variables to the classification task. We apply the method to a simulated matched case-control dataset, to a gene expression study of acute myocardial infarction, and to mouth localization in 2D facial images. The novel approach performs comparably to standard classifiers and outperforms them when the data are contaminated by outliers. This robustness makes the method relevant for high-dimensional genomic, metabolomic, or proteomic data (in matched case-control studies) and for image-based medical diagnostics, as (excessive) noise and contamination are ubiquitous in biomedical measurements.
Workplace	Institute of Computer Science
Contact	Tereza Šírová, sirova@cs.cas.cz, Tel.: 266 053 800
Year of Publishing	2021
Electronic address	http://dx.doi.org/10.1016/j.bbe.2020.03.008
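
The Annotation describes the classification rule only at a high level. The following is a minimal Python/NumPy sketch of a generic centroid-based rule for matched pairs, assuming (this is not stated in the record) that within each pair the observation with the higher weighted Pearson correlation to the centroid is labelled the case. The names weighted_corr, topk_weights, classify_pairs and simulate_pairs are hypothetical. The sparse weights come from a simple top-k filter on mean within-pair differences and the centroid is just the mean of the cases; both are stand-ins, not the authors' centroid and weight optimization, and the accuracy printed is in-sample on toy data.

```python
import numpy as np


def weighted_corr(x, c, w):
    """Weighted Pearson correlation of x and c; weights w are nonnegative and sum to 1."""
    mx, mc = np.sum(w * x), np.sum(w * c)
    cov = np.sum(w * (x - mx) * (c - mc))
    return cov / np.sqrt(np.sum(w * (x - mx) ** 2) * np.sum(w * (c - mc) ** 2))


def topk_weights(cases, controls, k):
    """Hypothetical sparse weights: 1/k on the k variables with the largest absolute
    mean within-pair difference, 0 elsewhere. A simple filter heuristic, NOT the
    nonlinear weight optimization described in the paper."""
    score = np.abs(np.mean(cases - controls, axis=0))
    w = np.zeros(cases.shape[1])
    w[np.argsort(score)[-k:]] = 1.0 / k
    return w


def classify_pairs(first, second, centroid, w):
    """For each matched pair (first[i], second[i]), return True when first[i] is
    assigned as the case, i.e. it has the higher weighted correlation with the centroid."""
    r1 = np.array([weighted_corr(a, centroid, w) for a in first])
    r2 = np.array([weighted_corr(b, centroid, w) for b in second])
    return r1 > r2


def simulate_pairs(n_pairs, p, k, seed=0):
    """Toy matched case-control data (hypothetical design): each pair shares a
    baseline; cases add an alternating +/-3 pattern on the first k variables."""
    rng = np.random.default_rng(seed)
    pattern = np.zeros(p)
    pattern[:k] = 3.0 * (-1.0) ** np.arange(k)
    base = rng.normal(size=(n_pairs, p))  # shared within each matched pair
    cases = base + pattern + 0.5 * rng.normal(size=(n_pairs, p))
    controls = base + 0.5 * rng.normal(size=(n_pairs, p))
    return cases, controls


if __name__ == "__main__":
    cases, controls = simulate_pairs(n_pairs=100, p=500, k=10)
    w = topk_weights(cases, controls, k=10)
    centroid = cases.mean(axis=0)  # naive centroid; the paper optimizes it instead
    acc = classify_pairs(cases, controls, centroid, w).mean()
    print("selected variables:", np.flatnonzero(w))
    print("in-sample pair accuracy:", acc)
```

Working within pairs lets the shared baseline cancel in the variable-selection score, which is one simple way to respect a matched-pairs design; the paper's own treatment may differ.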
