Number of the records: 1
Random-Forest-Based Analysis of URL Paths
SYSNO ASEP: 0478626
Document Type: C - Proceedings Paper (int. conf.)
R&D Document Type: Conference Paper
Title: Random-Forest-Based Analysis of URL Paths
Author(s): Puchýř, J. (CZ); Holeňa, Martin (UIVT-O)
Source Title: Proceedings ITAT 2017: Information Technologies - Applications and Theory. - Aachen & Charleston : Technical University & CreateSpace Independent Publishing Platform, 2017 / Hlaváčová J. - ISSN 1613-0073 - ISBN 978-1974274741
Pages: pp. 129-135
Number of pages: 7
Publication form: Online - E
Action: ITAT 2017. Conference on Theory and Practice of Information Technologies - Applications and Theory /17./
Event date: 22.09.2017 - 26.09.2017
Event location: Martinské hole
Country: SK - Slovakia
Event type: EUR
Language: eng - English
Country: DE - Germany
Keywords: malicious URLs detection ; classification ; random forest
Subject RIV: IN - Informatics, Computer Science
OECD category: Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
R&D Projects: GA17-01251S GA ČR - Czech Science Foundation (CSF)
Institutional support: UIVT-O - RVO:67985807
EID SCOPUS: 85045771719
Annotation: One of the key sources of malware are malicious web sites, which either trick the user into installing malware that imitates legitimate software or, in the case of various exploit kits, initiate malware installation without any user action. The most common technique against such web sites is blacklisting. However, it provides little to no information about new sites that have never been seen before. Therefore, there has been important research into predicting malicious web sites based on their features. This work-in-progress paper presents a lightweight prediction method that uses solely lexical features of the site URL and classification by random forests. To this end, three possibilities of feature extraction have been elaborated and investigated on real-world data sets with respect to precision and recall. The obtained results indicate that there is almost never a significant difference between the considered methods, and that, in spite of the limitation to the lexical features of the site URL, they achieve an impressive performance in terms of area under the precision-recall curve for the path parts of URLs.
Workplace: Institute of Computer Science
Contact: Tereza Šírová, sirova@cs.cas.cz, Tel.: 266 053 800
Year of Publishing: 2018
Electronic address: http://ceur-ws.org/Vol-1885/129.pdf