Number of the records: 1  

Random-Forest-Based Analysis of URL Paths

  1.
    SYSNO: ASEP0478626
    Document Type: C - Proceedings Paper (int. conf.)
    R&D Document Type: Conference Paper
    Title: Random-Forest-Based Analysis of URL Paths
    Author(s): Puchýř, J. (CZ)
    Holeňa, Martin (UIVT-O)
    Source Title: Proceedings ITAT 2017: Information Technologies - Applications and Theory. - Aachen & Charleston : Technical University & CreateSpace Independent Publishing Platform, 2017 / Hlaváčová J. - ISSN 1613-0073 - ISBN 978-1974274741
    Pages: pp. 129-135
    Number of pages: 7 pp.
    Publication form: Online - E
    Action: ITAT 2017. Conference on Theory and Practice of Information Technologies - Applications and Theory /17./
    Event date: 22.09.2017 - 26.09.2017
    Event location: Martinské hole
    Country: SK - Slovakia
    Event type: EUR
    Language: eng - English
    Country: DE - Germany
    Keywords: malicious URLs detection ; classification ; random forest
    Subject RIV: IN - Informatics, Computer Science
    OECD category: Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
    R&D Projects: GA17-01251S GA ČR - Czech Science Foundation (CSF)
    Institutional support: UIVT-O - RVO:67985807
    EID SCOPUS: 85045771719
    Annotation: Malicious web sites are one of the key sources of malware: they either trick the user into installing malware that imitates legitimate software or, in the case of various exploit kits, initiate malware installation without any user action. The most common technique against such web sites is blacklisting. However, it provides little to no information about new sites that have never been seen before. Therefore, there has been substantial research into predicting malicious web sites based on their features. This work-in-progress paper presents a lightweight prediction method that uses solely lexical features of the site URL and classification by random forests. To this end, three approaches to feature extraction have been elaborated and investigated on real-world data sets with respect to precision and recall. The results indicate that there is almost never a significant difference between the considered methods and that, despite the limitation to lexical features of the site URL, they achieve impressive performance in terms of area under the precision-recall curve for the path parts of URLs.
    Workplace: Institute of Computer Science
    Contact: Tereza Šírová, sirova@cs.cas.cz, Tel.: 266 053 800
    Year of Publishing: 2018
    Electronic address: http://ceur-ws.org/Vol-1885/129.pdf