Authors
Junhyeong Lee, Hyun Kwon
Published in
Scientific reports. Jul 03, 2026. Epub Jul 03, 2026.
Abstract
In this paper, we propose a multimodal framework that integrates visual, structural, and semantic representations to improve URL classification accuracy. URLs are first transformed into grayscale images and processed by a convolutional neural network (CNN) based on the ResNet architecture, which captures visual patterns indicative of malicious intent. In parallel, structural features extracted from the URLs, such as length, domain characteristics, and character distributions, are analyzed with the tree-based classifiers XGBoost and CatBoost. Semantic information is obtained by embedding the raw URL strings into contextual vectors with a fine-tuned DistilBERT model, a transformer-based model that captures textual semantics. The outputs of these three complementary modalities (visual, structural, and semantic) are concatenated and integrated through a fully connected deep neural network (DNN) that performs the final classification. Experiments on a large-scale dataset comprising millions of URLs show that the proposed multimodal fusion framework outperforms traditional single-modal and existing ensemble baselines, achieving an F1-score of 0.9189 and an AUC of 0.9805.
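The abstract does not state how URLs are rendered as images, which structural features are used, or how the fusion input is laid out, so the following is only a minimal sketch under stated assumptions. It maps one URL byte to one pixel with zero padding, computes a handful of illustrative lexical features, and concatenates per-modality vectors. The function names `url_to_grayscale`, `structural_features`, and `fuse`, the image side length, and the feature list are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from urllib.parse import urlsplit


def url_to_grayscale(url: str, side: int = 16) -> list[list[int]]:
    """Map URL bytes to a side x side grid of 0-255 pixel intensities.

    Assumption: one byte per pixel, zero padding, truncation past side*side.
    """
    data = url.encode("utf-8")[: side * side]
    padded = data + bytes(side * side - len(data))
    return [list(padded[r * side:(r + 1) * side]) for r in range(side)]


def structural_features(url: str) -> dict[str, float]:
    """Illustrative lexical features (not the paper's actual feature list)."""
    # Prefix "//" when no scheme is present so urlsplit still finds the host.
    parts = urlsplit(url if "://" in url else "//" + url)
    host = parts.hostname or ""
    counts = Counter(url)
    n = len(url) or 1
    # Shannon entropy of the character distribution, in bits.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "url_length": float(len(url)),
        "host_length": float(len(host)),
        "num_dots_in_host": float(host.count(".")),
        "digit_ratio": sum(ch.isdigit() for ch in url) / n,
        "char_entropy": entropy,
    }


def fuse(*vectors: list[float]) -> list[float]:
    """Concatenate per-modality outputs into one input vector for a classifier."""
    return [x for v in vectors for x in v]
```

In a full pipeline, the grid would feed a ResNet-style CNN, the feature dictionary would feed XGBoost and CatBoost, and `fuse` would receive the outputs of those models together with the DistilBERT output before the fully connected network.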
PMID:
42393077
Bibliographic data and abstract were imported from PubMed on 03 Jul 2026.