Applied Sciences (Switzerland), vol. 16, no. 11, 2026 (SCI-Expanded, Scopus)
The rapid evolution of Android malware and increasingly sophisticated obfuscation techniques challenge traditional detection systems. This study presents a rigorous, unified comparative evaluation of three methodological paradigms for static Android malware detection: classical machine learning, Transformer-based architectures, and generative Large Language Models (LLMs). We construct a balanced dataset of 12,000 APKs from the AndroZoo repository and implement a fold-independent experimental pipeline featuring constraint-aware sequence selection for Transformers and structured LLM-driven feature distillation with parameter-efficient fine-tuning (LoRA). All evaluations employ stratified 5-fold cross-validation with statistical significance testing and comprehensive resource profiling. Classical models (e.g., Random Forest) achieve strong baselines (~0.975 F1) but exhibit limited contextual resilience. Distilled Transformers (RoBERTa: ~0.970 F1) deliver the best accuracy-latency trade-off for real-time screening. Zero-shot LLMs show only moderate performance (~0.74–0.84 F1), but integrating LLM-extracted semantic features with LoRA fine-tuning yields the highest accuracy among the evaluated approaches (Qwen3.5-27B: ~0.982 F1), together with cross-dataset generalization and structured interpretability. Hallucination analysis reveals a manageable 7.7% rate, and ablation confirms that it has minimal impact on downstream classification. We advocate a tiered deployment strategy: lightweight Transformers for high-throughput screening, complemented by fine-tuned LLMs for deep forensic analysis and explainable threat intelligence. This hybrid framework balances computational efficiency, detection robustness, and operational interpretability for modern Android security pipelines.
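The tiered deployment strategy advocated above can be illustrated with a minimal Python sketch. The abstract does not say how samples are routed between tiers, so everything here is an assumption made for illustration: the confidence thresholds `BENIGN_BELOW` and `MALWARE_ABOVE`, the `triage` function, the `Verdict` record, and the two toy scoring functions that stand in for a lightweight Transformer and a fine-tuned LLM.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical routing thresholds; the abstract does not specify any.
BENIGN_BELOW = 0.10
MALWARE_ABOVE = 0.90


@dataclass
class Verdict:
    label: str      # "malware" or "benign"
    tier: str       # "screen" or "forensic": which tier made the decision
    score: float    # malware probability reported by that tier
    rationale: str  # explanation text; empty when the screening tier decides


def triage(
    features: str,
    screen: Callable[[str], float],
    forensic: Callable[[str], Tuple[float, str]],
) -> Verdict:
    """Decide with the cheap screening model when it is confident,
    otherwise escalate the sample to the expensive forensic model."""
    p = screen(features)
    if p <= BENIGN_BELOW:
        return Verdict("benign", "screen", p, "")
    if p >= MALWARE_ABOVE:
        return Verdict("malware", "screen", p, "")
    # Uncertain band: pay for the slower model and keep its explanation.
    p_deep, why = forensic(features)
    label = "malware" if p_deep >= 0.5 else "benign"
    return Verdict(label, "forensic", p_deep, why)


# Toy stand-ins for the two models (assumptions, not the paper's models).
def toy_screen(features: str) -> float:
    hits = sum(token in features for token in ("SEND_SMS", "DexClassLoader"))
    return (0.02, 0.50, 0.95)[hits]


def toy_forensic(features: str) -> Tuple[float, str]:
    return 0.80, "SEND_SMS requested (toy rationale)"


if __name__ == "__main__":
    for sample in ("INTERNET", "SEND_SMS INTERNET", "SEND_SMS DexClassLoader"):
        print(sample, "->", triage(sample, toy_screen, toy_forensic))
```

In this sketch only samples in the uncertain band reach the slower model, so the width of that band is the knob that trades throughput against forensic depth. Suitable thresholds would have to be tuned on validation data.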