Dogfight Simulation of Autonomous Swarm UAVs Based on Multi-Agent Deep Reinforcement Learning


Creative Commons License

Cömertler H. Ö. F., Bora E., Çetin A.

Drones and Autonomous Vehicles, vol. 3, no. 2, pp. 1-19, 2026 (Peer-Reviewed Journal)

Abstract

The operational utility of Unmanned Aerial Vehicles (UAVs) has evolved from passive surveillance to active engagement in disputed environments, where autonomous control must operate under highly dynamic and adversarial conditions. Hand-crafted heuristics often exhibit limited robustness when facing stochastic opponent behavior and non-stationary interactions. To address these challenges, we propose a Multi-Agent Deep Reinforcement Learning (MADRL) framework implemented in a Unity 6–based, physics-driven simulation that models flight dynamics and weapon kinematics. Agents are trained using Proximal Policy Optimization (PPO) with a composite reward function designed to encourage cooperative behaviors (e.g., coordinated target engagement) while enforcing safety constraints such as collision avoidance. In empirical evaluations, the learned policies achieve an 85% win rate against a heuristic baseline under the tested scenarios, exhibiting coordinated maneuvers and adaptive engagement strategies. These results indicate that multi-agent learning with decentralized execution can reduce operator workload and improve swarm effectiveness and survivability in conflict zones.
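The abstract describes a composite reward that combines an engagement incentive, a cooperation bonus, and a collision-avoidance penalty. The paper does not give the exact formulation here, so the following is only a minimal illustrative sketch of how such a weighted-sum reward could be shaped per agent per step; all names, weights, and thresholds (`w_hit`, `w_coop`, `safe_sep`, etc.) are hypothetical, not the authors' actual design.

```python
from dataclasses import dataclass

@dataclass
class StepOutcome:
    """Hypothetical per-step observation summary for one UAV agent."""
    target_hit: bool       # agent damaged its designated target this step
    allies_engaging: int   # allies attacking the same target (coordination proxy)
    min_separation: float  # distance (m) to the nearest friendly aircraft
    crashed: bool          # collision with terrain or another aircraft

def composite_reward(o: StepOutcome,
                     w_hit: float = 1.0,     # weight on successful engagement
                     w_coop: float = 0.2,    # bonus per coordinating ally
                     w_safety: float = 0.5,  # scale of separation penalty
                     safe_sep: float = 50.0  # desired minimum separation (m)
                     ) -> float:
    """Weighted sum of engagement, cooperation, and safety terms (sketch)."""
    r = 0.0
    if o.target_hit:
        # Engagement reward grows when allies strike the same target,
        # nudging PPO toward coordinated engagement.
        r += w_hit + w_coop * o.allies_engaging
    if o.min_separation < safe_sep:
        # Soft penalty that ramps up as aircraft close inside safe_sep.
        r -= w_safety * (safe_sep - o.min_separation) / safe_sep
    if o.crashed:
        r -= 10.0  # hard terminal penalty enforcing collision avoidance
    return r
```

In a CTDE setup of the kind the abstract implies (decentralized execution), each agent would receive such a shaped scalar every simulation step; the relative weights trade off aggressiveness against swarm survivability.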