Drones and Autonomous Vehicles, vol.3, no.2, pp.1-19, 2026 (Peer-Reviewed Journal)
The operational utility of Unmanned Aerial Vehicles (UAVs) has evolved from passive
surveillance to active engagement in contested environments, where autonomous
control must operate under highly dynamic and adversarial conditions.
Hand-crafted heuristics often exhibit limited robustness when facing stochastic
opponent behavior and non-stationary interactions. To address these challenges,
we propose a Multi-Agent Deep Reinforcement Learning (MADRL) framework
implemented in a Unity 6–based, physics-driven simulation that models flight
dynamics and weapon kinematics. Agents are trained using Proximal Policy
Optimization (PPO) with a composite reward function designed to encourage
cooperative behaviors (e.g., coordinated target engagement) while enforcing
safety constraints such as collision avoidance. In empirical evaluations, the
learned policies achieve an 85% win rate against a heuristic baseline across the
tested scenarios, exhibiting coordinated maneuvers and adaptive engagement
strategies. These results indicate that multi-agent learning with decentralized
execution can reduce operator workload and improve swarm effectiveness and
survivability in conflict zones.
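
To make the reward design concrete, the sketch below shows one way such a composite per-step reward could be structured, combining cooperative engagement bonuses with safety penalties. The function signature, term structure, and weight values are illustrative assumptions for exposition, not the paper's actual implementation or tuned coefficients.

def composite_reward(
    hit_target: bool,               # this agent damaged an opposing UAV this step
    teammate_on_same_target: bool,  # a teammate is engaging the same target
    collided: bool,                 # collision with terrain or a friendly UAV
    min_separation: float,          # distance to the nearest teammate (meters)
    safe_distance: float = 10.0,    # illustrative safety threshold, not from the paper
) -> float:
    """Per-step composite reward: engagement terms plus safety penalties.

    All weights are placeholder values chosen for illustration only.
    """
    reward = 0.0
    if hit_target:
        reward += 1.0      # individual engagement reward
        if teammate_on_same_target:
            reward += 0.5  # bonus for coordinated target engagement
    if collided:
        reward -= 1.0      # hard penalty enforcing collision avoidance
    elif min_separation < safe_distance:
        # Soft shaping term: penalize unsafe proximity proportionally,
        # so agents learn spacing before an actual collision occurs.
        reward -= 0.1 * (safe_distance - min_separation) / safe_distance
    return reward

Keeping a hard collision penalty separate from a softer proximity-shaping term is a common design choice: the safety constraint stays dominant while the shaping term still provides a dense learning signal toward safe spacing.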