Trajectory Design for Multi-UAV Aided Communication with Actor-critic-based Reinforcement Learning
First published: 2021-03-08
Abstract: This paper investigates trajectory design for wireless communications aided by multiple unmanned aerial vehicles (UAVs). It proposes a multi-UAV trajectory design method, multi-agent twin delayed deep deterministic policy gradient (MA-TD3), which integrates the multi-agent deep deterministic policy gradient (MADDPG) algorithm with the twin delayed deep deterministic policy gradient (TD3) algorithm within the actor-critic reinforcement learning (RL) framework. The method designs continuous trajectories without prior knowledge of global information such as user locations and channel conditions. The multi-UAV trajectory design problem is first formulated as a stochastic game (SG) whose objective is to maximize the completion rate of the transmission tasks. The MA-TD3 method is then used to learn the trajectories. Numerical results show that, compared with traditional single-agent RL methods, the proposed MA-TD3 method achieves a higher completion rate of the transmission tasks by enabling cooperation among multiple UAVs through centralized training and distributed execution.
Keywords: Communication and Information System; trajectory design; multi-UAV aided communication; multi-agent reinforcement learning
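Chinese title (translated): A Multi-UAV Aided Communication Trajectory Planning Method Based on the Actor-Critic Reinforcement Learning Framework
Chinese abstract (translated): This paper studies the trajectory design problem in multi-UAV aided wireless communication and proposes a reinforcement learning (RL) based multi-UAV trajectory design method that can design trajectories in a continuous action space without prior knowledge of global information such as user locations and channel conditions. By combining the multi-agent deep deterministic policy gradient (MADDPG) algorithm with the twin delayed deep deterministic policy gradient (TD3) algorithm, it proposes a multi-UAV trajectory planning algorithm based on the actor-critic RL framework: the multi-agent twin delayed deep deterministic policy gradient (MA-TD3) algorithm. The multi-UAV trajectory design problem is first formulated as a multi-agent stochastic game (SG) that maximizes the completion rate of the transmission tasks when location information, user transmit power, and channel parameters are unavailable. On this basis, the MA-TD3 method is proposed to learn the trajectories. Simulation results show that, compared with traditional single-agent RL methods, the proposed MA-TD3 method enables cooperation among multiple UAVs through centralized training and distributed execution and achieves a higher completion rate of the transmission tasks.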
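The abstract describes MA-TD3 as a combination of MADDPG (centralized critics, decentralized actors) and TD3 (twin critics, target policy smoothing, delayed policy updates). The sketch below illustrates only how a TD3-style critic target could be computed for a centralized critic. It is an assumption-laden illustration, not the authors' implementation: the function name `td3_target`, the callables standing in for target networks, and all hyperparameter values are hypothetical.

```python
import numpy as np

def td3_target(reward, done, next_obs_all, target_actors,
               target_critic_1, target_critic_2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               act_limit=1.0, rng=None):
    """TD3-style target for one agent's centralized critic (illustrative).

    reward, done    : arrays of shape (batch,)
    next_obs_all    : list of per-UAV next observations, each (batch, obs_dim)
    target_actors   : list of callables, obs -> action of shape (batch, act_dim)
    target_critic_* : callables, (joint_obs, joint_action) -> Q of shape (batch,)
    All hyperparameter defaults are assumed values, not taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng

    next_actions = []
    for actor, obs in zip(target_actors, next_obs_all):
        # Distributed execution: each actor sees only its own observation.
        action = actor(obs)
        # Target policy smoothing: add clipped Gaussian noise to the action.
        noise = rng.normal(0.0, noise_std, size=action.shape)
        noise = np.clip(noise, -noise_clip, noise_clip)
        next_actions.append(np.clip(action + noise, -act_limit, act_limit))

    # Centralized training: the critic sees all observations and actions.
    joint_obs = np.concatenate(next_obs_all, axis=-1)
    joint_act = np.concatenate(next_actions, axis=-1)

    # Clipped double Q-learning: take the smaller of the twin estimates.
    q1 = target_critic_1(joint_obs, joint_act)
    q2 = target_critic_2(joint_obs, joint_act)
    return reward + gamma * (1.0 - done) * np.minimum(q1, q2)
```

Taking the minimum of the two critic estimates is the TD3 mechanism for reducing overestimation bias. Feeding the critics the joint observations and actions, while each actor uses only its own observation, reflects the centralized training and distributed execution mentioned in the abstract.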