Trajectory Planning Method for Multi-UAV Aided Communication Based on an Actor-Critic Reinforcement Learning Framework

CHEN Ze-Chao; GUO Yi-Jun


Trajectory Design for Multi-UAV Aided Communication with Actor-critic-based Reinforcement Learning

First published: 2021-03-08

CHEN Ze-Chao ¹
Chen Zechao (1996-), female, major research direction: UAV communication.
GUO Yi-Jun ¹
Guo Yijun (1989-), female, associate professor, doctoral supervisor, major research directions: UAV communication and artificial intelligence. E-mail: guoyijun@bupt.edu.cn

¹ School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876

Abstract: This paper investigates trajectory design for wireless communications aided by multiple unmanned aerial vehicles (UAVs) and proposes a multi-UAV trajectory design method called multi-agent twin delayed deep deterministic policy gradient (MA-TD3). MA-TD3 designs continuous trajectories without prior knowledge of global information such as user locations and channel conditions. Built on the actor-critic reinforcement learning (RL) framework, it integrates the multi-agent deep deterministic policy gradient (MADDPG) algorithm with the twin delayed deep deterministic policy gradient (TD3) algorithm. Specifically, the multi-UAV trajectory design problem is first formulated as a stochastic game (SG) whose objective is to maximize the completion rate of the transmission tasks; MA-TD3 is then used to learn the UAV trajectories. Numerical results show that, compared with conventional single-agent RL methods, MA-TD3 achieves a higher transmission-task completion rate by enabling cooperation among the UAVs through centralized training and distributed execution.
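The abstract names the two ingredients of MA-TD3 but not how they are combined. As a rough illustration, the sketch below assumes the usual pairing: from MADDPG, decentralized actors (each UAV acts on its own observation) trained with a centralized critic (which sees every UAV's observation and action); from TD3, clipped double-Q targets, target policy smoothing, and delayed policy updates. The function names, the plain-Python callables standing in for neural networks, and the default hyperparameters (`gamma`, `noise_std`, `noise_clip`, `max_action`, `policy_delay`) are hypothetical and are not taken from the paper.

```python
import random
from typing import Callable, List, Optional, Sequence

Vector = List[float]
# Hypothetical stand-ins for neural networks (not the paper's architecture):
# a deterministic actor maps one UAV's local observation to its action,
Actor = Callable[[Vector], Vector]
# and a centralized critic scores the joint observation and joint action of all UAVs.
Critic = Callable[[Vector, Vector], float]


def clip(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def smoothed_target_action(actor: Actor, obs: Vector, noise_std: float,
                           noise_clip: float, max_action: float,
                           rng: random.Random) -> Vector:
    """TD3 target policy smoothing: perturb the target action with clipped Gaussian noise."""
    return [clip(a + clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip),
                 -max_action, max_action)
            for a in actor(obs)]


def ma_td3_critic_target(reward: float, done: bool, next_obs: Sequence[Vector],
                         target_actors: Sequence[Actor],
                         target_critic1: Critic, target_critic2: Critic,
                         gamma: float = 0.99, noise_std: float = 0.2,
                         noise_clip: float = 0.5, max_action: float = 1.0,
                         rng: Optional[random.Random] = None) -> float:
    """Critic target for one UAV: y = r + gamma * (1 - done) * min(Q1', Q2')."""
    rng = rng if rng is not None else random.Random(0)
    # Decentralized policies: each target actor only sees its own UAV's observation.
    next_actions = [smoothed_target_action(mu, o, noise_std, noise_clip, max_action, rng)
                    for mu, o in zip(target_actors, next_obs)]
    # Centralized training: the critic receives all observations and all actions.
    joint_obs = [x for o in next_obs for x in o]
    joint_act = [x for a in next_actions for x in a]
    # Clipped double-Q learning: the smaller twin estimate limits overestimation.
    q_next = min(target_critic1(joint_obs, joint_act),
                 target_critic2(joint_obs, joint_act))
    return reward + gamma * (0.0 if done else 1.0) * q_next


def actor_update_due(critic_step: int, policy_delay: int = 2) -> bool:
    """TD3 delayed policy update: refresh actors and targets every `policy_delay` critic steps."""
    return critic_step % policy_delay == 0


if __name__ == "__main__":
    # Two toy UAVs with linear "actors" and additive "critics", noise disabled.
    actor = lambda o: [0.5 * x for x in o]
    q1 = lambda s, a: sum(a)
    q2 = lambda s, a: sum(a) + 1.0
    y = ma_td3_critic_target(1.0, False, [[1.0, 0.0], [0.0, 1.0]],
                             [actor, actor], q1, q2, gamma=0.5, noise_std=0.0)
    print(y)  # 1.0 + 0.5 * min(1.0, 2.0) = 1.5
```

Taking the minimum of the two target critics is what separates this target from the plain MADDPG one; the actors themselves never need the joint information, which is what allows distributed execution once training is finished.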

Keywords: Communication and Information System; trajectory design; multi-UAV aided communication; multi-agent reinforcement learning


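Abstract (Chinese version): This paper studies trajectory design in multi-UAV aided wireless communications and proposes a reinforcement learning (RL) based multi-UAV trajectory design method that designs trajectories in a continuous action space for multi-UAV aided communication without prior knowledge of global information such as user locations and channel conditions. By combining the multi-agent deep deterministic policy gradient (MADDPG) algorithm with the twin delayed deep deterministic policy gradient (TD3) algorithm, the paper proposes a multi-UAV trajectory planning algorithm based on the actor-critic RL framework: the multi-agent twin delayed deep deterministic policy gradient (MA-TD3) algorithm. The multi-UAV trajectory design problem is first formulated as a multi-agent stochastic game (SG) that maximizes the completion rate of the transmission tasks when information such as locations, user transmit power, and channel parameters cannot be obtained. On this basis, the actor-critic-based MA-TD3 method is used to learn the trajectories. Simulation results show that, compared with conventional single-agent RL methods, MA-TD3 enables cooperation among multiple UAVs through centralized training and distributed execution, and thereby achieves a higher transmission-task completion rate.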
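The stochastic game is stated only in terms of its objective, the completion rate of the transmission tasks; the abstract does not give the reward. A minimal sketch, assuming each ground user has a fixed-size task measured in bits and that the UAVs share a hypothetical reward equal to the per-step increase in completion rate. With no discounting these rewards telescope, so an episode's return equals the final completion rate, and the reward depends only on delivered bits, not on user positions, transmit powers, or channel parameters. The names `completion_rate` and `step_reward`, and the task sizes in the example, are illustrative assumptions.

```python
from typing import Sequence


def completion_rate(delivered_bits: Sequence[float], task_bits: Sequence[float]) -> float:
    """Fraction of users whose transmission task (task_bits[k] bits) has been fully delivered."""
    completed = sum(1 for got, need in zip(delivered_bits, task_bits) if got >= need)
    return completed / len(task_bits)


def step_reward(prev_delivered: Sequence[float], new_delivered: Sequence[float],
                task_bits: Sequence[float]) -> float:
    """Hypothetical shared reward: the increase in completion rate during one time step.

    Only the bits actually delivered are needed, not user positions,
    transmit powers, or channel parameters.
    """
    return (completion_rate(new_delivered, task_bits)
            - completion_rate(prev_delivered, task_bits))


if __name__ == "__main__":
    tasks = [10.0, 20.0, 30.0, 40.0]
    # Cumulative bits delivered to each of four users at three time instants.
    trajectory = [[0.0, 0.0, 0.0, 0.0], [10.0, 5.0, 12.0, 0.0], [10.0, 20.0, 30.0, 8.0]]
    rewards = [step_reward(a, b, tasks) for a, b in zip(trajectory, trajectory[1:])]
    # The rewards telescope: their undiscounted sum equals the final completion rate.
    print(rewards, sum(rewards), completion_rate(trajectory[-1], tasks))
```

Running it prints `[0.25, 0.5] 0.75 0.75`. A dense per-step reward of this kind gives the agents feedback before the episode ends while keeping the return aligned with the stated objective.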



Citation:


CHEN Ze-Chao,GUO Yi-Jun. Trajectory Design for Multi-UAV Aided Communication with Actor-critic-based Reinforcement Learning[EB/OL]. Beijing:Sciencepaper Online[2021-03-08]. https://www.paper.edu.cn/releasepaper/content/202103-88.


Paper No.: 202103-88