Deep Reinforcement Learning Based Path Following Control of Magnetic Navigation AGV

Original title (Chinese): 磁导航AGV深度强化学习路径跟踪控制方法
Authors: REN Yu (任彧); ZHAO Shitao (赵师涛)
Affiliation: School of Computer, Hangzhou Dianzi University (杭州电子科技大学计算机学院)
Keywords: automatic guided vehicle; path following; reinforcement learning; deep reinforcement learning
Chinese journal title (code): HXDY
Journal: Journal of Hangzhou Dianzi University (Natural Sciences)
Publication date: 2019-03-15
Publisher: Journal of Hangzhou Dianzi University (Natural Sciences) (杭州电子科技大学学报(自然科学版))
Year: 2019
Volume (cumulative issue): v.39; No.178
Language: Chinese
Record ID: HXDY201902006
Page count: 7
Issue: 02
CN: 33-1339/TN
Pages: 31-37

Abstract

To avoid extensive manual parameter tuning when designing a path-following controller for a magnetic navigation AGV whose system parameters are unknown, this paper designs the controller with an improved model-free deep reinforcement learning algorithm. Starting from discrete-time kinematic and dynamic models of the magnetic navigation AGV, the path-following problem is formulated as a Markov decision process with continuous state and action spaces. Two deep neural networks implement an off-policy, model-free reinforcement learning algorithm with an actor-critic architecture, which serves as the controller. The action-value function is updated by temporal-difference learning, and the policy function is updated by the deterministic policy gradient algorithm. Simulation results show that the designed controller can autonomously learn well-performing control rules from experience.
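The update scheme named in the abstract (a temporal-difference step for the action-value function, a deterministic policy gradient step for the policy, both driven by off-policy samples) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the two-variable kinematic model, the reward, the constants, and all function names are hypothetical, and linear function approximators stand in for the two deep neural networks so the example stays short and dependency-free.

```python
import math

DT = 0.05      # control period in seconds (assumed)
SPEED = 0.5    # constant forward speed in m/s (assumed)
GAMMA = 0.99   # discount factor (assumed)
TAU = 0.01     # soft-update rate for the target parameters (assumed)


def step(state, omega):
    """One discrete-time transition of a toy kinematic model (not the paper's).

    state = (e, psi): lateral offset from the magnetic strip (m), heading error (rad).
    omega: commanded yaw rate (rad/s), the continuous action.
    """
    e, psi = state
    next_state = (e + SPEED * math.sin(psi) * DT, psi + omega * DT)
    reward = -(next_state[0] ** 2 + 0.1 * next_state[1] ** 2)
    return next_state, reward


def features(state, action):
    """Quadratic features for a linear-in-parameters critic Q(s, a) = w . phi(s, a)."""
    e, psi = state
    return [e * e, psi * psi, e * psi, e * action, psi * action, action * action, 1.0]


def q_value(w, state, action):
    return sum(wi * fi for wi, fi in zip(w, features(state, action)))


def dq_da(w, state, action):
    """Partial derivative of the critic with respect to the action."""
    e, psi = state
    return w[3] * e + w[4] * psi + 2.0 * w[5] * action


def policy(theta, state):
    """Deterministic linear actor: omega = theta . s."""
    return theta[0] * state[0] + theta[1] * state[1]


def soft_update(target, source, tau=TAU):
    """Move the target parameters a fraction tau toward the learned parameters."""
    return [(1.0 - tau) * t + tau * s for t, s in zip(target, source)]


def update(w, theta, w_targ, theta_targ, batch, lr_critic=1e-2, lr_actor=1e-3):
    """Off-policy actor-critic update, applied sample by sample over (s, a, r, s') tuples."""
    for s, a, r, s2 in batch:
        # Critic: temporal-difference step toward r + gamma * Q'(s', mu'(s')).
        target = r + GAMMA * q_value(w_targ, s2, policy(theta_targ, s2))
        td_error = target - q_value(w, s, a)
        w = [wi + lr_critic * td_error * fi for wi, fi in zip(w, features(s, a))]
        # Actor: deterministic policy gradient, dQ/da at a = mu(s) times d mu / d theta.
        grad_a = dq_da(w, s, policy(theta, s))
        theta = [ti + lr_actor * grad_a * si for ti, si in zip(theta, s)]
    return w, theta, soft_update(w_targ, w), soft_update(theta_targ, theta)
```

In use, transitions would be generated by calling `step` with the actor's output plus exploration noise, stored in a replay buffer, and passed to `update` in randomly sampled batches. Sampling from the buffer rather than from the current policy is what makes the scheme off-policy.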

References

[1] CHEN D, SHI Z, YUAN P, et al. Trajectory tracking control method and experiment of AGV[C]//Advanced Motion Control (AMC), 2016 IEEE 14th International Workshop on. IEEE, 2016: 24-29.
[2] NORMEY-RICO J E, ALCALÁ I, GÓMEZ-ORTEGA J, et al. Mobile robot path tracking using a robust PID controller[J]. Control Engineering Practice, 2001, 9(11): 1209-1214.
[3] LI X, LUO C, XU Y, et al. A Fuzzy PID controller applied in AGV control system[C]//Advanced Robotics and Mechatronics (ICARM), International Conference on. IEEE, 2016: 555-560.
[4] JUANG C F, CHANG Y C. Evolutionary-group-based particle-swarm-optimized fuzzy controller with application to mobile-robot navigation in unknown environments[J]. IEEE Transactions on Fuzzy Systems, 2011, 19(2): 379-392.
[5] LEE Y H, JIN G G, SO M O. Level control of single water tank systems using Fuzzy-PID technique[J]. Journal of the Korean Society of Marine Engineering, 2014, 38(5): 550-556.
[6] KANAYAMA Y, KIMURA Y, MIYAZAKI F, et al. A stable tracking control method for an autonomous mobile robot[C]//Robotics and Automation, 1990. Proceedings, 1990 IEEE International Conference on. IEEE, 1990: 384-389.
[7] HUNG N, IM J S, JEONG S K, et al. Design of a sliding mode controller for an automatic guided vehicle and its implementation[J]. International Journal of Control, Automation and Systems, 2010, 8(1): 81-90.
[8] WU W, CHEN H, WANG Y, et al. Adaptive exponential stabilization of mobile robots with uncertainties[C]//Decision and Control, 1999. Proceedings of the 38th IEEE Conference on. IEEE, 1999, 4: 3484-3489.
[9] BUI T L, DOAN P T, VAN D T, et al. Hybrid control of a tricycle wheeled AGV for path following using advanced fuzzy-PID[J]. Journal of the Korean Society of Marine Engineering, 2014, 38(10): 1287-1296.
[10] CAMPION G, BASTIN G, D'ANDRÉA-NOVEL B. Structural properties and classification of kinematic and dynamic models of wheeled mobile robots[J]. IEEE Transactions on Robotics and Automation, 1996, 12(1): 47-62.
[11] WAWRZYŃSKI P. Control policy with autocorrelated noise in reinforcement learning for robotics[J]. International Journal of Machine Learning and Computing, 2015, 5(2): 91-95.
[12] SAMSON L, INTELLIGENTIE B O K, GAVVES E. Deep reinforcement learning applied to the game bubble shooter[D]. Amsterdam: University of Amsterdam, 2016.
[13] SILVER D, LEVER G, HEESS N, et al. Deterministic policy gradient algorithms[C]//International Conference on Machine Learning, 2014: 387-395.
[14] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[C]//International Conference on Learning Representations, 2016: 1-14.