Abstract
This paper designs a path-following controller that lets a magnetic-navigation automatic guided vehicle (AGV) track a reference path when the system parameters are unknown. To avoid extensive manual parameter tuning, the controller is built on an improved model-free deep reinforcement learning algorithm. Starting from kinematic and dynamic models of the magnetic-navigation AGV derived in the discrete-time domain, the path-following problem is formulated as a Markov decision process with continuous state and action spaces. The controller is implemented as an off-policy, model-free reinforcement learning algorithm with an actor-critic architecture realized by two deep neural networks: the action-value function (critic) is updated by temporal-difference learning, and the policy function (actor) is updated by the deterministic policy gradient algorithm. Theoretical analysis and simulation results show that the proposed controller can autonomously learn a well-performing control law from experience.
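The abstract combines three ingredients: a discrete-time AGV model, an MDP with continuous states and actions, and an off-policy actor-critic whose critic is trained by temporal-difference learning and whose actor follows the deterministic policy gradient. The Python sketch below wires these together under assumptions that are not taken from the paper. The magnetic tape is assumed to be a straight line, and only an Euler-discretised kinematic error model is used (no dynamics). The state is the lateral offset e_y and heading error e_th, the action is the turn rate, and the reward is a hypothetical weighted squared error. The two deep neural networks are replaced by a quadratic-feature linear critic and a one-layer tanh actor. There are no target networks, and the TD error is clipped for robustness. All names (`step`, `features`, `update`, `train`, `T`, `V`, `W_MAX`) and all parameter values are illustrative.

```python
import math
import random
from collections import deque

# Hypothetical parameters (not from the paper): sample time T [s], forward
# speed V [m/s], turn-rate limit W_MAX [rad/s], discount and step sizes.
T, V, W_MAX = 0.05, 0.5, 1.0
GAMMA, ALPHA_Q, ALPHA_MU = 0.95, 1e-3, 1e-4


def step(state, omega):
    """One Euler step of the kinematic error model for a straight tape.

    state = (e_y, e_th): lateral offset [m] and heading error [rad].
    Returns (next_state, reward).
    """
    e_y, e_th = state
    omega = max(-W_MAX, min(W_MAX, omega))  # actuator saturation
    e_y_next = e_y + T * V * math.sin(e_th)
    e_th_next = e_th + T * omega
    e_th_next = math.atan2(math.sin(e_th_next), math.cos(e_th_next))  # wrap to [-pi, pi]
    reward = -(e_y_next ** 2 + 0.1 * e_th_next ** 2)  # penalise tracking error
    return (e_y_next, e_th_next), reward


def features(s, a):
    """Quadratic features, so Q(s, a) = w . phi(s, a) is linear in w."""
    e_y, e_th = s
    return [e_y * e_y, e_th * e_th, e_y * e_th, a * a, e_y * a, e_th * a, 1.0]


def q_value(w, s, a):
    """Critic: approximate action-value function Q(s, a)."""
    return sum(wi * fi for wi, fi in zip(w, features(s, a)))


def dq_da(w, s, a):
    """Analytic dQ/da of the quadratic critic."""
    e_y, e_th = s
    return 2.0 * w[3] * a + w[4] * e_y + w[5] * e_th


def policy(theta, s):
    """Actor: deterministic policy mu(s) = W_MAX * tanh(theta . s)."""
    return W_MAX * math.tanh(theta[0] * s[0] + theta[1] * s[1])


def update(w, theta, batch):
    """Off-policy actor-critic update on a replay mini-batch (in place)."""
    for s, a, r, s_next in batch:
        # Critic: TD(0) target bootstraps with the actor's action at s_next.
        target = r + GAMMA * q_value(w, s_next, policy(theta, s_next))
        delta = max(-1.0, min(1.0, target - q_value(w, s, a)))  # clipped TD error
        phi = features(s, a)
        for i in range(len(w)):
            w[i] += ALPHA_Q * delta * phi[i]
        # Actor: deterministic policy gradient, grad_theta mu(s) * dQ/da at a = mu(s).
        mu = policy(theta, s)
        g = dq_da(w, s, mu) * (W_MAX - mu * mu / W_MAX)  # chain rule through tanh
        theta[0] += ALPHA_MU * g * s[0]
        theta[1] += ALPHA_MU * g * s[1]


def train(steps=2000, seed=0):
    """Collect transitions with a noisy actor and learn from a replay buffer."""
    rng = random.Random(seed)
    w, theta = [0.0] * 7, [0.0, 0.0]
    buffer = deque(maxlen=10_000)

    def reset():
        return (rng.uniform(-0.2, 0.2), rng.uniform(-0.3, 0.3))

    s = reset()
    for _ in range(steps):
        # Behaviour policy = actor + Gaussian exploration noise (off-policy data).
        a = max(-W_MAX, min(W_MAX, policy(theta, s) + rng.gauss(0.0, 0.3)))
        s_next, r = step(s, a)
        buffer.append((s, a, r, s_next))
        if len(buffer) >= 64:
            update(w, theta, rng.sample(list(buffer), 32))
        # Restart the episode when the AGV drifts too far off the tape.
        s = s_next if abs(s_next[0]) < 0.5 else reset()
    return w, theta
```

For example, `w, theta = train(steps=5000)` followed by `policy(theta, (0.1, 0.0))` gives the learned turn-rate command for a 0.1 m offset. Quadratic features are a deliberate choice: for the linearised error model with a quadratic reward, the true Q-function is quadratic in (e_y, e_th, a), so a linear critic over these features can represent it near the path. Whether this toy setup converges to a good controller depends on the hypothetical step sizes and reward weights; the paper's own networks, dynamics model and hyper-parameters are not reproduced here.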