TY - GEN
T1 - Deep Q-Learning versus Proximal Policy Optimization: Performance Comparison in a Material Sorting Task
AU - Kozlica, R.
AU - Wegenkittl, S.
AU - Hirländer, S.
N1 - Conference code: 192344
Export Date: 14 December 2023
CODEN: 85PTA
Correspondence Address: Kozlica, R.; Salzburg University of Applied Sciences, Austria; email: [email protected]
Funding details: 20102-F1901166-KZP
Funding text 1: Reuf Kozlica and Simon Hirländer are supported by the Lab for Intelligent Data Analytics Salzburg (IDA Lab) funded by Land Salzburg (WISS 2025) under project number 20102-F1901166-KZP.
PY - 2023
Y1 - 2023
AB - This paper presents a comparison between two well-known deep Reinforcement Learning (RL) algorithms, Deep Q-Learning (DQN) and Proximal Policy Optimization (PPO), in a simulated production system. We utilize a Petri Net (PN)-based simulation environment, which was previously proposed in related work. The performance of the two algorithms is compared based on several evaluation metrics, including average percentage of correctly assembled and sorted products, average episode length, and percentage of successful episodes. The results show that PPO outperforms DQN in terms of all evaluation metrics. The study highlights the advantages of policy-based algorithms in problems with high-dimensional state and action spaces. It contributes to the field of deep RL in the context of production systems by providing insights into the effectiveness of different algorithms and their suitability for different tasks. © 2023 IEEE.
KW - Deep Q-Learning
KW - Material Flow System
KW - Petri Nets
KW - Proximal Policy Optimization
KW - Reinforcement Learning
KW - Deep learning
KW - Learning systems
KW - Reinforcement learning
KW - Deep Q-learning
KW - Evaluation metrics
KW - Material flow system
KW - Performance comparison
KW - Policy optimization
KW - Production system
KW - Proximal policy optimization
KW - Q-learning
KW - Reinforcement learning algorithms
KW - Reinforcement learnings
KW - Petri nets
U2 - 10.1109/ISIE51358.2023.10228056
DO - 10.1109/ISIE51358.2023.10228056
M3 - Conference contribution
SN - 979-8-3503-9972-1
BT - 2023 IEEE 32nd International Symposium on Industrial Electronics (ISIE)
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 32nd IEEE International Symposium on Industrial Electronics, ISIE 2023
Y2 - 19 June 2023 through 21 June 2023
ER -