Military Decision Support with Actor and Critic Reinforcement Learning Agents
Abstract
Although recent advanced military operational concepts require intelligent support for command and control, Reinforcement Learning (RL) has not been actively studied in the military domain. This study identifies the limitations of RL for military applications through a literature review and aims to improve the understanding of RL for military decision support under those limitations. Above all, the black-box character of Deep RL makes its internal processes difficult to understand, a problem compounded by complex simulation tools. A scalable weapon selection RL framework is built that can be solved either in tabular form or with a neural network. Converting the Deep Q-Network (DQN) solution to tabular form allows an effective comparison of its results with the Q-learning solution. Furthermore, rather than using one or two RL models selectively as in previous work, the RL models are divided into actors and critics and compared systematically. A random agent; Q-learning and DQN agents as critics; a Policy Gradient (PG) agent as an actor; and Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) agents as actor-critic approaches are designed, trained, and tested. The performance results show that the trained DQN and PPO agents are the best decision support candidates for the weapon selection RL framework.
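The abstract does not specify the framework's state and action encoding or its reward design, so the sketch below uses a hypothetical toy setup (the environment, its kill probabilities, and all names such as ToyWeaponSelectionEnv are illustrative assumptions, not the authors' implementation) to show the tabular Q-learning update that the critic-style agents described above rely on: Q(s,a) ← Q(s,a) + α(r + γ max_a' Q(s',a') − Q(s,a)).

```python
import random
from collections import defaultdict

class ToyWeaponSelectionEnv:
    """Hypothetical stand-in: states index incoming threat types,
    actions index available weapons. Not the paper's environment."""
    def __init__(self, n_threats=4, n_weapons=3, seed=0):
        self.n_threats = n_threats
        self.n_weapons = n_weapons
        self.rng = random.Random(seed)
        # Assumed per-(threat, weapon) kill probabilities.
        self.p_kill = [[self.rng.uniform(0.1, 0.9) for _ in range(n_weapons)]
                       for _ in range(n_threats)]

    def reset(self):
        self.state = self.rng.randrange(self.n_threats)
        return self.state

    def step(self, action):
        # Reward 1.0 if the engagement succeeds, else 0.0.
        reward = 1.0 if self.rng.random() < self.p_kill[self.state][action] else 0.0
        self.state = self.rng.randrange(self.n_threats)  # next threat arrives
        return self.state, reward

def train_q_learning(env, episodes=2000, steps=10,
                     alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    q = defaultdict(float)  # Q-table, defaulting to 0.0
    for _ in range(episodes):
        s = env.reset()
        for _ in range(steps):
            # Explore with probability epsilon, otherwise act greedily.
            if env.rng.random() < epsilon:
                a = env.rng.randrange(env.n_weapons)
            else:
                a = max(range(env.n_weapons), key=lambda x: q[(s, x)])
            s2, r = env.step(a)
            target = r + gamma * max(q[(s2, x)] for x in range(env.n_weapons))
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s2
    return q

if __name__ == "__main__":
    env = ToyWeaponSelectionEnv()
    q = train_q_learning(env)
    for s in range(env.n_threats):
        best = max(range(env.n_weapons), key=lambda a: q[(s, a)])
        print(f"threat {s}: choose weapon {best}")
```

In this tabular view, each (state, action) value is stored explicitly; a DQN-style critic would instead approximate Q(s, a) with a neural network, and reading that network's outputs back into a table of this shape is one plausible way to realize the tabular comparison with Q-learning that the abstract describes.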
Subjects
Reinforcement learning; Critics
Conflict of interest:
“The authors state no conflict of interest.”
Funding information:
This research received no external funding or grants.
Peer review:
Peer review under responsibility of Defence Science Journal.
Ethics approval:
Not applicable.
Consent for publication:
Not applicable.
Acknowledgements:
None.