The Reinforcement Learning tool enables you to configure your model to be used as an environment for reinforcement learning algorithms.
When a reinforcement learning algorithm launches FlexSim in order to train an agent, it communicates with this tool using sockets. This tool is designed to follow a particular socket protocol for handling each function in a custom reinforcement learning environment.
For more information and steps to get started, see Key Concepts About Reinforcement Learning.
A model may contain multiple Reinforcement Learning tool objects. During training, only the first Reinforcement Learning tool object is used for learning. The others execute their On Request Action trigger to make random, heuristic, or trained decisions at the specified events. To reorder the Reinforcement Learning objects in your model, right-click one in the Toolbox pane and use the Move Up option.
The Reinforcement Learning tool has the following properties:
The Observation Space configures the set of parameters that are used as input states to the reinforcement learning algorithm. Before beginning training, the algorithm needs to know the range of available inputs in order to learn how those states correspond to received rewards for taking actions.
The following space types are available:
The Action Space configures the set of parameters that are used as the potential actions that an agent can take after an observation. Before beginning training, the algorithm needs to know the range of available actions in order to learn how those actions correspond to received rewards for an observed state.
The Action Space contains the same options as the observation space above, specifying an action as a set of parameters with constrained ranges of possible values.
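As an illustration of what "a set of parameters with constrained ranges" means on the algorithm side, here is a minimal sketch in plain Python. The classes `DiscreteParam` and `ContinuousParam`, the helper functions, and the parameter names (`QueueContent`, `LastItemType`, `NextItemType`) are all hypothetical and are not part of FlexSim or of any reinforcement learning library; they only show how a range can be declared, sampled, and checked.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscreteParam:
    """Hypothetical parameter taking one of n integer values: 0 .. n - 1."""
    name: str
    n: int

    def contains(self, value) -> bool:
        return isinstance(value, int) and 0 <= value < self.n

    def sample(self, rng: random.Random) -> int:
        return rng.randrange(self.n)


@dataclass(frozen=True)
class ContinuousParam:
    """Hypothetical parameter taking any real value in [low, high]."""
    name: str
    low: float
    high: float

    def contains(self, value) -> bool:
        return isinstance(value, (int, float)) and self.low <= value <= self.high

    def sample(self, rng: random.Random) -> float:
        return rng.uniform(self.low, self.high)


def space_contains(space, values) -> bool:
    """True if there is one value per parameter and each lies in its range."""
    return len(space) == len(values) and all(
        param.contains(value) for param, value in zip(space, values)
    )


def sample_space(space, rng: random.Random) -> list:
    """Draw one random value for every parameter in the space."""
    return [param.sample(rng) for param in space]


# Assumed example: observe a queue's content and the last item type,
# then choose which item type to process next.
observation_space = [
    ContinuousParam("QueueContent", 0.0, 50.0),
    DiscreteParam("LastItemType", 4),
]
action_space = [DiscreteParam("NextItemType", 4)]
```

With these definitions, `space_contains(action_space, [3])` is true and `space_contains(action_space, [4])` is false, which is the kind of range knowledge the algorithm needs before training begins.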
For a Custom action space, the Take Action callback receives a string describing the action that should be taken, and the callback code can then parse that string and take the appropriate action.
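The parsing step described above might look like the following sketch. It is written in Python purely to show the logic; inside FlexSim the Take Action callback would be written in the model's own scripting language. The string format is an assumption: this example supposes the action arrives as a JSON number or JSON array such as `"[2, 0.75]"`, and the function name `parse_action` is hypothetical.

```python
import json


def parse_action(action_text: str, expected_len: int) -> list:
    """Parse an action string into a list of numbers.

    Assumed format (not confirmed by the documentation): a JSON number
    or a JSON array of numbers, for example "3" or "[2, 0.75]".
    """
    try:
        values = json.loads(action_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"unparseable action string: {action_text!r}") from exc

    # Treat a bare number as a one-element action.
    if not isinstance(values, list):
        values = [values]

    if len(values) != expected_len:
        raise ValueError(
            f"expected {expected_len} action values, got {len(values)}"
        )

    for value in values:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"non-numeric action value: {value!r}")

    return values
```

Validating the length and type before applying the values keeps a malformed string from silently setting the wrong action.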
The following triggers are used by the Reinforcement Learning tool:
When the model is reset, an initial observation is made and an initial action is taken. The model then runs. At each of the specified decision events, a reward is received for the previous actions, and if the episode is not done, another observation is made and another action is taken. This cycle continues until the Reward Function reports that the episode is done.
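The cycle described above can be sketched as a simple loop. This is a toy illustration in Python, not FlexSim's implementation: `ToyModel`, its methods, and `run_episode` are hypothetical stand-ins for the model, the Observation Space, the Action Space, and the Reward Function, and the reward rule is invented for the example.

```python
class ToyModel:
    """Hypothetical stand-in for a simulation model with decision events."""

    def __init__(self, max_events: int = 5):
        self.max_events = max_events
        self.events = 0
        self.last_action = None

    def reset(self) -> None:
        self.events = 0
        self.last_action = None

    def observe(self) -> int:
        # Toy observation: parity of the number of events so far.
        return self.events % 2

    def take_action(self, action: int) -> None:
        self.last_action = action

    def run_to_next_decision_event(self) -> None:
        self.events += 1

    def reward(self):
        # Toy rule: reward 1.0 if the action matched the state it was chosen in.
        matched = self.last_action == (self.events - 1) % 2
        done = self.events >= self.max_events
        return (1.0 if matched else 0.0), done


def run_episode(model, policy):
    """Reset, act once, then repeat reward -> observe -> act until done."""
    model.reset()
    model.take_action(policy(model.observe()))  # initial observation and action
    total_reward, steps = 0.0, 0
    while True:
        model.run_to_next_decision_event()
        reward, done = model.reward()  # reward for the previous action
        total_reward += reward
        steps += 1
        if done:
            return total_reward, steps
        model.take_action(policy(model.observe()))
```

For example, `run_episode(ToyModel(5), lambda obs: obs)` runs five decision events with a policy that simply echoes the observation. Note that the reward is always evaluated before the next observation, matching the order given in the text.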
The logic for returning a reward, making an observation, and taking an action will happen immediately before the specified event. You can specify the important events where a decision needs to be made, and this object will set the action parameters just before that event is executed so that you can use those action parameters within the event.
In addition to specifying decision events, you can also request a decision at a particular moment by using custom code. The reward, observation, and action all happen immediately, inline, during that function call. You can then use the action parameters as soon as the call returns.