Develop custom trainers (#73)
* Make create_policy more generic (#54)
* add on/off policy classes and inherit from
* trainers as plugins
* remove swap files
* clean up registration debug
* clean up all pre-commit
* a2c plugin pass precommit
* move gae to trainer utils
* move lambda return to trainer util
* add validator for num_epoch
* add types for settings/type methods
* move create policy into highest level api
* move update_reward_signal into optimizer
* move get_policy into Trainer
* remove get settings type
* dummy_config settings
* move all stats from actor into dict, enables arbitrary actor data
* remove shared_critic flag, cleanups
* refactor create_policy
* remove sample_actions, evaluate_actions, update_norm from policy
* remove comments
* fix return type get stat
* update poca create_policy
* clean up policy init
* remove conftest
* add sharedecritic to settings
* fix test_networks
* fix test_policy
* fix test network
* fix some ppo/sac tests
* add back conftest.py
* improve specification of trainer type
* add defaults fpr trainer_type/hyperparam
* fix test_saver
* fix reward providers
* add settings check utility for tests
* fix some settings tests
* add trainer types to run_experiment
* type check for arbitary actor data
* cherrypick rename ml-agents/trainers/torch to torch_entities (#55)
* make all trainers types and setting visible at module level
* remove settings from run_experiment console script
* fix test_settings and upgrade config scripts
* remove need of trainer_type argument up to trainefactory
* fix gohst trainer behavior id in policy Queue
* fix torch shadow in tests
* update trainers, rl trainers tests
* update tests to match the refactors
* fixing behavior name in ghost trainer
* update ml-agents-envs test configs
* separating the plugin package changes
* bring get_policy back for sake of ghost trainer
* add return types and remove unused returns
* remove duplicate methods in poca (_update_policy, add_policy)
Co-authored-by: mahon94 <maryam.honari@unity3d.com>
* Online/offline custom trainer examples with plugin system (#52)
* add on/off policy classes and inherit from
* trainers as plugins
* a2c trains
* remove swap files
* clean up registration debug
* clean up all pre-commit
* a2c plugin pass precommit
* move gae to trainer utils
* move lambda return to trainer util
* add validator for num_epoch
* add types for settings/type methods
* move create policy into highest level api
* move update_reward_signal into optimizer
* move get_policy into Trainer
* remove get settings type
* dummy_config settings
* move all stats from actor into dict, enables arbitrary actor data
* remove shared_critic flag, cleanups
* refactor create_policy
* remove sample_actions, evaluate_actions, update_norm from policy
* remove comments
* fix return type get stat
* update poca create_policy
* clean up policy init
* remove conftest
* add sharedecritic to settings
* fix test_networks
* fix test_policy
* fix test network
* fix some ppo/sac tests
* add back conftest.py
* improve specification of trainer type
* add defaults fpr trainer_type/hyperparam
* fix test_saver
* fix reward providers
* add settings check utility for tests
* fix some settings tests
* add trainer types to run_experiment
* type check for arbitary actor data
* cherrypick rename ml-agents/trainers/torch to torch_entities (#55)
* make all trainers types and setting visible at module level
* remove settings from run_experiment console script
* fix test_settings and upgrade config scripts
* remove need of trainer_type argument up to trainefactory
* fix gohst trainer behavior id in policy Queue
* fix torch shadow in tests
* update trainers, rl trainers tests
* update tests to match the refactors
* fixing behavior name in ghost trainer
* update ml-agents-envs test configs
* fix precommit
* separating the plugin package changes
* bring get_policy back for sake of ghost trainer
* add return types and remove unused returns
* remove duplicate methods in poca (_update_policy, add_policy)
* add a2c trainer back
* Add DQN cleaned up trainer/optimizer
* nit naming
* fix logprob/entropy types in torch_policy.py
* clean up DQN/SAC
* add docs for custom trainers,TODO: refrence tutorial
* add docs for custom trainers,TODO: refrence tutorial
* add clipping to loss function
* set old importlim-metadata version
* bump precomit hook env to 3.8.x
* use smooth l1 loss
Co-authored-by: mahon94 <maryam.honari@unity3d.com>
* add tutorial for validation
* fix formatting errors
* clean up
* minor changes
Co-authored-by: Andrew Cohen <andrew.cohen@unity3d.com>
Co-authored-by: zhuo <zhuo@unity3d.com>
2022-10-20 13:06:58 -07:00
# Table of Contents
* [mlagents.trainers.optimizer.torch\_optimizer](#mlagents.trainers.optimizer.torch_optimizer)
  * [TorchOptimizer](#mlagents.trainers.optimizer.torch_optimizer.TorchOptimizer)
    * [create\_reward\_signals](#mlagents.trainers.optimizer.torch_optimizer.TorchOptimizer.create_reward_signals)
    * [get\_trajectory\_value\_estimates](#mlagents.trainers.optimizer.torch_optimizer.TorchOptimizer.get_trajectory_value_estimates)
* [mlagents.trainers.optimizer.optimizer](#mlagents.trainers.optimizer.optimizer)
  * [Optimizer](#mlagents.trainers.optimizer.optimizer.Optimizer)
    * [update](#mlagents.trainers.optimizer.optimizer.Optimizer.update)
<a name="mlagents.trainers.optimizer.torch_optimizer"></a>
# mlagents.trainers.optimizer.torch\_optimizer
<a name="mlagents.trainers.optimizer.torch_optimizer.TorchOptimizer"></a>
## TorchOptimizer Objects
```python
class TorchOptimizer(Optimizer)
```
<a name="mlagents.trainers.optimizer.torch_optimizer.TorchOptimizer.create_reward_signals"></a>
#### create\_reward\_signals
```python
 | create_reward_signals(reward_signal_configs: Dict[RewardSignalType, RewardSignalSettings]) -> None
```
Create reward signals.
**Arguments**:
- `reward_signal_configs`: Reward signal config.
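For orientation, the mapping passed to `create_reward_signals` pairs each reward signal type with its settings. The enum and dataclass below are stand-ins that only mimic the shape of `RewardSignalType` and `RewardSignalSettings` (the real classes live in `mlagents.trainers.settings`); treat this as an illustrative sketch, not the real API.

```python
from dataclasses import dataclass
from enum import Enum

# Stand-ins mimicking mlagents.trainers.settings classes; illustrative only.
class RewardSignalType(Enum):
    EXTRINSIC = "extrinsic"
    CURIOSITY = "curiosity"

@dataclass
class RewardSignalSettings:
    gamma: float = 0.99
    strength: float = 1.0

# The shape of the Dict[RewardSignalType, RewardSignalSettings] argument:
reward_signal_configs = {
    RewardSignalType.EXTRINSIC: RewardSignalSettings(gamma=0.99, strength=1.0),
    RewardSignalType.CURIOSITY: RewardSignalSettings(gamma=0.99, strength=0.02),
}

for signal_type, settings in reward_signal_configs.items():
    print(f"{signal_type.value}: gamma={settings.gamma}, strength={settings.strength}")
```

This mirrors the `reward_signals` section of a trainer configuration YAML, where each signal name maps to its `gamma`/`strength` settings.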
<a name="mlagents.trainers.optimizer.torch_optimizer.TorchOptimizer.get_trajectory_value_estimates"></a>
#### get\_trajectory\_value\_estimates
```python
 | get_trajectory_value_estimates(batch: AgentBuffer, next_obs: List[np.ndarray], done: bool, agent_id: str = "") -> Tuple[Dict[str, np.ndarray], Dict[str, float], Optional[AgentBufferField]]
```
Get value estimates and memories for a trajectory, in batch form.
**Arguments**:
- `batch` : An AgentBuffer that consists of a trajectory.
- `next_obs` : the next observation (after the trajectory). Used for bootstrapping
if this is not a terminal trajectory.
- `done` : Set true if this is a terminal trajectory.
- `agent_id` : Agent ID of the agent that this trajectory belongs to.
**Returns**:
A Tuple of the Value Estimates as a Dict of [name, np.ndarray(trajectory_len)],
the final value estimate as a Dict of [name, float], and optionally (if using memories)
an AgentBufferField of initial critic memories to be used during update.
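To make the tuple shape concrete, here is a hedged sketch of consuming the return value. The stub below stands in for a real call to `get_trajectory_value_estimates` (which needs a live optimizer and critic); only the return shapes mirror the description above, and the `"extrinsic"` key is just an example reward signal name.

```python
from typing import Dict, Optional, Tuple
import numpy as np

# Stub standing in for TorchOptimizer.get_trajectory_value_estimates;
# only the return shapes mirror the real method.
def get_trajectory_value_estimates_stub(
    trajectory_len: int,
) -> Tuple[Dict[str, np.ndarray], Dict[str, float], Optional[list]]:
    # One value-estimate array per reward signal, trajectory_len entries each.
    value_estimates = {"extrinsic": np.zeros(trajectory_len, dtype=np.float32)}
    # Final value per reward signal, bootstrapped from next_obs when not done.
    final_values = {"extrinsic": 0.0}
    # Initial critic memories; populated only when the critic is recurrent.
    critic_memories = None
    return value_estimates, final_values, critic_memories

values, final_values, memories = get_trajectory_value_estimates_stub(8)
print(values["extrinsic"].shape, final_values["extrinsic"], memories)
```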
<a name="mlagents.trainers.optimizer.optimizer"></a>
# mlagents.trainers.optimizer.optimizer
<a name="mlagents.trainers.optimizer.optimizer.Optimizer"></a>
## Optimizer Objects
```python
class Optimizer(abc.ABC)
```
Creates loss functions and auxiliary networks (e.g. Q or Value) needed for training.
Provides methods to update the Policy.
<a name="mlagents.trainers.optimizer.optimizer.Optimizer.update"></a>
#### update
```python
 | @abc.abstractmethod
 | update(batch: AgentBuffer, num_sequences: int) -> Dict[str, float]
```
Update the Policy based on the batch that was passed in.
**Arguments**:
- `batch` : AgentBuffer that contains the minibatch of data used for this update.
- `num_sequences` : Number of recurrent sequences found in the minibatch.
**Returns**:
A Dict containing statistics (name, value) from the update (e.g. loss).
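As an illustration of this contract, a toy subclass might look like the following. The `AgentBuffer` parameter is left untyped here, and the stat names are hypothetical examples, not necessarily the ones the real trainers emit.

```python
import abc
from typing import Any, Dict

# Mirror of the abstract interface documented above; illustrative only.
class Optimizer(abc.ABC):
    @abc.abstractmethod
    def update(self, batch: Any, num_sequences: int) -> Dict[str, float]:
        ...

class ConstantLossOptimizer(Optimizer):
    """Toy optimizer: reports fixed statistics instead of training anything."""

    def update(self, batch: Any, num_sequences: int) -> Dict[str, float]:
        # A real implementation would compute losses from the minibatch
        # and step the underlying torch optimizer here.
        return {"Losses/Policy Loss": 0.5, "Losses/Value Loss": 0.25}

stats = ConstantLossOptimizer().update(batch=[], num_sequences=1)
print(stats)
```

A concrete optimizer would also construct its reward signals and value networks at init time; only the `update` signature is fixed by the ABC.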