Develop custom trainers (#73)
* Make create_policy more generic (#54)
* add on/off policy classes and inherit from
* trainers as plugins
* remove swap files
* clean up registration debug
* clean up all pre-commit
* a2c plugin pass precommit
* move gae to trainer utils
* move lambda return to trainer util
* add validator for num_epoch
* add types for settings/type methods
* move create policy into highest level api
* move update_reward_signal into optimizer
* move get_policy into Trainer
* remove get settings type
* dummy_config settings
* move all stats from actor into dict, enables arbitrary actor data
* remove shared_critic flag, cleanups
* refactor create_policy
* remove sample_actions, evaluate_actions, update_norm from policy
* remove comments
* fix return type get stat
* update poca create_policy
* clean up policy init
* remove conftest
* add sharedecritic to settings
* fix test_networks
* fix test_policy
* fix test network
* fix some ppo/sac tests
* add back conftest.py
* improve specification of trainer type
* add defaults fpr trainer_type/hyperparam
* fix test_saver
* fix reward providers
* add settings check utility for tests
* fix some settings tests
* add trainer types to run_experiment
* type check for arbitary actor data
* cherrypick rename ml-agents/trainers/torch to torch_entities (#55)
* make all trainers types and setting visible at module level
* remove settings from run_experiment console script
* fix test_settings and upgrade config scripts
* remove need of trainer_type argument up to trainefactory
* fix gohst trainer behavior id in policy Queue
* fix torch shadow in tests
* update trainers, rl trainers tests
* update tests to match the refactors
* fixing behavior name in ghost trainer
* update ml-agents-envs test configs
* separating the plugin package changes
* bring get_policy back for sake of ghost trainer
* add return types and remove unused returns
* remove duplicate methods in poca (_update_policy, add_policy)
Co-authored-by: mahon94 <maryam.honari@unity3d.com>
* Online/offline custom trainer examples with plugin system (#52)
* add on/off policy classes and inherit from
* trainers as plugins
* a2c trains
* remove swap files
* clean up registration debug
* clean up all pre-commit
* a2c plugin pass precommit
* move gae to trainer utils
* move lambda return to trainer util
* add validator for num_epoch
* add types for settings/type methods
* move create policy into highest level api
* move update_reward_signal into optimizer
* move get_policy into Trainer
* remove get settings type
* dummy_config settings
* move all stats from actor into dict, enables arbitrary actor data
* remove shared_critic flag, cleanups
* refactor create_policy
* remove sample_actions, evaluate_actions, update_norm from policy
* remove comments
* fix return type get stat
* update poca create_policy
* clean up policy init
* remove conftest
* add sharedecritic to settings
* fix test_networks
* fix test_policy
* fix test network
* fix some ppo/sac tests
* add back conftest.py
* improve specification of trainer type
* add defaults fpr trainer_type/hyperparam
* fix test_saver
* fix reward providers
* add settings check utility for tests
* fix some settings tests
* add trainer types to run_experiment
* type check for arbitary actor data
* cherrypick rename ml-agents/trainers/torch to torch_entities (#55)
* make all trainers types and setting visible at module level
* remove settings from run_experiment console script
* fix test_settings and upgrade config scripts
* remove need of trainer_type argument up to trainefactory
* fix gohst trainer behavior id in policy Queue
* fix torch shadow in tests
* update trainers, rl trainers tests
* update tests to match the refactors
* fixing behavior name in ghost trainer
* update ml-agents-envs test configs
* fix precommit
* separating the plugin package changes
* bring get_policy back for sake of ghost trainer
* add return types and remove unused returns
* remove duplicate methods in poca (_update_policy, add_policy)
* add a2c trainer back
* Add DQN cleaned up trainer/optimizer
* nit naming
* fix logprob/entropy types in torch_policy.py
* clean up DQN/SAC
* add docs for custom trainers,TODO: refrence tutorial
* add docs for custom trainers,TODO: refrence tutorial
* add clipping to loss function
* set old importlim-metadata version
* bump precomit hook env to 3.8.x
* use smooth l1 loss
Co-authored-by: mahon94 <maryam.honari@unity3d.com>
* add tutorial for validation
* fix formatting errors
* clean up
* minor changes
Co-authored-by: Andrew Cohen <andrew.cohen@unity3d.com>
Co-authored-by: zhuo <zhuo@unity3d.com>
2022-10-20 13:06:58 -07:00
# Table of Contents
* [mlagents.trainers.optimizer.torch\_optimizer](#mlagents.trainers.optimizer.torch_optimizer)
  * [TorchOptimizer](#mlagents.trainers.optimizer.torch_optimizer.TorchOptimizer)
    * [create\_reward\_signals](#mlagents.trainers.optimizer.torch_optimizer.TorchOptimizer.create_reward_signals)
    * [get\_trajectory\_value\_estimates](#mlagents.trainers.optimizer.torch_optimizer.TorchOptimizer.get_trajectory_value_estimates)
* [mlagents.trainers.optimizer.optimizer](#mlagents.trainers.optimizer.optimizer)
  * [Optimizer](#mlagents.trainers.optimizer.optimizer.Optimizer)
    * [update](#mlagents.trainers.optimizer.optimizer.Optimizer.update)
<a name="mlagents.trainers.optimizer.torch_optimizer"></a>
# mlagents.trainers.optimizer.torch\_optimizer
<a name="mlagents.trainers.optimizer.torch_optimizer.TorchOptimizer"></a>
## TorchOptimizer Objects
```python
class TorchOptimizer(Optimizer)
```
<a name="mlagents.trainers.optimizer.torch_optimizer.TorchOptimizer.create_reward_signals"></a>
#### create\_reward\_signals
```python
 | create_reward_signals(reward_signal_configs: Dict[RewardSignalType, RewardSignalSettings]) -> None
```
Create reward signals.
**Arguments**:
- `reward_signal_configs`: Reward signal config.
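For orientation, the mapping passed to `create_reward_signals` pairs each reward signal type with its settings. The enum and dataclass below are stand-ins that only mimic the shape of `RewardSignalType` and `RewardSignalSettings` (the real classes live in `mlagents.trainers.settings`); treat this as an illustrative sketch, not the real API.

```python
from dataclasses import dataclass
from enum import Enum

# Stand-ins mimicking mlagents.trainers.settings classes; illustrative only.
class RewardSignalType(Enum):
    EXTRINSIC = "extrinsic"
    CURIOSITY = "curiosity"

@dataclass
class RewardSignalSettings:
    gamma: float = 0.99
    strength: float = 1.0

# The shape of the Dict[RewardSignalType, RewardSignalSettings] argument:
reward_signal_configs = {
    RewardSignalType.EXTRINSIC: RewardSignalSettings(gamma=0.99, strength=1.0),
    RewardSignalType.CURIOSITY: RewardSignalSettings(gamma=0.99, strength=0.02),
}

for signal_type, settings in reward_signal_configs.items():
    print(f"{signal_type.value}: gamma={settings.gamma}, strength={settings.strength}")
```

This mirrors the `reward_signals` section of a trainer configuration YAML, where each signal name maps to its `gamma`/`strength` settings.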
<a name="mlagents.trainers.optimizer.torch_optimizer.TorchOptimizer.get_trajectory_value_estimates"></a>
#### get\_trajectory\_value\_estimates
```python
 | get_trajectory_value_estimates(batch: AgentBuffer, next_obs: List[np.ndarray], done: bool, agent_id: str = "") -> Tuple[Dict[str, np.ndarray], Dict[str, float], Optional[AgentBufferField]]
```
Get value estimates and memories for a trajectory, in batch form.
**Arguments**:
- `batch` : An AgentBuffer that consists of a trajectory.
- `next_obs` : the next observation (after the trajectory). Used for bootstrapping
if this is not a terminal trajectory.
- `done` : Set true if this is a terminal trajectory.
- `agent_id` : Agent ID of the agent that this trajectory belongs to.
**Returns**:
A Tuple of the Value Estimates as a Dict of [name, np.ndarray(trajectory_len)],
the final value estimate as a Dict of [name, float], and optionally (if using memories)
an AgentBufferField of initial critic memories to be used during update.
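To make the tuple shape concrete, here is a hedged sketch of consuming the return value. The stub below stands in for a real call to `get_trajectory_value_estimates` (which needs a live optimizer and critic); only the return shapes mirror the description above, and the `"extrinsic"` key is just an example reward signal name.

```python
from typing import Dict, Optional, Tuple
import numpy as np

# Stub standing in for TorchOptimizer.get_trajectory_value_estimates;
# only the return shapes mirror the real method.
def get_trajectory_value_estimates_stub(
    trajectory_len: int,
) -> Tuple[Dict[str, np.ndarray], Dict[str, float], Optional[list]]:
    # One value-estimate array per reward signal, trajectory_len entries each.
    value_estimates = {"extrinsic": np.zeros(trajectory_len, dtype=np.float32)}
    # Final value per reward signal, bootstrapped from next_obs when not done.
    final_values = {"extrinsic": 0.0}
    # Initial critic memories; populated only when the critic is recurrent.
    critic_memories = None
    return value_estimates, final_values, critic_memories

values, final_values, memories = get_trajectory_value_estimates_stub(8)
print(values["extrinsic"].shape, final_values["extrinsic"], memories)
```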
<a name="mlagents.trainers.optimizer.optimizer"></a>
# mlagents.trainers.optimizer.optimizer
<a name="mlagents.trainers.optimizer.optimizer.Optimizer"></a>
## Optimizer Objects
```python
class Optimizer(abc.ABC)
```
Creates loss functions and auxiliary networks (e.g. Q or Value) needed for training.
Provides methods to update the Policy.
<a name="mlagents.trainers.optimizer.optimizer.Optimizer.update"></a>
#### update
```python
 | @abc.abstractmethod
 | update(batch: AgentBuffer, num_sequences: int) -> Dict[str, float]
```
Update the Policy based on the batch that was passed in.
**Arguments**:
- `batch` : AgentBuffer that contains the minibatch of data used for this update.
- `num_sequences` : Number of recurrent sequences found in the minibatch.
**Returns**:
A Dict containing statistics (name, value) from the update (e.g. loss).
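As an illustration of this contract, a toy subclass might look like the following. The `AgentBuffer` parameter is left untyped here, and the stat names are hypothetical examples, not necessarily the ones the real trainers emit.

```python
import abc
from typing import Any, Dict

# Mirror of the abstract interface documented above; illustrative only.
class Optimizer(abc.ABC):
    @abc.abstractmethod
    def update(self, batch: Any, num_sequences: int) -> Dict[str, float]:
        ...

class ConstantLossOptimizer(Optimizer):
    """Toy optimizer: reports fixed statistics instead of training anything."""

    def update(self, batch: Any, num_sequences: int) -> Dict[str, float]:
        # A real implementation would compute losses from the minibatch
        # and step the underlying torch optimizer here.
        return {"Losses/Policy Loss": 0.5, "Losses/Value Loss": 0.25}

stats = ConstantLossOptimizer().update(batch=[], num_sequences=1)
print(stats)
```

A concrete optimizer would also construct its reward signals and value networks at init time; only the `update` signature is fixed by the ABC.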