# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import annotations

from functools import reduce

import numpy as np

import paddle
from paddle.framework import core

from .process_mesh import ProcessMesh, get_current_process_mesh
from .static.dist_context import get_default_distributed_context
from .static.dist_op import DistributedOperatorHelper
from .static.dist_tensor import DistributedTensor
from .static.utils import (
    __no_shape_var_type__,
    convert_to_dims_mapping,
    verify_shard_spec,
)


def shard_tensor(x, process_mesh=None, shard_spec=None):
"""
[Auto Parallel] Improve the APIs (#45776) * [Auto Parallel] Use c++ dist attr in the completion process * [Auto Parallel] Add minor changes * [Auto Parallel] Use c++ dist attr in the completion process * [Auto Parallel] Add minor changes * [Auto Parallel] Add the serialization process for dist attrs * [Auto Parallel] Remove unnecessary comments * [Auto Parallel] Fix some bugs * [Auto Parallel] Fix the code style * [Auto Parallel] Remove unnecessary impls * [Auto Parallel] Fix the importing error * [Auto Parallel] Fix the copy from bugs of op dist attr * [Auto Parallel] Replace the use of constexpr if * [Auto Parallel] Redesign the shard_tensor, shard_op and ProcessMesh * [Auto Parallel] Change API of the completion unittest * [Auto Parallel] Fix the bug when set_attr an int * [Auto Parallel] Add the unittest for the serialization * [Auto Parallel] Add some unit tests * [Auto Paralle] Unify the strategy * [Auto Parallel] Improve the engine api * [Auto Parallel] Reset the changes made to the framework * [Auto Parallel] Change the engine unittest * [Auto Parallel] Update API of the completion and partitioner * [Auto Parallel] Update unit tests using engine api * update shard annotation * [Auto Parallel] Remove the modifications of other modules * [Auto Parallel] Add docs for APIs * add new strategy * [Auto Parallel] Replace the logger * [Auto Parallel] Restore the test_program.py * [Auto Parallel] Change the import rules * [Auto Parallel] Add the examples for Engine * [Auto Parallel] Do some minor changes * [Auto Parallel] Remove yaml dependency * [Auto Parallel] Fix the unittests * add valid after train * bug fix Co-authored-by: zhaoyingli <zhaoyingli@baidu.com> Co-authored-by: caozhou <caozhou@radi.ac.cn> Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>
2022-09-15 20:35:52 +08:00
Shard a tensor on a process mesh according to the shard specification.
Args:
[Auto Parallel] Improve the interface and the underlying mechanisms (#36617) * default dist op * add dist_attr for dist op * add unitest * update inputname * update function name * add unitest * update CMakeLists.txt for CI * fix dis_matmul * fix compile error * update matmul to matmul_v2 * unify api * unify api * todo * update distop forward func * update distop forward func * auto parallel backward * update dist op * autoparallel backward * add backward for embedding * temp1 * temp2 * temp3 * temp4 * backward done1 * backward done2 * backward done3 * dist embedding remove mp mode * dist matmul remove mp mode * update dist embedding 『 * dist op init1 * dist op init 2 * update unitest * context remove parallel mode * partitioner remove parallel mode * update unitest * a more general method to support varying mesh in pipeline parallel * support varying mesh in pipeline parallel * embedding support varying mesh in pipeline parallel * matmul support varying mesh in pipeline parallel * default dist op support varying mesh in pipeline parallel * dist attribute for startup program * default dist op support varying mesh in pipeline parallel 2 * partitoner support varying mesh in pipeline parallel * revise logic for auto compeletion * revise framework.py * revise reshard unitest * revise unitest for parallelize * chmod * fixed bug for dist embedding name mapping * Improve the interface and the underlying mechanisms of auto parallel * revise completion for backward * revise completion for update * revise completion for update * update unitest * chmod * bugfix for grad_op output var's mesh * Modify codes for pr 36744 * Remove unnecessary comments in framework.py * Remove unnecessary comments in completion.py Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com> Co-authored-by: zhaoyingli <zhaoyingli@baidu.com> Co-authored-by: JZ-LIANG <38102074+JZ-LIANG@users.noreply.github.com>
2021-10-29 11:20:04 +08:00
x (Tensor): the tensor to be sharded.
[Auto Parallel] Improve the APIs (#45776) * [Auto Parallel] Use c++ dist attr in the completion process * [Auto Parallel] Add minor changes * [Auto Parallel] Use c++ dist attr in the completion process * [Auto Parallel] Add minor changes * [Auto Parallel] Add the serialization process for dist attrs * [Auto Parallel] Remove unnecessary comments * [Auto Parallel] Fix some bugs * [Auto Parallel] Fix the code style * [Auto Parallel] Remove unnecessary impls * [Auto Parallel] Fix the importing error * [Auto Parallel] Fix the copy from bugs of op dist attr * [Auto Parallel] Replace the use of constexpr if * [Auto Parallel] Redesign the shard_tensor, shard_op and ProcessMesh * [Auto Parallel] Change API of the completion unittest * [Auto Parallel] Fix the bug when set_attr an int * [Auto Parallel] Add the unittest for the serialization * [Auto Parallel] Add some unit tests * [Auto Paralle] Unify the strategy * [Auto Parallel] Improve the engine api * [Auto Parallel] Reset the changes made to the framework * [Auto Parallel] Change the engine unittest * [Auto Parallel] Update API of the completion and partitioner * [Auto Parallel] Update unit tests using engine api * update shard annotation * [Auto Parallel] Remove the modifications of other modules * [Auto Parallel] Add docs for APIs * add new strategy * [Auto Parallel] Replace the logger * [Auto Parallel] Restore the test_program.py * [Auto Parallel] Change the import rules * [Auto Parallel] Add the examples for Engine * [Auto Parallel] Do some minor changes * [Auto Parallel] Remove yaml dependency * [Auto Parallel] Fix the unittests * add valid after train * bug fix Co-authored-by: zhaoyingli <zhaoyingli@baidu.com> Co-authored-by: caozhou <caozhou@radi.ac.cn> Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>
2022-09-15 20:35:52 +08:00
process_mesh (ProcessMesh, optional): An instance of ProcessMesh describes a mesh
topology of the used logical processes where the tensor is sharded. If it is None,
the found current process mesh will be used. And an error will be raised if the
[Auto Parallel] Improve the APIs (#45776) * [Auto Parallel] Use c++ dist attr in the completion process * [Auto Parallel] Add minor changes * [Auto Parallel] Use c++ dist attr in the completion process * [Auto Parallel] Add minor changes * [Auto Parallel] Add the serialization process for dist attrs * [Auto Parallel] Remove unnecessary comments * [Auto Parallel] Fix some bugs * [Auto Parallel] Fix the code style * [Auto Parallel] Remove unnecessary impls * [Auto Parallel] Fix the importing error * [Auto Parallel] Fix the copy from bugs of op dist attr * [Auto Parallel] Replace the use of constexpr if * [Auto Parallel] Redesign the shard_tensor, shard_op and ProcessMesh * [Auto Parallel] Change API of the completion unittest * [Auto Parallel] Fix the bug when set_attr an int * [Auto Parallel] Add the unittest for the serialization * [Auto Parallel] Add some unit tests * [Auto Paralle] Unify the strategy * [Auto Parallel] Improve the engine api * [Auto Parallel] Reset the changes made to the framework * [Auto Parallel] Change the engine unittest * [Auto Parallel] Update API of the completion and partitioner * [Auto Parallel] Update unit tests using engine api * update shard annotation * [Auto Parallel] Remove the modifications of other modules * [Auto Parallel] Add docs for APIs * add new strategy * [Auto Parallel] Replace the logger * [Auto Parallel] Restore the test_program.py * [Auto Parallel] Change the import rules * [Auto Parallel] Add the examples for Engine * [Auto Parallel] Do some minor changes * [Auto Parallel] Remove yaml dependency * [Auto Parallel] Fix the unittests * add valid after train * bug fix Co-authored-by: zhaoyingli <zhaoyingli@baidu.com> Co-authored-by: caozhou <caozhou@radi.ac.cn> Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>
2022-09-15 20:35:52 +08:00
current process mesh cannot be found. Default: None.
shard_spec (list, optional): a list to describe the sharding mapping between `x` and `process_mesh`,
which means the dimension `i` of `x` is split across the dimension `shard_spec[i]` of `process_mesh`,
where `None` means that tensor dimension is not split. For example, given a tensor with
[Auto Parallel] Improve the APIs (#45776) * [Auto Parallel] Use c++ dist attr in the completion process * [Auto Parallel] Add minor changes * [Auto Parallel] Use c++ dist attr in the completion process * [Auto Parallel] Add minor changes * [Auto Parallel] Add the serialization process for dist attrs * [Auto Parallel] Remove unnecessary comments * [Auto Parallel] Fix some bugs * [Auto Parallel] Fix the code style * [Auto Parallel] Remove unnecessary impls * [Auto Parallel] Fix the importing error * [Auto Parallel] Fix the copy from bugs of op dist attr * [Auto Parallel] Replace the use of constexpr if * [Auto Parallel] Redesign the shard_tensor, shard_op and ProcessMesh * [Auto Parallel] Change API of the completion unittest * [Auto Parallel] Fix the bug when set_attr an int * [Auto Parallel] Add the unittest for the serialization * [Auto Parallel] Add some unit tests * [Auto Paralle] Unify the strategy * [Auto Parallel] Improve the engine api * [Auto Parallel] Reset the changes made to the framework * [Auto Parallel] Change the engine unittest * [Auto Parallel] Update API of the completion and partitioner * [Auto Parallel] Update unit tests using engine api * update shard annotation * [Auto Parallel] Remove the modifications of other modules * [Auto Parallel] Add docs for APIs * add new strategy * [Auto Parallel] Replace the logger * [Auto Parallel] Restore the test_program.py * [Auto Parallel] Change the import rules * [Auto Parallel] Add the examples for Engine * [Auto Parallel] Do some minor changes * [Auto Parallel] Remove yaml dependency * [Auto Parallel] Fix the unittests * add valid after train * bug fix Co-authored-by: zhaoyingli <zhaoyingli@baidu.com> Co-authored-by: caozhou <caozhou@radi.ac.cn> Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>
2022-09-15 20:35:52 +08:00
the shape [6, 12] and a process mesh with the shape [2, 3] and the dimension names ["x", "y"]:
If `shard_spec=["x", "y"]`, each shard of the tensor will have a shape [3, 4];
If `shard_spec=["y", "x"]`, each shard of the tensor will have a shape [2, 6];
If `shard_spec=["x", None]`, each shard of the tensor will have a shape [3, 12];
If `shard_spec=[None, "x"]`, each shard of the tensor will have a shape [6, 4];
If `shard_spec=["y", None]`, each shard of the tensor will have a shape [2, 12];
If `shard_spec=[None, "y"]`, each shard of the tensor will have a shape [6, 4];
If `shard_spec=[None, None]`, each shard of the tensor will have a shape [6, 12];
If the `shard_spec` is None, the tensor will be replicated across all the processes of `process_mesh`.
In the above example, the `shard_spec=None` is same as 'shard_spec=[None, None]'. Defaults: None.
Returns:
Tensor: the tensor `x` annotated with sharding information.
Examples:
.. code-block:: pycon
>>> # doctest: +REQUIRES(env:DISTRIBUTED)
>>> import paddle
>>> from paddle.distributed.fleet import auto
>>> mesh = auto.ProcessMesh([[0, 1], [2, 3]], dim_names=["x", "y"])
>>> x = paddle.ones([4, 6])
>>> shard_spec = ["x", "y"]
>>> auto.shard_tensor(x, mesh, shard_spec)
"""
    if process_mesh is not None:
        assert isinstance(process_mesh, ProcessMesh), (
            f"Argument process_mesh {process_mesh} is not an instance of ProcessMesh"
        )
    else:
        process_mesh = get_current_process_mesh()
        assert process_mesh is not None, (
            "Specify the process mesh argument or use ProcessMesh context manager first."
        )
    if shard_spec is not None:
        assert isinstance(shard_spec, list), (
            f"Argument shard_spec {shard_spec} is not an instance of list"
        )
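    # `x` may also be given as a variable name; resolve it to the variable in the
    # default main program before wrapping it in a DistributedTensor.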
    if isinstance(x, str):
        x = (
            paddle.static.default_main_program()
            .global_block()
            ._var_recursive(x)
        )
    dist_tensor = DistributedTensor(x)
    serial_tensor = dist_tensor.serial_tensor
    dist_tensor.dist_attr.process_mesh = process_mesh
    if serial_tensor.type in __no_shape_var_type__:
        tensor_shape = []
    else:
        tensor_shape = serial_tensor.shape
    if shard_spec is not None:
        valid_dims = (
            process_mesh.get_dim_names()
            if hasattr(process_mesh, "get_dim_names")
            else process_mesh.dim_names
        )
        for i, dim in enumerate(shard_spec):
            if dim is not None and (
                not isinstance(dim, str) or dim not in valid_dims
            ):
                raise ValueError(
                    f"Invalid shard_spec at index {i}: '{dim}' "
                    f"is not a valid dimension name in process_mesh {valid_dims}."
                )
        assert verify_shard_spec(shard_spec, tensor_shape, process_mesh), (
            f"For tensor {serial_tensor.name}, shard_spec {shard_spec} is invalid "
            f"with tensor_shape {tensor_shape} and process_mesh {process_mesh}."
        )
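        # Convert the shard_spec of mesh dim names into a dims_mapping: one mesh-dim
        # index per tensor dim, with -1 for a replicated (unsplit) dim. For example,
        # with mesh dims ["x", "y"], shard_spec ["y", None] maps to [1, -1].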
        dist_tensor.dist_attr.dims_mapping = convert_to_dims_mapping(
            shard_spec, process_mesh
        )
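    # Mark these attributes as user-annotated so that the auto-completion pass
    # keeps them instead of inferring its own values.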
    if process_mesh is not None:
        dist_tensor.dist_attr.mark_annotated("process_mesh")
    if shard_spec is not None:
        dist_tensor.dist_attr.mark_annotated("dims_mapping")
    default_dist_ctx = get_default_distributed_context()
    default_dist_ctx.add_dist_tensor_for_program(dist_tensor)
    dist_tensor = default_dist_ctx.get_dist_tensor_for_program(x)
    default_dist_ctx.add_process_mesh(process_mesh)
    return x


def shard_op(
    op, process_mesh=None, in_shard_specs=None, out_shard_specs=None, **kwargs
):
"""
[Auto Parallel] Improve the APIs (#45776) * [Auto Parallel] Use c++ dist attr in the completion process * [Auto Parallel] Add minor changes * [Auto Parallel] Use c++ dist attr in the completion process * [Auto Parallel] Add minor changes * [Auto Parallel] Add the serialization process for dist attrs * [Auto Parallel] Remove unnecessary comments * [Auto Parallel] Fix some bugs * [Auto Parallel] Fix the code style * [Auto Parallel] Remove unnecessary impls * [Auto Parallel] Fix the importing error * [Auto Parallel] Fix the copy from bugs of op dist attr * [Auto Parallel] Replace the use of constexpr if * [Auto Parallel] Redesign the shard_tensor, shard_op and ProcessMesh * [Auto Parallel] Change API of the completion unittest * [Auto Parallel] Fix the bug when set_attr an int * [Auto Parallel] Add the unittest for the serialization * [Auto Parallel] Add some unit tests * [Auto Paralle] Unify the strategy * [Auto Parallel] Improve the engine api * [Auto Parallel] Reset the changes made to the framework * [Auto Parallel] Change the engine unittest * [Auto Parallel] Update API of the completion and partitioner * [Auto Parallel] Update unit tests using engine api * update shard annotation * [Auto Parallel] Remove the modifications of other modules * [Auto Parallel] Add docs for APIs * add new strategy * [Auto Parallel] Replace the logger * [Auto Parallel] Restore the test_program.py * [Auto Parallel] Change the import rules * [Auto Parallel] Add the examples for Engine * [Auto Parallel] Do some minor changes * [Auto Parallel] Remove yaml dependency * [Auto Parallel] Fix the unittests * add valid after train * bug fix Co-authored-by: zhaoyingli <zhaoyingli@baidu.com> Co-authored-by: caozhou <caozhou@radi.ac.cn> Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>
2022-09-15 20:35:52 +08:00
Shard an operation on a process mesh according to its input and output shard specification.
Args:
[Auto Parallel] Improve the APIs (#45776) * [Auto Parallel] Use c++ dist attr in the completion process * [Auto Parallel] Add minor changes * [Auto Parallel] Use c++ dist attr in the completion process * [Auto Parallel] Add minor changes * [Auto Parallel] Add the serialization process for dist attrs * [Auto Parallel] Remove unnecessary comments * [Auto Parallel] Fix some bugs * [Auto Parallel] Fix the code style * [Auto Parallel] Remove unnecessary impls * [Auto Parallel] Fix the importing error * [Auto Parallel] Fix the copy from bugs of op dist attr * [Auto Parallel] Replace the use of constexpr if * [Auto Parallel] Redesign the shard_tensor, shard_op and ProcessMesh * [Auto Parallel] Change API of the completion unittest * [Auto Parallel] Fix the bug when set_attr an int * [Auto Parallel] Add the unittest for the serialization * [Auto Parallel] Add some unit tests * [Auto Paralle] Unify the strategy * [Auto Parallel] Improve the engine api * [Auto Parallel] Reset the changes made to the framework * [Auto Parallel] Change the engine unittest * [Auto Parallel] Update API of the completion and partitioner * [Auto Parallel] Update unit tests using engine api * update shard annotation * [Auto Parallel] Remove the modifications of other modules * [Auto Parallel] Add docs for APIs * add new strategy * [Auto Parallel] Replace the logger * [Auto Parallel] Restore the test_program.py * [Auto Parallel] Change the import rules * [Auto Parallel] Add the examples for Engine * [Auto Parallel] Do some minor changes * [Auto Parallel] Remove yaml dependency * [Auto Parallel] Fix the unittests * add valid after train * bug fix Co-authored-by: zhaoyingli <zhaoyingli@baidu.com> Co-authored-by: caozhou <caozhou@radi.ac.cn> Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>
2022-09-15 20:35:52 +08:00
op (Callable): a callable operator or module to be sharded.
process_mesh (ProcessMesh, optional): An instance of ProcessMesh describes a mesh
topology of the used logical processes where the op is sharded. All of its inputs and
outputs are sharded by this process mesh. If it is None, the found current process mesh
will be used. And an error will be raised if the current process mesh cannot be found.
Default: None.
in_shard_specs (list of list, optional): a list of list to describe the sharding specifications
2023-02-27 16:30:10 +08:00
for the inputs. Each item of `in_shard_specs` is a `shard_spec` between the corresponding input
and `process_mesh`. If one item is None, the corresponding input is replicated across all processes
If it is None, all inputs are replicated across all processes. Note that the length of the
[Auto Parallel] Improve the APIs (#45776) * [Auto Parallel] Use c++ dist attr in the completion process * [Auto Parallel] Add minor changes * [Auto Parallel] Use c++ dist attr in the completion process * [Auto Parallel] Add minor changes * [Auto Parallel] Add the serialization process for dist attrs * [Auto Parallel] Remove unnecessary comments * [Auto Parallel] Fix some bugs * [Auto Parallel] Fix the code style * [Auto Parallel] Remove unnecessary impls * [Auto Parallel] Fix the importing error * [Auto Parallel] Fix the copy from bugs of op dist attr * [Auto Parallel] Replace the use of constexpr if * [Auto Parallel] Redesign the shard_tensor, shard_op and ProcessMesh * [Auto Parallel] Change API of the completion unittest * [Auto Parallel] Fix the bug when set_attr an int * [Auto Parallel] Add the unittest for the serialization * [Auto Parallel] Add some unit tests * [Auto Paralle] Unify the strategy * [Auto Parallel] Improve the engine api * [Auto Parallel] Reset the changes made to the framework * [Auto Parallel] Change the engine unittest * [Auto Parallel] Update API of the completion and partitioner * [Auto Parallel] Update unit tests using engine api * update shard annotation * [Auto Parallel] Remove the modifications of other modules * [Auto Parallel] Add docs for APIs * add new strategy * [Auto Parallel] Replace the logger * [Auto Parallel] Restore the test_program.py * [Auto Parallel] Change the import rules * [Auto Parallel] Add the examples for Engine * [Auto Parallel] Do some minor changes * [Auto Parallel] Remove yaml dependency * [Auto Parallel] Fix the unittests * add valid after train * bug fix Co-authored-by: zhaoyingli <zhaoyingli@baidu.com> Co-authored-by: caozhou <caozhou@radi.ac.cn> Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>
2022-09-15 20:35:52 +08:00
`in_shard_specs` should be equal to the actual number of inputs when calling this operation.
Default: None.
out_shard_specs (list of list, optional): a list of list to describe the sharding specifications
2023-02-27 16:30:10 +08:00
for the outputs. Each item of `out_shard_specs` is a `shard_spec` between the corresponding output
and `process_mesh`. If one item is None, the corresponding output is replicated across all processes
If it is None, all outputs are replicated across all processes. Note that the length of the
[Auto Parallel] Improve the APIs (#45776) * [Auto Parallel] Use c++ dist attr in the completion process * [Auto Parallel] Add minor changes * [Auto Parallel] Use c++ dist attr in the completion process * [Auto Parallel] Add minor changes * [Auto Parallel] Add the serialization process for dist attrs * [Auto Parallel] Remove unnecessary comments * [Auto Parallel] Fix some bugs * [Auto Parallel] Fix the code style * [Auto Parallel] Remove unnecessary impls * [Auto Parallel] Fix the importing error * [Auto Parallel] Fix the copy from bugs of op dist attr * [Auto Parallel] Replace the use of constexpr if * [Auto Parallel] Redesign the shard_tensor, shard_op and ProcessMesh * [Auto Parallel] Change API of the completion unittest * [Auto Parallel] Fix the bug when set_attr an int * [Auto Parallel] Add the unittest for the serialization * [Auto Parallel] Add some unit tests * [Auto Paralle] Unify the strategy * [Auto Parallel] Improve the engine api * [Auto Parallel] Reset the changes made to the framework * [Auto Parallel] Change the engine unittest * [Auto Parallel] Update API of the completion and partitioner * [Auto Parallel] Update unit tests using engine api * update shard annotation * [Auto Parallel] Remove the modifications of other modules * [Auto Parallel] Add docs for APIs * add new strategy * [Auto Parallel] Replace the logger * [Auto Parallel] Restore the test_program.py * [Auto Parallel] Change the import rules * [Auto Parallel] Add the examples for Engine * [Auto Parallel] Do some minor changes * [Auto Parallel] Remove yaml dependency * [Auto Parallel] Fix the unittests * add valid after train * bug fix Co-authored-by: zhaoyingli <zhaoyingli@baidu.com> Co-authored-by: caozhou <caozhou@radi.ac.cn> Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>
2022-09-15 20:35:52 +08:00
`in_shard_specs` should be equal to the actual number of inputs when calling this operation.
Default: None. Default: None.
Returns:
[Auto Parallel] Improve the APIs (#45776) * [Auto Parallel] Use c++ dist attr in the completion process * [Auto Parallel] Add minor changes * [Auto Parallel] Use c++ dist attr in the completion process * [Auto Parallel] Add minor changes * [Auto Parallel] Add the serialization process for dist attrs * [Auto Parallel] Remove unnecessary comments * [Auto Parallel] Fix some bugs * [Auto Parallel] Fix the code style * [Auto Parallel] Remove unnecessary impls * [Auto Parallel] Fix the importing error * [Auto Parallel] Fix the copy from bugs of op dist attr * [Auto Parallel] Replace the use of constexpr if * [Auto Parallel] Redesign the shard_tensor, shard_op and ProcessMesh * [Auto Parallel] Change API of the completion unittest * [Auto Parallel] Fix the bug when set_attr an int * [Auto Parallel] Add the unittest for the serialization * [Auto Parallel] Add some unit tests * [Auto Paralle] Unify the strategy * [Auto Parallel] Improve the engine api * [Auto Parallel] Reset the changes made to the framework * [Auto Parallel] Change the engine unittest * [Auto Parallel] Update API of the completion and partitioner * [Auto Parallel] Update unit tests using engine api * update shard annotation * [Auto Parallel] Remove the modifications of other modules * [Auto Parallel] Add docs for APIs * add new strategy * [Auto Parallel] Replace the logger * [Auto Parallel] Restore the test_program.py * [Auto Parallel] Change the import rules * [Auto Parallel] Add the examples for Engine * [Auto Parallel] Do some minor changes * [Auto Parallel] Remove yaml dependency * [Auto Parallel] Fix the unittests * add valid after train * bug fix Co-authored-by: zhaoyingli <zhaoyingli@baidu.com> Co-authored-by: caozhou <caozhou@radi.ac.cn> Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>
2022-09-15 20:35:52 +08:00
Outputs of `op`, each of which is annotated with sharding information.
Examples:
.. code-block:: pycon
>>> import paddle
>>> from paddle.distributed.fleet import auto
>>> x = paddle.ones([4, 6])
>>> y = paddle.zeros([4, 6])
>>> mesh = auto.ProcessMesh([[0, 1], [2, 3]], dim_names=["x", "y"])
>>> dist_add = auto.shard_op(
... paddle.add,
... mesh,
... in_shard_specs=[["x", "y"], ["y", None]],
... out_shard_specs=[[None, "x"]],
... )
>>> dist_add(x, y)
"""
    if process_mesh is not None:
        assert isinstance(process_mesh, ProcessMesh), (
            f"Argument process_mesh {process_mesh} is not an instance of ProcessMesh"
        )
    else:
        process_mesh = get_current_process_mesh()
        assert process_mesh is not None, (
            "Specify the process mesh argument or use ProcessMesh context manager first."
        )
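    # Build one dims_mapping per input/output tensor; a None entry keeps the
    # corresponding tensor replicated across the mesh.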
    in_dims_mappings = []
    if in_shard_specs is not None:
        assert all(
            (isinstance(shard_spec, list) or shard_spec is None)
            for shard_spec in in_shard_specs
        ), f"in_shard_specs {in_shard_specs} must be a list whose items are lists or None"
        for shard_spec in in_shard_specs:
            if shard_spec is not None:
                in_dims_mappings.append(
                    convert_to_dims_mapping(shard_spec, process_mesh)
                )
            else:
                in_dims_mappings.append(None)
    out_dims_mappings = []
    if out_shard_specs is not None:
        assert all(
            (isinstance(shard_spec, list) or shard_spec is None)
            for shard_spec in out_shard_specs
        ), f"out_shard_specs {out_shard_specs} must be a list whose items are lists or None"
        for shard_spec in out_shard_specs:
            if shard_spec is not None:
                out_dims_mappings.append(
                    convert_to_dims_mapping(shard_spec, process_mesh)
                )
            else:
                out_dims_mappings.append(None)
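    # Wrap the callable: when invoked, the wrapper runs `op` and annotates the
    # resulting ops and tensors with the process mesh and dims mappings given here.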
op = DistributedOperatorHelper(
op, process_mesh, in_dims_mappings, out_dims_mappings, kwargs
)
return op
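
# A minimal usage sketch for the annotation helper above (hedged: this excerpt
# does not show the enclosing function's signature; the sketch assumes the
# legacy ``shard_op(op, process_mesh, in_shard_specs, out_shard_specs)`` API,
# and ``x_var``/``y_var`` are hypothetical static-graph Variables):
#
#     import paddle
#     import paddle.distributed as dist
#
#     mesh = dist.ProcessMesh([0, 1], dim_names=["x"])
#     # Shard the rows of matmul's first input along mesh dimension "x".
#     dist_matmul = shard_op(
#         paddle.matmul, mesh, in_shard_specs=[["x", None], [None, None]]
#     )
#     out = dist_matmul(x_var, y_var)
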
_g_recompute_idx = -1


def recompute(op):
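    """
    Annotate a callable as a recompute segment for auto parallel. Every
    operator created by the wrapped call is placed under the name scope
    ``/auto_parallel/rc_{id}`` and, in PIR mode, tagged with a
    ``fwd_recompute_id`` attribute, so the segment's activations can be
    recomputed in the backward pass instead of being stored.

    Args:
        op (callable): The callable (e.g. a layer's forward) to recompute.

    Returns:
        RecomputeOperator: A callable wrapper around ``op``.
    """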
global _g_recompute_idx
_g_recompute_idx += 1
class RecomputeOperator:
def __init__(self, op):
self._op = op
def __call__(self, *args, **kwargs):
block = paddle.static.default_main_program().global_block()
rc_begin_id = len(block.ops)
with paddle.static.name_scope(
f'/auto_parallel/rc_{_g_recompute_idx}'
):
if paddle.base.dygraph.base.in_to_static_mode():
output = (
paddle.jit.dy2static.convert_call_func.convert_call(
self._op
)(*args, **kwargs)
)
else:
output = self._op(*args, **kwargs)
if paddle.framework.in_pir_mode():
block = paddle.static.default_main_program().global_block()
rc_end_id = len(block.ops)
for idx in range(rc_begin_id, rc_end_id):
rc_op = block.ops[idx]
rc_op.set_int_attr("fwd_recompute_id", _g_recompute_idx)
return output
return RecomputeOperator(op)
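
# A minimal usage sketch (hedged: assumes a static-graph auto-parallel
# program; ``my_layer`` is a hypothetical callable):
#
#     rc_forward = recompute(my_layer)   # wrap the forward as one segment
#     out = rc_forward(x)                # ops land in /auto_parallel/rc_0
#     loss = paddle.mean(out)
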
def exclude_ops_in_recompute(run_function):
"""
2024-01-11 14:32:52 +08:00
Exclude some operators in recompute segments.
Args:
2024-01-11 14:32:52 +08:00
run_function (callable): The callable function to be excluded.
Returns:
ExcludeOperator: The callable object.
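
    Examples:
        A minimal sketch (hedged: assumes use inside a ``recompute``-wrapped
        forward; ``cheap_activation`` is a hypothetical callable):

        .. code-block:: pycon

            >>> # doctest: +SKIP('illustrative only')
            >>> act = exclude_ops_in_recompute(cheap_activation)
            >>> y = act(x)  # ops are tagged '/exclude_rc' and not recomputed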
"""
class ExcludeOperator:
def __init__(self, run_function):
self._run_function = run_function
def __call__(self, *args, **kwargs):
with paddle.static.name_scope('/exclude_rc'):
if paddle.base.dygraph.base.in_to_static_mode():
output = (
paddle.jit.dy2static.convert_call_func.convert_call(
self._run_function
)(*args, **kwargs)
)
else:
output = self._run_function(*args, **kwargs)
return output
    return ExcludeOperator(run_function)


_g_collections = {}


class CollectionNames:
    FETCHES = "fetches"
    LOGGING = "logging"


def get_collection(name):
collection = _g_collections.get(name, None)
if collection is None:
collection = []
_g_collections[name] = collection
    return _g_collections[name]


def add_to_collection(collection_name, value, name=None):
if collection_name not in _g_collections:
_g_collections[collection_name] = []
if name is not None:
for _, v in _g_collections[collection_name]:
if v == value:
return
_g_collections[collection_name].append((name, value))
else:
for _, v in _g_collections[collection_name]:
if v == value:
return
_g_collections[collection_name].append((None, value))
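
# A minimal sketch of the registry semantics (hedged, illustrative values):
#
#     add_to_collection("fetches", "loss_var", name="loss")
#     add_to_collection("fetches", "loss_var")  # duplicate value: ignored
#     get_collection("fetches")                 # -> [("loss", "loss_var")]
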
def fetch(tensor, name=None, logging=False):
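    """
    Register ``tensor`` to be fetched (and optionally logged) during
    auto-parallel training or evaluation.

    Args:
        tensor (Variable|str): The tensor, or the name of the tensor, to fetch.
        name (str, optional): A user-facing alias for the fetched value.
        logging (bool, optional): Whether to also log the value. Default: False.
    """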
if isinstance(tensor, paddle.static.Variable):
tensor = tensor.name
    elif not isinstance(tensor, str):
        raise TypeError(
            "Only a `Variable` or a `str` (a `Variable`'s name) can be fetched, "
            f"but got `{type(tensor)}`"
        )
add_to_collection(CollectionNames.FETCHES, tensor, name)
if logging:
add_to_collection(CollectionNames.LOGGING, tensor, name)
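
# A minimal usage sketch (hedged: ``loss`` is a hypothetical static-graph
# Variable produced by the model):
#
#     fetch(loss, name="train_loss", logging=True)
#     # Registered in both the "fetches" and "logging" collections, so a
#     # downstream consumer (e.g. the auto-parallel Engine) can return and
#     # log it every step.
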
_g_mesh = None


def get_mesh() -> paddle.distributed.ProcessMesh:
"""
Get the global mesh set by set_mesh.
Returns:
mesh (paddle.distributed.ProcessMesh): the global mesh.
Examples:
.. code-block:: pycon
>>> import paddle
>>> import paddle.distributed as dist
>>> mesh = dist.ProcessMesh([[[0, 1], [2, 3]], [[4, 5], [6, 7]]], dim_names=["dp", "mp", "pp"])
>>> # doctest: +REQUIRES(env:DISTRIBUTED)
>>> dist.auto_parallel.set_mesh(mesh)
>>> mesh = dist.auto_parallel.get_mesh()
>>> # This case need to be executed in multi-card environment
>>> # python -m paddle.distributed.launch --gpus=0,1,2,3,4,5,6,7 {test_case}.py
"""
global _g_mesh
    return _g_mesh


def set_mesh(mesh: paddle.distributed.ProcessMesh) -> None:
"""
Set the global mesh.
Args:
mesh (paddle.distributed.ProcessMesh): global mesh to be set.
Returns:
None
Examples:
.. code-block:: pycon
>>> import paddle
>>> import paddle.distributed as dist
>>> mesh = dist.ProcessMesh([[[0, 1], [2, 3]], [[4, 5], [6, 7]]], dim_names=["dp", "mp", "pp"])
>>> # doctest: +REQUIRES(env:DISTRIBUTED)
>>> dist.auto_parallel.set_mesh(mesh)
>>> # This case need to be executed in multi-card environment
>>> # python -m paddle.distributed.launch --gpus=0,1,2,3,4,5,6,7 {test_case}.py
"""
global _g_mesh
    _g_mesh = mesh


def create_mesh(mesh_dims: list[tuple[str, int]]):
"""
Create a global process_mesh for auto parallel.
Args:
mesh_dims (list[tuple[str, int]]): A list of tuple, each element is (dim_name, dim_degree).
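
    Examples:
        A minimal sketch (hedged: assumes 8 ranks and that this function is
        exported alongside ``set_mesh`` as ``dist.auto_parallel.create_mesh``;
        dim names are illustrative):

        .. code-block:: pycon

            >>> import paddle.distributed as dist
            >>> # doctest: +REQUIRES(env:DISTRIBUTED)
            >>> mesh = dist.auto_parallel.create_mesh([("dp", 2), ("mp", 4)])
            >>> # The mesh covers ranks 0..7 arranged in a 2 x 4 grid.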
"""
global _g_mesh
dim_names = [mesh_dim[0] for mesh_dim in mesh_dims]
mesh_shape = [mesh_dim[1] for mesh_dim in mesh_dims]
mesh_arr = np.arange(0, reduce(lambda x, y: x * y, mesh_shape, 1)).reshape(
mesh_shape
)
_g_mesh = ProcessMesh(mesh_arr, dim_names)
return _g_mesh