Blame: tests/python/test_reduction.py - taichi-dev/taichi

[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com>

2021-07-05 15:54:15 +08:00

								import numpy as np

							

[Perf] Thread local storage for range-for reductions on GPUs (#1336)

2020-06-26 19:32:21 -04:00

								from pytest import approx

							

[Doc] Document lazy_grad() function (#2456) * [doc] Document lazy_grad() function * Auto Format Co-authored-by: Taichi Gardener <taichigardener@gmail.com>

2021-06-23 17:12:25 +08:00

								import taichi as ti

							

[ci] Move _testing.py into tests folder (#4247)

2022-02-10 12:37:36 +08:00

								from tests import test_utils

							

[Doc] Document lazy_grad() function (#2456) * [doc] Document lazy_grad() function * Auto Format Co-authored-by: Taichi Gardener <taichigardener@gmail.com>

2021-06-23 17:12:25 +08:00

[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com>

2021-07-05 15:54:15 +08:00

								OP_ADD = 0

							

[Perf] Thread local storage for range-for reductions on GPUs (#1336)

2020-06-26 19:32:21 -04:00

								    N = 1024 * 1024

							

[infra] Refactor Vulkan runtime into true Common Runtime (#5058) * Remove all references to Vulkan in common runtime & fix device API for OpenGL (bindings) and DirectX 11 (memory leaks) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix cpp test * update * update Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

2022-06-01 01:26:18 -07:00

    if (

[aot] Switch Metal to SPIR-V codegen (#7093) This PR is a pretty huge one: 1. Metal runtime is now written in Objective-C++ instead of wrapped C++. 2. The Metal backend is now driven by `GfxRuntime`. `MetalRuntime` has been removed entirely. 3. The Metal backend is now consuming artifacts from the SPIR-V codegen. SPIRV-Cross is linked to enable translation from SPIR-V to plain text MSL. The Metal codegen has been removed entirely. 4. The runtime C-API now supports Metal. Yay! 5. Adapted tests to be like other gfx archs. Removed sparse taichi extensions. Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

2023-01-09 20:47:57 +08:00

								        or ti.lang.impl.current_cfg().arch == ti.metal

							

[opengl] Add ti.gles arch and enable tests (#6988) Issue: # ### Brief Summary

2022-12-27 00:27:09 -08:00

								        or ti.lang.impl.current_cfg().arch == ti.gles

							

[infra] Refactor Vulkan runtime into true Common Runtime (#5058) * Remove all references to Vulkan in common runtime & fix device API for OpenGL (bindings) and DirectX 11 (memory leaks) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix cpp test * update * update Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

2022-06-01 01:26:18 -07:00

								        or ti.lang.impl.current_cfg().arch == ti.dx11

							

[vulkan] [test] Support full atomic operations on Vulkan backend (#2709) * add atomics * fix reduction * fix test local atomics * Auto Format Co-authored-by: Taichi Gardener <taichigardener@gmail.com>

2021-08-17 14:49:43 +08:00

        # OpenGL/Vulkan are not capable of such large number in its float32...

[opengl] [test] Fix failed OpenGL tests these days (#1533)

2020-07-18 23:30:13 +08:00

								        N = 1024 * 16

							

[Perf] Thread local storage for range-for reductions on GPUs (#1336)

2020-06-26 19:32:21 -04:00

[test] Replace ti.var by ti.field in tests starting with r-z (#1684)

2020-08-13 11:24:48 +08:00

								    a = ti.field(dtype, shape=N)

							

[Perf] Thread local storage for range-for reductions on GPUs (#1336)

2020-06-26 19:32:21 -04:00

[Bug] Warp reduction bug fix (#2519) * bug fix * update tests to cover the bug

2021-07-13 20:12:15 +08:00

								    if dtype in [ti.f32, ti.f64]:

							

[misc] Switch code formatter from `yapf` to `black` (#7785) Issue: # ### Brief Summary

2023-04-16 22:20:35 +08:00

[Bug] Warp reduction bug fix (#2519) * bug fix * update tests to cover the bug

2021-07-13 20:12:15 +08:00

    else:

[opt] Support atomic min/max in warp reduction optimization (#2956) * [opt] Support min/max in warp reduction optimization * Auto Format * Make tests stronger * Unfold unnecessary functions * Fix initial values for min/max * Auto Format * Fix comments * Auto Format Co-authored-by: Taichi Gardener <taichigardener@gmail.com>

2021-09-22 17:36:02 +08:00

								                a[i] = i + 1

							

[Perf] Thread local storage for range-for reductions on GPUs (#1336)

2020-06-26 19:32:21 -04:00

[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com>

2021-07-05 15:54:15 +08:00

								    ti_op = ti_ops[op]

							

[Perf] Thread local storage for range-for reductions on GPUs (#1336)

2020-06-26 19:32:21 -04:00

    @ti.kernel

[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com>

2021-07-05 15:54:15 +08:00

								            ti_op(tot[None], a[i])

							

[Perf] Thread local storage for range-for reductions on GPUs (#1336)

2020-06-26 19:32:21 -04:00

[Perf] Support TLS for GlobalTemporaryStmt (#1423)

2020-07-07 02:03:45 +09:00

    @ti.kernel

[opt] Support atomic min/max in warp reduction optimization (#2956) * [opt] Support min/max in warp reduction optimization * Auto Format * Make tests stronger * Unfold unnecessary functions * Fix initial values for min/max * Auto Format * Fix comments * Auto Format Co-authored-by: Taichi Gardener <taichigardener@gmail.com>

2021-09-22 17:36:02 +08:00

								        s = ti.zero(tot[None]) if op == OP_ADD or op == OP_XOR else a[0]

							

[Perf] Support TLS for GlobalTemporaryStmt (#1423)

2020-07-07 02:03:45 +09:00

								        for i in a:

							

[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com>

2021-07-05 15:54:15 +08:00

								            ti_op(s, a[i])

							

[Perf] Support TLS for GlobalTemporaryStmt (#1423)

2020-07-07 02:03:45 +09:00

        return s

[Perf] Thread local storage for range-for reductions on GPUs (#1336)

2020-06-26 19:32:21 -04:00

								    fill()

							

[opt] Support atomic min/max in warp reduction optimization (#2956) * [opt] Support min/max in warp reduction optimization * Auto Format * Make tests stronger * Unfold unnecessary functions * Fix initial values for min/max * Auto Format * Fix comments * Auto Format Co-authored-by: Taichi Gardener <taichigardener@gmail.com>

2021-09-22 17:36:02 +08:00

								    tot[None] = 0 if op in [OP_ADD, OP_XOR] else a[0]

							

[Perf] Thread local storage for range-for reductions on GPUs (#1336)

2020-06-26 19:32:21 -04:00

								    reduce()

							

[Perf] Support TLS for GlobalTemporaryStmt (#1423)

2020-07-07 02:03:45 +09:00

								    tot2 = reduce_tmp()

							

[Perf] Thread local storage for range-for reductions on GPUs (#1336)

2020-06-26 19:32:21 -04:00

[opt] Support atomic min/max in warp reduction optimization (#2956) * [opt] Support min/max in warp reduction optimization * Auto Format * Make tests stronger * Unfold unnecessary functions * Fix initial values for min/max * Auto Format * Fix comments * Auto Format Co-authored-by: Taichi Gardener <taichigardener@gmail.com>

2021-09-22 17:36:02 +08:00

								    np_arr = a.to_numpy()

							

[Bug] Warp reduction bug fix (#2519) * bug fix * update tests to cover the bug

2021-07-13 20:12:15 +08:00

								    ground_truth = np_ops[op](np_arr)

							

[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com>

2021-07-05 15:54:15 +08:00

[Perf] Thread local storage for range-for reductions on GPUs (#1336)

2020-06-26 19:32:21 -04:00

								    assert criterion(tot[None], ground_truth)

							

[Perf] Support TLS for GlobalTemporaryStmt (#1423)

2020-07-07 02:03:45 +09:00

								    assert criterion(tot2, ground_truth)

							

[Perf] Thread local storage for range-for reductions on GPUs (#1336)

2020-06-26 19:32:21 -04:00

[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com>

2021-07-05 15:54:15 +08:00

								@pytest.mark.parametrize("op", [OP_ADD, OP_MIN, OP_MAX, OP_AND, OP_OR, OP_XOR])

							

[ci] Move _testing.py into tests folder (#4247)

2022-02-10 12:37:36 +08:00

								@test_utils.test()

							

[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com>

2021-07-05 15:54:15 +08:00

								def test_reduction_single_i32(op):

							

[bug] Fixes for numpy 2.0 (unblocking python 3.12 release build on mac) (#8552) Numpy 2.0 has removed deprecated implicit cast behavior of overflow casting to smaller dtypes. And it has removed .product in favor of .prod

2024-06-23 19:28:54 -07:00

								    _test_reduction_single(ti.i32, lambda x, y: int(x) % 2**32 == int(y) % 2**32, op)

							

[Perf] Thread local storage for range-for reductions on GPUs (#1336)

2020-06-26 19:32:21 -04:00

[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com>

2021-07-05 15:54:15 +08:00

								@pytest.mark.parametrize("op", [OP_ADD])

							

[opengl] Add ti.gles arch and enable tests (#6988) Issue: # ### Brief Summary

2022-12-27 00:27:09 -08:00

								@test_utils.test(exclude=[ti.opengl, ti.gles])

							

[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com>

2021-07-05 15:54:15 +08:00

								def test_reduction_single_u32(op):

							

[bug] Fixes for numpy 2.0 (unblocking python 3.12 release build on mac) (#8552) Numpy 2.0 has removed deprecated implicit cast behavior of overflow casting to smaller dtypes. And it has removed .product in favor of .prod

2024-06-23 19:28:54 -07:00

								    _test_reduction_single(ti.u32, lambda x, y: int(x) % 2**32 == int(y) % 2**32, op)

							

[Perf] Thread local storage for range-for reductions on GPUs (#1336)

2020-06-26 19:32:21 -04:00

[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com>

2021-07-05 15:54:15 +08:00

								@pytest.mark.parametrize("op", [OP_ADD, OP_MIN, OP_MAX])

							

[ci] Move _testing.py into tests folder (#4247)

2022-02-10 12:37:36 +08:00

								@test_utils.test()

							

[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com>

2021-07-05 15:54:15 +08:00

								def test_reduction_single_f32(op):

							

[Perf] Thread local storage for range-for reductions on GPUs (#1336)

2020-06-26 19:32:21 -04:00

[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com>

2021-07-05 15:54:15 +08:00

								@pytest.mark.parametrize("op", [OP_ADD])

							

[ci] Move _testing.py into tests folder (#4247)

2022-02-10 12:37:36 +08:00

								@test_utils.test(require=ti.extension.data64)

							

[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com>

2021-07-05 15:54:15 +08:00

								def test_reduction_single_i64(op):

							

[bug] Fixes for numpy 2.0 (unblocking python 3.12 release build on mac) (#8552) Numpy 2.0 has removed deprecated implicit cast behavior of overflow casting to smaller dtypes. And it has removed .product in favor of .prod

2024-06-23 19:28:54 -07:00

								    _test_reduction_single(ti.i64, lambda x, y: int(x) % 2**64 == int(y) % 2**64, op)

							

[Perf] Thread local storage for range-for reductions on GPUs (#1336)

2020-06-26 19:32:21 -04:00

[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com>

2021-07-05 15:54:15 +08:00

								@pytest.mark.parametrize("op", [OP_ADD])

							

[opengl] Add ti.gles arch and enable tests (#6988) Issue: # ### Brief Summary

2022-12-27 00:27:09 -08:00

								@test_utils.test(exclude=[ti.opengl, ti.gles], require=ti.extension.data64)

							

[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com>

2021-07-05 15:54:15 +08:00

								def test_reduction_single_u64(op):

							

[bug] Fixes for numpy 2.0 (unblocking python 3.12 release build on mac) (#8552) Numpy 2.0 has removed deprecated implicit cast behavior of overflow casting to smaller dtypes. And it has removed .product in favor of .prod

2024-06-23 19:28:54 -07:00

								    _test_reduction_single(ti.u64, lambda x, y: int(x) % 2**64 == int(y) % 2**64, op)

							

[Perf] Thread local storage for range-for reductions on GPUs (#1336)

2020-06-26 19:32:21 -04:00

[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com>

2021-07-05 15:54:15 +08:00

								@pytest.mark.parametrize("op", [OP_ADD])

							

[ci] Move _testing.py into tests folder (#4247)

2022-02-10 12:37:36 +08:00

								@test_utils.test(require=ti.extension.data64)

							

[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com>

2021-07-05 15:54:15 +08:00

								def test_reduction_single_f64(op):

							

[opengl] [refactor] Fix TLS not working and refactor ParallelSize for grid-stride-loop (#1600) * add logue * fix continue/return in gsl * refactor * test no mucher * fix non-GL build * [skip ci] enforce code format Co-authored-by: Taichi Gardener <taichigardener@gmail.com>

2020-07-27 14:59:36 +08:00

[ci] Move _testing.py into tests folder (#4247)

2022-02-10 12:37:36 +08:00

								@test_utils.test()

							

[opengl] [refactor] Fix TLS not working and refactor ParallelSize for grid-stride-loop (#1600) * add logue * fix continue/return in gsl * refactor * test no mucher * fix non-GL build * [skip ci] enforce code format Co-authored-by: Taichi Gardener <taichigardener@gmail.com>

2020-07-27 14:59:36 +08:00

								def test_reduction_different_scale():

							

[opt] Add conservative alias analysis for ExternalPtrStmt (#2952)

2021-09-17 09:52:06 +08:00

[cuda] Disable reduction in non-full warps (#5161) * disable reduction in non-full warps * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add test case * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

2022-06-14 14:48:34 -07:00

								@test_utils.test()

							

[ci] Move _testing.py into tests folder (#4247)

2022-02-10 12:37:36 +08:00

								@test_utils.test()

							

[refactor] Remove legacy usage of ext_arr/any_arr in codebase (#4698) * [refactor] Remove legacy usage of ext_arr/any_arr in codebase * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

2022-04-01 17:53:06 +08:00

								def test_reduction_ndarray():

							

[opt] Add conservative alias analysis for ExternalPtrStmt (#2952)

2021-09-17 09:52:06 +08:00

    @ti.kernel

[refactor] Remove legacy usage of ext_arr/any_arr in codebase (#4698) * [refactor] Remove legacy usage of ext_arr/any_arr in codebase * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

2022-04-01 17:53:06 +08:00

								    def reduce(a: ti.types.ndarray()) -> ti.i32:

							

[opt] Add conservative alias analysis for ExternalPtrStmt (#2952)

2021-09-17 09:52:06 +08:00

								        s = 0

							

[opt] Support atomic min/max in warp reduction optimization (#2956) * [opt] Support min/max in warp reduction optimization * Auto Format * Make tests stronger * Unfold unnecessary functions * Fix initial values for min/max * Auto Format * Fix comments * Auto Format Co-authored-by: Taichi Gardener <taichigardener@gmail.com>

2021-09-22 17:36:02 +08:00

								            ti.atomic_sub(s, 2)

							

[opt] Add conservative alias analysis for ExternalPtrStmt (#2952)

2021-09-17 09:52:06 +08:00

        return s

[opt] Support atomic min/max in warp reduction optimization (#2956) * [opt] Support min/max in warp reduction optimization * Auto Format * Make tests stronger * Unfold unnecessary functions * Fix initial values for min/max * Auto Format * Fix comments * Auto Format Co-authored-by: Taichi Gardener <taichigardener@gmail.com>

2021-09-22 17:36:02 +08:00

								    assert reduce(x) == -n
							

[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com> 2021-07-05 15:54:15 +08:00			`import numpy as np`
			`import pytest`
[Perf] Thread local storage for range-for reductions on GPUs (#1336) 2020-06-26 19:32:21 -04:00			`from pytest import approx`

[Doc] Document lazy_grad() function (#2456) * [doc] Document lazy_grad() function * Auto Format Co-authored-by: Taichi Gardener <taichigardener@gmail.com> 2021-06-23 17:12:25 +08:00			`import taichi as ti`
[ci] Move _testing.py into tests folder (#4247) 2022-02-10 12:37:36 +08:00			`from tests import test_utils`
[Doc] Document lazy_grad() function (#2456) * [doc] Document lazy_grad() function * Auto Format Co-authored-by: Taichi Gardener <taichigardener@gmail.com> 2021-06-23 17:12:25 +08:00
[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com> 2021-07-05 15:54:15 +08:00			`OP_ADD = 0`
			`OP_MIN = 1`
			`OP_MAX = 2`
			`OP_AND = 3`
			`OP_OR = 4`
			`OP_XOR = 5`

			`ti_ops = {`
			`    OP_ADD: ti.atomic_add,`
			`    OP_MIN: ti.atomic_min,`
			`    OP_MAX: ti.atomic_max,`
			`    OP_AND: ti.atomic_and,`
			`    OP_OR: ti.atomic_or,`
			`    OP_XOR: ti.atomic_xor,`
			`}`

			`np_ops = {`
			`    OP_ADD: np.sum,`
			`    OP_MIN: lambda a: a.min(),`
			`    OP_MAX: lambda a: a.max(),`
			`    OP_AND: np.bitwise_and.reduce,`
			`    OP_OR: np.bitwise_or.reduce,`
			`    OP_XOR: np.bitwise_xor.reduce,`
			`}`


			`def _test_reduction_single(dtype, criterion, op):`
[Perf] Thread local storage for range-for reductions on GPUs (#1336) 2020-06-26 19:32:21 -04:00			`    N = 1024 * 1024`
[infra] Refactor Vulkan runtime into true Common Runtime (#5058) * Remove all references to Vulkan in common runtime & fix device API for OpenGL (bindings) and DirectX 11 (memory leaks) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix cpp test * update * update Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> 2022-06-01 01:26:18 -07:00			`    if (`
			`        ti.lang.impl.current_cfg().arch == ti.opengl`
			`        or ti.lang.impl.current_cfg().arch == ti.vulkan`
[aot] Switch Metal to SPIR-V codegen (#7093) This PR is a pretty huge one: 1. Metal runtime is now written in Objective-C++ instead of wrapped C++. 2. The Metal backend is now driven by `GfxRuntime`. `MetalRuntime` has been removed entirely. 3. The Metal backend is now consuming artifacts from the SPIR-V codegen. SPIRV-Cross is linked to enable translation from SPIR-V to plain text MSL. The Metal codegen has been removed entirely. 4. The runtime C-API now supports Metal. Yay! 5. Adapted tests to be like other gfx archs. Removed sparse taichi extensions. Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> 2023-01-09 20:47:57 +08:00			`        or ti.lang.impl.current_cfg().arch == ti.metal`
[opengl] Add ti.gles arch and enable tests (#6988) Issue: # ### Brief Summary 2022-12-27 00:27:09 -08:00			`        or ti.lang.impl.current_cfg().arch == ti.gles`
[infra] Refactor Vulkan runtime into true Common Runtime (#5058) * Remove all references to Vulkan in common runtime & fix device API for OpenGL (bindings) and DirectX 11 (memory leaks) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix cpp test * update * update Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> 2022-06-01 01:26:18 -07:00			`        or ti.lang.impl.current_cfg().arch == ti.dx11`
			`    ) and dtype == ti.f32:`
[vulkan] [test] Support full atomic operations on Vulkan backend (#2709) * add atomics * fix reduction * fix test local atomics * Auto Format Co-authored-by: Taichi Gardener <taichigardener@gmail.com> 2021-08-17 14:49:43 +08:00			`        # OpenGL/Vulkan are not capable of such large number in its float32...`
[opengl] [test] Fix failed OpenGL tests these days (#1533) 2020-07-18 23:30:13 +08:00			`        N = 1024 * 16`
[Perf] Thread local storage for range-for reductions on GPUs (#1336) 2020-06-26 19:32:21 -04:00
[test] Replace ti.var by ti.field in tests starting with r-z (#1684) 2020-08-13 11:24:48 +08:00			`    a = ti.field(dtype, shape=N)`
			`    tot = ti.field(dtype, shape=())`
[Perf] Thread local storage for range-for reductions on GPUs (#1336) 2020-06-26 19:32:21 -04:00
[Bug] Warp reduction bug fix (#2519) * bug fix * update tests to cover the bug 2021-07-13 20:12:15 +08:00			`    if dtype in [ti.f32, ti.f64]:`

			`        @ti.kernel`
			`        def fill():`
			`            for i in a:`
			`                a[i] = i + 0.5`
[misc] Switch code formatter from `yapf` to `black` (#7785) Issue: # ### Brief Summary 2023-04-16 22:20:35 +08:00
[Bug] Warp reduction bug fix (#2519) * bug fix * update tests to cover the bug 2021-07-13 20:12:15 +08:00			`    else:`

			`        @ti.kernel`
			`        def fill():`
			`            for i in a:`
[opt] Support atomic min/max in warp reduction optimization (#2956) * [opt] Support min/max in warp reduction optimization * Auto Format * Make tests stronger * Unfold unnecessary functions * Fix initial values for min/max * Auto Format * Fix comments * Auto Format Co-authored-by: Taichi Gardener <taichigardener@gmail.com> 2021-09-22 17:36:02 +08:00			`                a[i] = i + 1`
[Perf] Thread local storage for range-for reductions on GPUs (#1336) 2020-06-26 19:32:21 -04:00
[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com> 2021-07-05 15:54:15 +08:00			`    ti_op = ti_ops[op]`

[Perf] Thread local storage for range-for reductions on GPUs (#1336) 2020-06-26 19:32:21 -04:00			`    @ti.kernel`
			`    def reduce():`
			`        for i in a:`
[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com> 2021-07-05 15:54:15 +08:00			`            ti_op(tot[None], a[i])`
[Perf] Thread local storage for range-for reductions on GPUs (#1336) 2020-06-26 19:32:21 -04:00
[Perf] Support TLS for GlobalTemporaryStmt (#1423) 2020-07-07 02:03:45 +09:00			`    @ti.kernel`
			`    def reduce_tmp() -> dtype:`
[opt] Support atomic min/max in warp reduction optimization (#2956) * [opt] Support min/max in warp reduction optimization * Auto Format * Make tests stronger * Unfold unnecessary functions * Fix initial values for min/max * Auto Format * Fix comments * Auto Format Co-authored-by: Taichi Gardener <taichigardener@gmail.com> 2021-09-22 17:36:02 +08:00			`        s = ti.zero(tot[None]) if op == OP_ADD or op == OP_XOR else a[0]`
[Perf] Support TLS for GlobalTemporaryStmt (#1423) 2020-07-07 02:03:45 +09:00			`        for i in a:`
[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com> 2021-07-05 15:54:15 +08:00			`            ti_op(s, a[i])`
[Perf] Support TLS for GlobalTemporaryStmt (#1423) 2020-07-07 02:03:45 +09:00			`        return s`

[Perf] Thread local storage for range-for reductions on GPUs (#1336) 2020-06-26 19:32:21 -04:00			`    fill()`
[opt] Support atomic min/max in warp reduction optimization (#2956) * [opt] Support min/max in warp reduction optimization * Auto Format * Make tests stronger * Unfold unnecessary functions * Fix initial values for min/max * Auto Format * Fix comments * Auto Format Co-authored-by: Taichi Gardener <taichigardener@gmail.com> 2021-09-22 17:36:02 +08:00			`    tot[None] = 0 if op in [OP_ADD, OP_XOR] else a[0]`
[Perf] Thread local storage for range-for reductions on GPUs (#1336) 2020-06-26 19:32:21 -04:00			`    reduce()`
[Perf] Support TLS for GlobalTemporaryStmt (#1423) 2020-07-07 02:03:45 +09:00			`    tot2 = reduce_tmp()`
[Perf] Thread local storage for range-for reductions on GPUs (#1336) 2020-06-26 19:32:21 -04:00
[opt] Support atomic min/max in warp reduction optimization (#2956) * [opt] Support min/max in warp reduction optimization * Auto Format * Make tests stronger * Unfold unnecessary functions * Fix initial values for min/max * Auto Format * Fix comments * Auto Format Co-authored-by: Taichi Gardener <taichigardener@gmail.com> 2021-09-22 17:36:02 +08:00			`    np_arr = a.to_numpy()`
[Bug] Warp reduction bug fix (#2519) * bug fix * update tests to cover the bug 2021-07-13 20:12:15 +08:00			`    ground_truth = np_ops[op](np_arr)`
[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com> 2021-07-05 15:54:15 +08:00
[Perf] Thread local storage for range-for reductions on GPUs (#1336) 2020-06-26 19:32:21 -04:00			`    assert criterion(tot[None], ground_truth)`
[Perf] Support TLS for GlobalTemporaryStmt (#1423) 2020-07-07 02:03:45 +09:00			`    assert criterion(tot2, ground_truth)`
[Perf] Thread local storage for range-for reductions on GPUs (#1336) 2020-06-26 19:32:21 -04:00

[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com> 2021-07-05 15:54:15 +08:00			`@pytest.mark.parametrize("op", [OP_ADD, OP_MIN, OP_MAX, OP_AND, OP_OR, OP_XOR])`
[ci] Move _testing.py into tests folder (#4247) 2022-02-10 12:37:36 +08:00			`@test_utils.test()`
[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com> 2021-07-05 15:54:15 +08:00			`def test_reduction_single_i32(op):`
[bug] Fixes for numpy 2.0 (unblocking python 3.12 release build on mac) (#8552) Numpy 2.0 has removed deprecated implicit cast behavior of overflow casting to smaller dtypes. And it has removed .product in favor of .prod 2024-06-23 19:28:54 -07:00			`_test_reduction_single(ti.i32, lambda x, y: int(x) % 2**32 == int(y) % 2**32, op)`
[Perf] Thread local storage for range-for reductions on GPUs (#1336) 2020-06-26 19:32:21 -04:00

[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com> 2021-07-05 15:54:15 +08:00			`@pytest.mark.parametrize("op", [OP_ADD])`
[opengl] Add ti.gles arch and enable tests (#6988) Issue: # ### Brief Summary 2022-12-27 00:27:09 -08:00			`@test_utils.test(exclude=[ti.opengl, ti.gles])`
[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com> 2021-07-05 15:54:15 +08:00			`def test_reduction_single_u32(op):`
[bug] Fixes for numpy 2.0 (unblocking python 3.12 release build on mac) (#8552) Numpy 2.0 has removed deprecated implicit cast behavior of overflow casting to smaller dtypes. And it has removed .product in favor of .prod 2024-06-23 19:28:54 -07:00			`_test_reduction_single(ti.u32, lambda x, y: int(x) % 2**32 == int(y) % 2**32, op)`
[Perf] Thread local storage for range-for reductions on GPUs (#1336) 2020-06-26 19:32:21 -04:00

[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com> 2021-07-05 15:54:15 +08:00			`@pytest.mark.parametrize("op", [OP_ADD, OP_MIN, OP_MAX])`
[ci] Move _testing.py into tests folder (#4247) 2022-02-10 12:37:36 +08:00			`@test_utils.test()`
[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com> 2021-07-05 15:54:15 +08:00			`def test_reduction_single_f32(op):`
			`_test_reduction_single(ti.f32, lambda x, y: x == approx(y, 3e-4), op)`
[Perf] Thread local storage for range-for reductions on GPUs (#1336) 2020-06-26 19:32:21 -04:00

[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com> 2021-07-05 15:54:15 +08:00			`@pytest.mark.parametrize("op", [OP_ADD])`
[ci] Move _testing.py into tests folder (#4247) 2022-02-10 12:37:36 +08:00			`@test_utils.test(require=ti.extension.data64)`
[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com> 2021-07-05 15:54:15 +08:00			`def test_reduction_single_i64(op):`
[bug] Fixes for numpy 2.0 (unblocking python 3.12 release build on mac) (#8552) Numpy 2.0 has removed deprecated implicit cast behavior of overflow casting to smaller dtypes. And it has removed .product in favor of .prod 2024-06-23 19:28:54 -07:00			`_test_reduction_single(ti.i64, lambda x, y: int(x) % 2**64 == int(y) % 2**64, op)`
[Perf] Thread local storage for range-for reductions on GPUs (#1336) 2020-06-26 19:32:21 -04:00

[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com> 2021-07-05 15:54:15 +08:00			`@pytest.mark.parametrize("op", [OP_ADD])`
[opengl] Add ti.gles arch and enable tests (#6988) Issue: # ### Brief Summary 2022-12-27 00:27:09 -08:00			`@test_utils.test(exclude=[ti.opengl, ti.gles], require=ti.extension.data64)`
[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com> 2021-07-05 15:54:15 +08:00			`def test_reduction_single_u64(op):`
[bug] Fixes for numpy 2.0 (unblocking python 3.12 release build on mac) (#8552) Numpy 2.0 has removed deprecated implicit cast behavior of overflow casting to smaller dtypes. And it has removed .product in favor of .prod 2024-06-23 19:28:54 -07:00			`_test_reduction_single(ti.u64, lambda x, y: int(x) % 2**64 == int(y) % 2**64, op)`
[Perf] Thread local storage for range-for reductions on GPUs (#1336) 2020-06-26 19:32:21 -04:00

[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com> 2021-07-05 15:54:15 +08:00			`@pytest.mark.parametrize("op", [OP_ADD])`
[ci] Move _testing.py into tests folder (#4247) 2022-02-10 12:37:36 +08:00			`@test_utils.test(require=ti.extension.data64)`
[Perf] [cuda] Use warp reduction to improve reduction performance (#2487) * wip * wip * fix macro * clean up codegen_cuda * remove unused * fix * ran ti format * return old_value instead of 0 * some comments * add tests for other reduction ops * Auto Format * remove default arg for AtomicOpStmt * use look up table in test Co-authored-by: Taichi Gardener <taichigardener@gmail.com> 2021-07-05 15:54:15 +08:00			`def test_reduction_single_f64(op):`
			`_test_reduction_single(ti.f64, lambda x, y: x == approx(y, 1e-12), op)`
[opengl] [refactor] Fix TLS not working and refactor ParallelSize for grid-stride-loop (#1600) * add logue * fix continue/return in gsl * refactor * test no mucher * fix non-GL build * [skip ci] enforce code format Co-authored-by: Taichi Gardener <taichigardener@gmail.com> 2020-07-27 14:59:36 +08:00

[ci] Move _testing.py into tests folder (#4247) 2022-02-10 12:37:36 +08:00			`@test_utils.test()`
[opengl] [refactor] Fix TLS not working and refactor ParallelSize for grid-stride-loop (#1600) * add logue * fix continue/return in gsl * refactor * test no mucher * fix non-GL build * [skip ci] enforce code format Co-authored-by: Taichi Gardener <taichigardener@gmail.com> 2020-07-27 14:59:36 +08:00			`def test_reduction_different_scale():`
			`@ti.kernel`
			`def func(n: ti.template()) -> ti.i32:`
			`x = 0`
			`for i in range(n):`
			`ti.atomic_add(x, 1)`
			`return x`

			`# 10 and 60 since OpenGL TLS stride size = 32`
			`# 1024 and 100000 since OpenGL max threads per group ~= 1792`
			`for n in [1, 10, 60, 1024, 100000]:`
			`assert n == func(n)`
[opt] Add conservative alias analysis for ExternalPtrStmt (#2952) 2021-09-17 09:52:06 +08:00

[cuda] Disable reduction in non-full warps (#5161) * disable reduction in non-full warps * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add test case * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> 2022-06-14 14:48:34 -07:00			`@test_utils.test()`
			`def test_reduction_non_full_warp():`
			`@ti.kernel`
			`def test() -> ti.i32:`
			`hit_time = 1`
			`ti.loop_config(block_dim=8)`
			`for i in range(8):`
			`ti.atomic_min(hit_time, 1)`
			`return hit_time`

			`assert test() == 1`


[ci] Move _testing.py into tests folder (#4247) 2022-02-10 12:37:36 +08:00			`@test_utils.test()`
[refactor] Remove legacy usage of ext_arr/any_arr in codebase (#4698) * [refactor] Remove legacy usage of ext_arr/any_arr in codebase * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> 2022-04-01 17:53:06 +08:00			`def test_reduction_ndarray():`
[opt] Add conservative alias analysis for ExternalPtrStmt (#2952) 2021-09-17 09:52:06 +08:00			`@ti.kernel`
[refactor] Remove legacy usage of ext_arr/any_arr in codebase (#4698) * [refactor] Remove legacy usage of ext_arr/any_arr in codebase * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> 2022-04-01 17:53:06 +08:00			`def reduce(a: ti.types.ndarray()) -> ti.i32:`
[opt] Add conservative alias analysis for ExternalPtrStmt (#2952) 2021-09-17 09:52:06 +08:00			`s = 0`
			`for i in a:`
			`ti.atomic_add(s, a[i])`
[opt] Support atomic min/max in warp reduction optimization (#2956) * [opt] Support min/max in warp reduction optimization * Auto Format * Make tests stronger * Unfold unnecessary functions * Fix initial values for min/max * Auto Format * Fix comments * Auto Format Co-authored-by: Taichi Gardener <taichigardener@gmail.com> 2021-09-22 17:36:02 +08:00			`ti.atomic_sub(s, 2)`
[opt] Add conservative alias analysis for ExternalPtrStmt (#2952) 2021-09-17 09:52:06 +08:00			`return s`

			`n = 1024`
			`x = np.ones(n, dtype=np.int32)`
[opt] Support atomic min/max in warp reduction optimization (#2956) * [opt] Support min/max in warp reduction optimization * Auto Format * Make tests stronger * Unfold unnecessary functions * Fix initial values for min/max * Auto Format * Fix comments * Auto Format Co-authored-by: Taichi Gardener <taichigardener@gmail.com> 2021-09-22 17:36:02 +08:00			`assert reduce(x) == -n`
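
The integer criteria in the `test_reduction_single_*` tests above compare the two totals modulo a power of two that matches the dtype width, presumably so that a fixed-width sum that wrapped around still matches a reference computed at a different width. The sketch below illustrates that idea in plain Python. The helper name `wraps_equal` and the sample values are hypothetical and are not part of the test file.

```python
# Hypothetical helper, not part of tests/python/test_reduction.py.
def wraps_equal(x, y, bits):
    """Return True if x and y are congruent modulo 2**bits."""
    modulus = 1 << bits
    return int(x) % modulus == int(y) % modulus


# An exact sum that does not fit in a signed 32-bit integer.
total = 3 * (2**31 - 1)

# The same sum as a signed 32-bit register would hold it after wrap-around.
wrapped = ((total + 2**31) % 2**32) - 2**31

assert total != wrapped                 # the raw values differ
assert wraps_equal(total, wrapped, 32)  # but they agree modulo 2**32
```

Python's `%` returns a non-negative result for a positive modulus, so the same comparison works for negative (signed) and non-negative (unsigned) values alike.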