2017-08-08 16:36:23 -07:00
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
Batch Norm rewrite without mshadow, 1D, 2D, 3D, float16, float32, float64 as well as operator gtest framework (#5936)
* Batch Norm rewrite without mshadow as well as operator gtest framework
* performance testing
* lint fixes
* use CUDNN for this test
* remove superfluous omp define
* Fix file names in comments
* build, run, clean gtest works (although a test is failing)
* CR comments
* Adjust timing tests for more strenuous sample
* Remove temp resource allocation
* DeviceTensor3 added, forEachFast not yet converted
* DeviceTensor3 version working
* DeviceTensor3 working
* .
* Fix for use_global_stats
* fixed bug with testing suite for double (Float64)
* python unit tests working for batchnorm
* python unit tests
* Update documentation for mxnet.initializer.Mixed (#5937)
* Update documentation for SVMOutput. (#5931)
* Update documentation for SVMOutput.
* Update doc for SVMOutput - fix formatting.
* Adding install instruction for Ubuntu-CPU-Python (#5885)
* edit ndarray API docs (#5806)
* edit docs in broadcast_reduce_op
* edit docs in broadcast_reduce_op
* minor change
* lint fix
* fix
* mx.nd.ones
* mx.nd.repeat
* mx.nd.reverse
* add example in repeat
* optimizer update
* fix nanprod
* fix optimizer_op api doc
* fix reduce_op api doc
* fix nd.ones api doc
* mx.nd.repeat doc change
* Update broadcast_reduce_op.h
* Symbol docs fixes (#5930)
* symbol docs minor formatting changes
* deepcopy, infer_shape, infer_shape_partial docs modified
* Few more small fixes
* arithmetic functions fixes
* some more modifications
* changes after review
* small change
* grad function note added
* More API Doc Edits (#5886)
* edit activation doc
* doc l2_normalization
* edit MakeLoss doc
* edit blockgrad doc
* blockgrad fileline fix
* edit MakeLoss doc cont.
* doc change 'tensor' to 'multidimensional array'
* l2normalization doc improve
* makeloss doc improve, blockgrad doc improve
* fix doc in activation, l2_normalization, make_loss
* fix minor grammar
* use .describe to avoid build failure.
* Update documentation for mxnet.image.imdecode (#5957)
* Update documentation for mxnet.image.imdecode
* Update documentation for mxnet.image.imdecode (clarify that we need OpenCV and not the CV2 Python library)
* Fix script by adding path to Dockerfile (#5958)
* Clean install script
* Add test for pip installations
* Remove debug statements & comments
* Make test runnable as script and from framework
* Fix path to Dockerfiles
* Putting failing cases at the end
* Update doc for Custom operator. (#5875)
* Update doc for Custom operator.
* Update doc for Custom operator.
* Fix formatting in doc for Custom operator.
* Fix formatting in doc for Custom operator.
* Minor change to ndarray.Custom documentation.
* Minor edit in doc for Custom operator.
* Minor change to doc for Custom operator. Data is 'NDArray-or-Symbol'.
* Minor formatting change for Custom operator documentation.
* For Custom operator doc, move example into ndarray_doc.py.
* Minor change in Custom operator documentation
* Improve the doc of pick + Update dmlc-core (#5946)
* Add PickParam to fix the docstring and the initial value for axis
* Update dmlc-core
* Update dmlc-core
* Image docs modified (#5973)
* imageIter doc modified
* edited imageiter
* ADD missing Libri_sample.json, FIX minor bugs in speech_recognition example (#5962)
* [KVStore] Add support for other data types (#5818)
* Fix kvstore type
* Fix lint
* Parse inputs to DataDesc
* Make module support dtype
* Fix lint
* Add default dtype in Comm
* Fix lint
* Revert rename
* [cpp-package] Add C++ basic tutorial and build instruction (#5971)
* Add C++ basic tutorial and build instruction
* Remove binaries
* Fix lint
* Avoid sign-compare
* Update documentation for mxnet.metric.np (#5977)
* Getting rid of identity (#5935)
* Activation ops (#5938)
* [Ops] Add op: 'relu'
* Add op: 'sigmoid'
* Introduce 'kernel_launch_op'
* Add tests and describe; move it to elemwise_unary_op
* Fix GPU version
* Convert caffe AbsVal to mx.symbol.abs in caffe converter (#5984)
* Correction to LSTMCell docstring (#5986)
* [Module] fix input_grads order (#5980)
* fix input_grads order + update dmlc-core
* set label to be optional
* update env_var doc (#5964)
* Adjusting make, Callback removed
* batch norm gpu testing
* Batch Norm rewrite without mshadow as well as operator gtest framework
* performance testing
* lint fixes
* use CUDNN for this test
* remove superfluous omp define
* Fix file names in comments
* build, run, clean gtest works (although a test is failing)
* CR comments
* Adjust timing tests for more strenuous sample
* Remove temp resource allocation
* rearrange source into cc and cu files
* lint fixes
* Trigger build
* Use latest mshadow
* temporarily revert channel position parameter field
* Add more tests for batchnorm
* Add more tests for batchnorm
* test_operator_gpu working for all types
* Compiles after AccReal
* Compiles after AccReal
* All tests working
* All tests working
* build, run, clean gtest works (although a test is failing)
* vc++ requires explicit int type for omp for loop
* Repair cpp-package
* signed/unsigned fixed in cuda file
* lint fixes in tests and cpp-package directories
* more lint
* use IsWriting() helper
* Fall-through for unsupported MKL shapes/types
* Fall-through for unsupported MKL shapes/types
* cleaner mkl_off approach
* Warning only when MKL is requested
* Warning only when MKL is requested
* lint
* ..
* python problem fixed
* python problem fixed
* Merge branch 'batchnorm' into batchnorm_pr
# Conflicts:
# src/operator/batch_norm.cc
# src/operator/batch_norm.cu
# tests/cpp/operator/batchnorm_test.cc
* lint fix
* lint fix
* lint fix
* lint fix
* lint fix
* Fix visual c++ compile problem
* .
* .
* All unit tests pass again
* lint fix
* fix strange compile errors in CUDNN batchnorm header
* Finish using flags instead of bools
* lint
* Fix timing pass count for forward pass
* Fix R script install roxygen problem
* code formatting, addition of doc strings is causing IDE to add spaces before the calls
* removed commented
* cr comments
* Change back to compilable code
* For CPU mode, store as invstd
* move testing code around a little
* lint fix
* Use AccReal in some places to avoid fp16 problems
* Fix minor invstd problem in cuda version
* remove unused scale param
* add permutation unit test, handle cudnn doesn't like 3D
* .
* lint
* .
* Remove mkl_off
* lint fix and time cudnn when enabled
2017-05-15 20:27:28 -07:00
/*!
* \file test_util.h
* \brief unit test performance analysis functions
* \author Chris Olivier
2021-11-19 09:27:00 +01:00
*/
2017-07-12 10:04:40 -07:00
#ifndef TEST_UTIL_H_
#define TEST_UTIL_H_
#include <gtest/gtest.h>
#include <mxnet/storage.h>
2017-09-13 12:34:48 -07:00
#include <mxnet/ndarray.h>
#include <string>
#include <vector>
#include <sstream>
2017-12-14 17:25:26 +01:00
#include <random>
2018-02-15 14:44:34 -08:00
#include "../../../src/ndarray/ndarray_function.h"
Batch Norm rewrite without mshadow, 1D, 2D, 3D, float16, float32, float64 as well as operator gtest framework (#5936)
* Batch Norm rewrite without mshadow as well as operator gtest framework
* performance testing
* lint fixes
* use CUDNN for this test
* remove superfluous omp define
* Fix file names in comments
* build, run, clean gtest works (although a test is failing)
* CR comments
* Adjust timing tests for more strenuous sample
* Remove temp resource allocation
* DeviceTensor3 added, forEachFast not yet converted
* DeviceTensor3 version working
* DeviceTensor3 working
* .
* Fix for use_global_stats
* fixed bug with testing suite for double (Float64)
* python unit tests working for batchnorm
* python unit tests
* Update documentation for mxnet.initializer.Mixed (#5937)
* Update documentation for SVMOutput. (#5931)
* Update documentation for SVMOutput.
* Update doc for SVMOutput - fix formatting.
* Adding install instruction for Ubuntu-CPU-Python (#5885)
* edit ndarray API docs (#5806)
* edit docs in broadcast_reduce_op
* edit docs in broadcast_reduce_op
* minor change
* lint fix
* fix
* mx.nd.ones
* mx.nd.repeat
* mx.nd.reverse
* add example in repeat
* optimizer update
* fix nanprod
* fix optimizer_op api doc
* fix reduce_op api doc
* fix nd.ones api doc
* mx.nd.repeat doc change
* Update broadcast_reduce_op.h
* Symbol docs fixes (#5930)
* symbol docs minor formatting changes
* deepcopy, infer_shape, infer_shape_partial docs modified
* Few more small fixes
* arithmetic functions fixes
* some more modifications
* changes after review
* small change
* grad function note added
* More API Doc Edits (#5886)
* edit activation doc
* doc l2_normalization
* edit MakeLoss doc
* edit blockgrad doc
* blockgrad fileline fix
* edit MakeLoss doc cont.
* doc change 'tensor' to 'multidimensional array'
* l2normalization doc improve
* makeloss doc improve, blockgrad doc improve
* fix doc in activation, l2_normalization, make_loss
* fix minor grammar
* use .describe to avoid build failure.
* Update documentation for mxnet.image.imdecode (#5957)
* Update documentation for mxnet.image.imdecode
* Update documentation for mxnet.image.imdecode (clarify that we need OpenCV and not the CV2 Python library)
* Fix script by adding path to Dockerfile (#5958)
* Clean install script
* Add test for pip installations
* Remove debug statements & comments
* Make test runnable as script and from framework
* Fix path to Dockerfiles
* Putting failing cases at the end
* Update doc for Custom operator. (#5875)
* Update doc for Custom operator.
* Update doc for Custom operator.
* Fix formating in doc for Custom operator.
* Fix formating in doc for Custom operator.
* Minor change to ndarray.Custom documentation.
* Minor edit in doc for Custom operator.
* Minor change to doc for Custom operator. Data is 'NDArray-or-Symbol'.
* Minor formatting change for Custom operator documentation.
* For Custom operator doc, move example into ndarray_doc.py.
* Minor change in Custom operator documentation
* Improve the doc of pick + Update dmlc-core (#5946)
* Add PickParam to fix the docstring and the initial value for axis
* Update dmlc-core
* Update dmlc-core
* Image docs modified (#5973)
* imageIter doc modified
* edited imageiter
* ADD missing Libri_sample.json, FIX minor bugs in speech_recognition example (#5962)
* [KVStore] Add support for other data types (#5818)
* Fix kvstore type
* Fix lint
* Parse inputs to DataDesc
* Make module support dtype
* Fix lint
* Add default dtype in Comm
* Fix lint
* Revert rename
* [cpp-package] Add C++ basic tutorial and build instruction (#5971)
* Add C++ basic tutorial and build instruction
* Remove binaries
* Fix lint
* Avoid sign-compare
* Update documentation for mxnet.metric.np (#5977)
* Getting rid of identity (#5935)
* Activation ops (#5938)
* [Ops] Add op: 'relu'
* Add op: 'sigmoid'
* Introduce 'kernel_launch_op'
* Add tests and describe; move it to elemwise_unary_op
* Fix GPU version
* Convert caffe AbsVal to mx.symbol.abs in caffe converter (#5984)
* Correction to LSTMCell docstring (#5986)
* [Module] fix input_grads order (#5980)
* fix input_grads order + update dmlc-core
* set label to be optional
* update env_var doc (#5964)
* Adjusting make, Callback removed
* batch norm gpu testing
* Batch Norm rewrite without mshadow as well as operator gtest framework
* performance testing
* lint fixes
* use CUDNN for this test
* remove superfluous omp define
* Fix file names in comments
* build, run, clean gtest works (although a test is failing)
* CR comments
* Adjust timing tests for more strenuous sample
* Remove temp resource allocation
* rearrange source into cc and cu files
* lint fixes
* Trigger build
* Use latest mshadow
* temporarily revert channel position parameter field
* Add more tests for batchnorm
* Add more tests for batchnorm
* test_operator_gpu working for all types
* Compiles after AccReal
* Compiles after AccReal
* All tests working
* All tests working
* build, run, clean gtest works (although a test is failing)
* vc++ requires explicit int type for omp for loop
* Repair cpp-package
* signed/unsigned fixed in cuda file
* lint fixes in tests and cpp-package directories
* more lint
* use IsWriting() helper
* Fall-through for unsupported MKL shapes/types
* Fall-through for unsupported MKL shapes/types
* cleaner mkl_off approach
* Warning only when MKL is requested
* Warning only when MKL is requested
* lint
* ..
* python problem fixed
* python problem fixed
* Merge branch 'batchnorm' into batchnorm_pr
# Conflicts:
# src/operator/batch_norm.cc
# src/operator/batch_norm.cu
# tests/cpp/operator/batchnorm_test.cc
* lint fix
* lint fix
* lint fix
* lint fix
* lint fix
* Fix visual c++ compile problem
* .
* .
* All unit tests pass again
* lint fix
* fix strange compile errors in CUDNN batchnorm header
* Finish using flags instead of bools
* lint
* Fix timing pass count for forward pass
* Fix R script install roxygen problem
* code formatting, addition of doc strings is causing IDE to add spaces before the calls
* removed commented
* cr comments
* Change back to compilable code
* For CPU mode, store as invstd
* move testing code around a little
* lint fix
* Use AccReal in some places to avoid fp16 problems
* Fix minor invstd problem in cuda version
* remove unused scale param
* add permutation unit test, handle cudnn doesn't like 3D
* .
* lint
* .
* Remove mkl_off
* lint fix and time cudnn when enabled
2017-05-15 20:27:28 -07:00
#if MXNET_USE_VTUNE
#include <ittnotify.h>
#endif

namespace mxnet {
namespace test {

extern bool unitTestsWithCuda;
CPU optimization for ActivationOp (#8296)
* CPU optimization for ActivationOp
Significant improvement on CPU (several orders of magnitude in some cases, especially on backward pass).
Very slight improvement on GPU.
OLD MSHADOW APPROACH
--------------------
CPU
===
Timing: 50 iterations of 10 calls, shape = [1,1,28,28]
Activation Operator CPU: Timing [Forward] 18.948 ms, avg: 0.037896 ms X 500 passes
Activation Operator CPU: Timing [Backward] 1.658 ms, avg: 0.003316 ms X 500 passes
Timing: 50 iterations of 10 calls, shape = [1,3,28,28]
Activation Operator CPU: Timing [Forward] 57.973 ms, avg: 0.115946 ms X 500 passes
Activation Operator CPU: Timing [Backward] 4.748 ms, avg: 0.009496 ms X 500 passes
Timing: 50 iterations of 10 calls, shape = [50,1,18,32]
Activation Operator CPU: Timing [Forward] 703.446 ms, avg: 1.40689 ms X 500 passes
Activation Operator CPU: Timing [Backward] 56.255 ms, avg: 0.11251 ms X 500 passes
Timing: 50 iterations of 10 calls, shape = [50,3,18,32]
Activation Operator CPU: Timing [Forward] 2107.77 ms, avg: 4.21554 ms X 500 passes
Activation Operator CPU: Timing [Backward] 168.483 ms, avg: 0.336966 ms X 500 passes
Timing: 50 iterations of 10 calls, shape = [20,3,128,128]
Activation Operator CPU: Timing [Forward] 24122.2 ms, avg: 48.2443 ms X 500 passes
Activation Operator CPU: Timing [Backward] 1908.7 ms, avg: 3.8174 ms X 500 passes
GPU
===
Timing: 50 iterations of 10 calls, shape = [1,1,28,28]
Activation Operator GPU: Timing [Forward] 1.637 ms, avg: 0.003274 ms X 500 passes
Activation Operator GPU: Timing [Backward] 1.665 ms, avg: 0.00333 ms X 500 passes
Timing: 50 iterations of 10 calls, shape = [1,3,28,28]
Activation Operator GPU: Timing [Forward] 1.562 ms, avg: 0.003124 ms X 500 passes
Activation Operator GPU: Timing [Backward] 1.661 ms, avg: 0.003322 ms X 500 passes
Timing: 50 iterations of 10 calls, shape = [50,1,18,32]
Activation Operator GPU: Timing [Forward] 1.635 ms, avg: 0.00327 ms X 500 passes
Activation Operator GPU: Timing [Backward] 1.702 ms, avg: 0.003404 ms X 500 passes
Timing: 50 iterations of 10 calls, shape = [50,3,18,32]
Activation Operator GPU: Timing [Forward] 1.83 ms, avg: 0.00366 ms X 500 passes
Activation Operator GPU: Timing [Backward] 2.041 ms, avg: 0.004082 ms X 500 passes
Timing: 50 iterations of 10 calls, shape = [20,3,128,128]
Activation Operator GPU: Timing [Forward] 2.08 ms, avg: 0.00416 ms X 500 passes
Activation Operator GPU: Timing [Backward] 2.688 ms, avg: 0.005376 ms X 500 passes
NEW MXNET_OP APPROACH
---------------------
CPU
===
Timing: 50 iterations of 10 calls, shape = [1,1,28,28]
Activation Operator CPU: Timing [Forward] 80.748 ms, avg: 0.161496 ms X 500 passes
Activation Operator CPU: Timing [Backward] 1.176 ms, avg: 0.002352 ms X 500 passes
Timing: 50 iterations of 10 calls, shape = [1,3,28,28]
Activation Operator CPU: Timing [Forward] 7.881 ms, avg: 0.015762 ms X 500 passes
Activation Operator CPU: Timing [Backward] 2.181 ms, avg: 0.004362 ms X 500 passes
Timing: 50 iterations of 10 calls, shape = [50,1,18,32]
Activation Operator CPU: Timing [Forward] 111.48 ms, avg: 0.22296 ms X 500 passes
Activation Operator CPU: Timing [Backward] 5.408 ms, avg: 0.010816 ms X 500 passes
Timing: 50 iterations of 10 calls, shape = [50,3,18,32]
Activation Operator CPU: Timing [Forward] 333.439 ms, avg: 0.666878 ms X 500 passes
Activation Operator CPU: Timing [Backward] 21.331 ms, avg: 0.042662 ms X 500 passes
Timing: 50 iterations of 10 calls, shape = [20,3,128,128]
Activation Operator CPU: Timing [Forward] 3429.19 ms, avg: 6.85837 ms X 500 passes
Activation Operator CPU: Timing [Backward] 286.324 ms, avg: 0.572648 ms X 500 passes
GPU
===
Timing: 50 iterations of 10 calls, shape = [1,1,28,28]
Activation Operator GPU: Timing [Forward] 1.618 ms, avg: 0.003236 ms X 500 passes
Activation Operator GPU: Timing [Backward] 1.671 ms, avg: 0.003342 ms X 500 passes
Timing: 50 iterations of 10 calls, shape = [1,3,28,28]
Activation Operator GPU: Timing [Forward] 1.629 ms, avg: 0.003258 ms X 500 passes
Activation Operator GPU: Timing [Backward] 1.728 ms, avg: 0.003456 ms X 500 passes
Timing: 50 iterations of 10 calls, shape = [50,1,18,32]
Activation Operator GPU: Timing [Forward] 1.753 ms, avg: 0.003506 ms X 500 passes
Activation Operator GPU: Timing [Backward] 1.756 ms, avg: 0.003512 ms X 500 passes
Timing: 50 iterations of 10 calls, shape = [50,3,18,32]
Activation Operator GPU: Timing [Forward] 1.704 ms, avg: 0.003408 ms X 500 passes
Activation Operator GPU: Timing [Backward] 1.791 ms, avg: 0.003582 ms X 500 passes
Timing: 50 iterations of 10 calls, shape = [20,3,128,128]
Activation Operator GPU: Timing [Forward] 2.032 ms, avg: 0.004064 ms X 500 passes
Activation Operator GPU: Timing [Backward] 2.143 ms, avg: 0.004286 ms X 500 passes
* lint
* Trigger build
* Trigger build
* Negative begin and end support for csr slice (#8241)
* negative index support for sparse slice
* fix lint
* getitem(int) for csr ndarray, support a[-1]
* remove unnecessary argument
* unittest and doc update
* Preparing for 0.12.0.rc0: Final changes before RC (#8301)
* Final changes before RC
* Updates to NEWS.md
* Updates
* Enable smoothing in softmax operator (#8125)
* v0.12 regression: Fix registration of children for Block (#8277)
* Fix Block not registering children
If the attribute was already set to something different than Block (e.g. None),
it was not being registered.
* fix if / elif for block children registration
* trigger test
* Add fix from #8152
* Add tests from #8152
* Revert "[CMAKE] Fix windows cmake build" (#8311)
* Revert "Added my code signing key (#8293)"
This reverts commit 22ab185bbfde0ac2d801ec700ac4705ef0ee8daa.
* Revert "[CMAKE] Fix windows cmake build (#8227)"
This reverts commit 1c1c788916d672ee3cafdc4c91d7002a94a59d13.
* fixed broken links. https was pointing to http for mxnet.io (#8300)
* Update rnn.md (#8320)
* fluent methods for missed ops (#8329)
* update ps lite (#8327)
* Fix unused type warning (#8316)
* Trigger build
* Trigger build
* Misc fixes for sparse distributed training (#8345)
* remove mshadow::range in init_op.h
* add unit test
* remove pass by ptr, add unit test for pull empty weights
* fix range in key partition
* remove wrong comment
* remove change for partition
* remove unused var
* add int64 to arange. add checkpointing example
* Fix the Readme (#8369)
* Allow test to converge (#8351)
* Allow test to converge
* Trigger build
* Trigger build
* Trigger build
* Update cudnn_algoreg-inl.h (#7988)
* [Perl] emulate Python zip() for Perl (#8192)
* [Perl] emulate Python zip() for Perl
* [Perl] retool zip() uses away from the callback form
* add profile option for frontend profiling to image script (#8171)
* add profile option for frontend profiling to image script
* Update image_classification.py
* Update image_classification.py
* Fix Typo (classification) (#8376)
Fix a typo in the example readme.
2017-10-22 20:41:14 -07:00
extern bool debug_output;
extern bool quick_test;
extern bool performance_run;
extern bool csv;
Multithreaded Inference Support (#16654)
* Add cached op threadsafe version with corresponding C APIs, CPP Package changes, CI changes and tests
* Fix download cmd in runtime_functions
* Add CI changes
* Add stage
Fix indentation
* Fix lint
* Change to DEFAULT for C API
* Fix mxnet_unit_tests path
* export correct LD_LIBRARY_PATH
* Add cpp include dirs
* Build test with USE_CPP_PACKAGE
* Add cached op threadsafe version with corresponding C APIs, CPP Package changes, CI changes and tests
* Fix download cmd in runtime_functions
* Merge
* change mkldnn lib name
* Add static_alloc, static_Shape support
* Address review comments
* Make GetCachedOpThreadSafeState similar to cached_op
* Address review comments: comments for locking strategy
* multithreaded inference tutorial
* [Estimator] handle composite metrics in estimator (#16676)
* handle composite metrics in estimator
* fix composite metric case in handlers
* remove unused import
* [Estimator] refactor estimator to allow overriding evaluate/fit of a batch (#16678)
* refactor estimator to allow overriding evaluate/fit of a batch
* add doc to explain call structure and how to override
* fix and doc
* Pointwise fusion for GPU (#15167)
* Beginning of RTC of pointwise ops
* Code generation from the given JSON
* add initial simple_partition_pass and use it for pointwise fusion
* fix the fusion, use a symbol.Copy() at the beginning of binding function, use the name of input nodes in the cuda code
* Fixes
* Adding support for attribute inference for backward nodes when fusing
* keep proper input ordering for fused Op
* instantiate the indexed_graph before starting the subgraph replacement, return a new graph to reset the indexed_graph
* Fuse backward
* fix ordering of subgraph node inputs using subgraph topological ordering instead of main graph topological ordering, add tvm.patch
* exclude forward node fusion during the fusion of the nodes in the backward graph
* Dealing with fused backward nodes inferattr
* use subgraph.indexed_graph() instead of main for _FusedOpHelper nodes node_id, invert control_deps loop to modify topology of subgraph before calling its indexed_graph(), check that all node of the first DFSVisit are actually in the subgraph
* Adding support for other reqs in codegen
* Fix
* Cleaning
* Change the TVM submodule
* More cleaning
* Making linter happy
* Do fusion only if default context is GPU
* Fixes for tests
Add powerscalar and rpowerscalar, fix return type of zero and one
Cleaning, fixing lint
Go back to proper TVM submodule
* Fix the TVM commit
* Fix lint
* Guard fusion with MXNET_USE_CUDA
* Fix
* Fix clang-tidy
* Add erf and erfinv backward
* Gluon support for fusion
* Cleaning
* Cleaning and allow shape/type change in FusedOp
* Fixing Gluon bugs
* Fixing after rebase
* Fixing race condition and guarding against races when using NVRTC
* Cleaning and renaming FusedOp to _FusedOp
* Going easy on Windows compiler
* Disable fusion on Windows for now
* Refactor InferAttr and InferShapeAttr
* Added slice and half2 support to FusedOp
* Fix lint errors
* Added multiple types support for vector loading/storing
* add slice fusion when it's at the beginning of subgraphs
* Removed constant ndim assumption in fused op
* Fix memory alignment issue in slice for FusedOp
* Fixes
* Fix lint errors
* Do not include cuda_fp16.h
* Refactor fused op op lists
* Make linter happy
* Changes from review
* Fixes after rebase
* Expand FusedOp support for slice
* Fix for fp16 _zeros and _ones
* Fix
* Moving aux functions to unnamed namespace and detail namespace -> fusion
namespace
* Disabling fusion if it alters topological order of inputs
* Print code only when env variable is set
* Fix
* Fix lint and 2 tests that specify the same names for multiple inputs
* Fixes from review and disabling fusion of slice with non-default step
* Add amp_cast to fusion, fixes
* Add amp_multicast and its backward to the list of support ops
* Apply wording suggestions from code review
Co-Authored-By: Aaron Markham <markhama@amazon.com>
* Apply wording suggestions from code review
Co-Authored-By: Aaron Markham <markhama@amazon.com>
* Make clearer comment
* Adding punctuation and capitalization to \brief descriptions
* Fix
* Fix
* Add backward_cast to fusion
* Adding unittests for fusion. Fix for erfinv_grad
* Adding slice ops and add_n to tests
* Fixes from review
* Setting inplace option
* Fix lint
* Storing double in half
* Retrigger CI
* Slight relaxing of the relative tolerance in the test
* Move the env variable check to the end
* Fix a race condition between InferShape and scheduled Forward
* Fix flakey test_fusion test involving fp32 erfinv op.
* Fix from review
* Added broadcast_like and slice_like to fused op
* Minor fix and cleanup
* Added negative axis support in slice_axis, temporarily disabled fusion of slice_like and broadcast_like
* Added axes support to slice_like
* Added axis support to broadcast_like
* Add fast_load_slice function to fused op code
* Added runtime switch for choosing fast and slow slice kernel
* Fix lint and warning
* Going easy on Windows compiler (again)
* Fix slice_like
* Debug broadcast_like fusion
* Fix lint
* Fix lint
* Trigger CI
* Get rid of the initializer list
* Fix backward calls with different gradient type
* avoid cycle when adding node specific for inputs of subgraph for pointwise fusion
* Fix lint
* Add namespace to the fusion implementations
* Set launch bounds on the fused kernel
* Fix NumPy tests
* Test showcasing an issue fixed in PR #16553
* Cast scalars to FP32 and perform (a*1.0/b) instead of (a/b)
Fix lint errors
Fix lint
* Fix a bug in cycle detection for inputs only op in pointwise fusion
* Add comments to simple_partition_pass.h file
* fix install dir (#16690)
* [numpy] add numpy operator : append (#16564)
* add operator : append ; fix op concatenate when axis = None
* pylint disable
remove mistake
disable pylint
* Initializer.__eq__ (#16680)
* fix binary dependencies in CD and nightly (#16693)
* [MKL-DNN] Add mxnet mkldnn cmake tutorial (#16688)
* add mxnet mkldnn cmake instruction
* improve doc
* OMP->OpenMP
* Revert "[MKLDNN]Fix reorder2default (#16602)" (#16697)
This reverts commit dd4eaf5c23046d07a4578a219e2dd3622e5620fa.
* [Estimator] refactor estimator and clarify docs (#16694)
* refactor estimator and clarify docs
* fix info message and test
* clean up after releasing logging handler
* Eliminate common expressions (#15657)
* Eliminate common expressions from a graph
* Guarding against optimizing out stateful ops and ops that require
resource
* Fix lint
* Added THasDeterministicOutput to multiple ops
* Debug eliminate common expr
* Added test
* Expose get_optimized_symbol
* Fix
* Fix 2
* Add doc to the Python call
* Add env var MXNET_ELIMINATE_COMMON_EXPR, default true
* Add comments, improve readability of eliminate_common_expr_pass.cc
* Expand testing
* Lower priority of THasDeterministicOutput attr for equal Node test
* Change mx.gpu() to mx.cpu() in tests
* Skip CSE test on Windows (as env variable setting during test does not work there)
* Add missing import sys
* Add missing import logging
* Backport of #16711, #16737, #16408 to 1.6 branch (#16763)
* support mixed-precision true_divide (#16711)
* [MKLDNN] use dim_t instead of int in slice/transpose operators (#16737)
* use dim_t instead of int
* fix same issue in pooling
* rebase code
* trigger CI
* Add MXNet Ops for fast multihead attention (#16408)
* add MXNet Ops for fast multihead attention
* add cutlass as 3rdparty dependency
* add cutlass to compilation flags
* remove all cutlass stuff
* add better error message and description and remove cutlass from compilation flags
* change credit for the approach since the code has changed
* fix typos
* correct another typo
* Add all the cuda/cublas helper functions
* remove tests using kAddTo
* only use cublasStridedBatchedGemm if CUDA >= 9.1
* add equivalent mxnet code in description of mha ops
* remove a wrong copy-paste
* add _contrib for namespace and add GPU only on description
* add warning in bwd_ignore_zero_init description, also test with fp32
* add error return if bwd_ignore_zero_init is used without MXNET_EXEC_ENABLE_ADDTO
* remove std::move for clang
* remove bwd_ignore_zero_init flag
* remove bwd_ignore_zero_init in test_operator_gpu.py
* fix typo
* fix another typo
* Removed unrelated test
* Add example and documentation for multi threaded inference
* Add LICENSE
* Add get_model.py
* Add license for README
* Refactor cached op and cached op threadsafe
* Add limitation
* Add tests for naive engine
* Add latest test changes
* Thread Safety tests in NaiveEngine mode
* Thread Safety tests update
* Update thread safety tests, add unsupported use cases
* Changes to doc and refactor
* Fix todo owner, indentation and mx_float->float
* Refactor cached op code, remove num_threads arg from example
* Fix lint
* Fix warning
* Add back cython, required for unix-gpu build
* Fix for windows
* Add bulking support for thread safe cached op version
* Add support for subgraph testing
* import mxnet before calling get_backend_symbol
* Fix symbol json name
* Refactor DynamicForward
* Add comments
* Add DMLC_ATTRIBUTE_UNUSED
* Fix use_naive_run issue
* Fix lint
* Revert unittest_cpp to old test since it doesn't test thread safety
* Fix doc
Co-authored-by: Sheng Zha <szha@users.noreply.github.com>
Co-authored-by: Przemyslaw Tredak <ptrendx@gmail.com>
Co-authored-by: Tao Lv <tao.a.lv@intel.com>
Co-authored-by: JiangZhaoh <54654391+JiangZhaoh@users.noreply.github.com>
Co-authored-by: Leonard Lausen <leonard@lausen.nl>
Co-authored-by: Xinyu Chen <xinyu1.chen@intel.com>
Co-authored-by: Zhennan Qin <zhennan.qin@intel.com>
2020-02-01 09:36:59 -08:00
extern bool thread_safety_force_cpu;

/*! \brief Number of bytes needed to hold a tensor of the given shape with element type DType */
template <typename DType>
inline size_t shapeMemorySize(const mxnet::TShape& shape) {
  return shape.Size() * sizeof(DType);
}
class BlobMemory {
 public:
  explicit inline BlobMemory(const bool isGPU) : isGPU_(isGPU) {
    this->handle_.dptr = nullptr;
  }
  inline ~BlobMemory() {
    Free();
  }
  void* Alloc(const size_t size) {
    CHECK_GT(size, 0U);  // You've probably made a mistake
    mxnet::Context context = isGPU_ ? mxnet::Context::GPU(0) : mxnet::Context{};
    Storage* storage = mxnet::Storage::Get();
    handle_ = storage->Alloc(size, context);
    return handle_.dptr;
  }
  void Free() {
    mxnet::Storage::Get()->DirectFree(handle_);
    handle_.dptr = nullptr;
    handle_.size = 0;
  }
  size_t Size() const {
    return handle_.size;
  }

 private:
  const bool isGPU_;
  Storage::Handle handle_;
};
class StandaloneBlob : public TBlob {
 public:
  inline StandaloneBlob(const mxnet::TShape& shape, const bool isGPU, const int dtype)
      : TBlob(nullptr, shape, isGPU ? gpu::kDevMask : cpu::kDevMask, dtype),
        memory_(std::make_shared<BlobMemory>(isGPU)) {
    MSHADOW_TYPE_SWITCH(
        dtype, DType, { this->dptr_ = memory_->Alloc(shapeMemorySize<DType>(shape)); });
  }
  inline ~StandaloneBlob() {
    this->dptr_ = nullptr;
  }
  inline size_t MemorySize() const {
    return memory_->Size();
  }

 private:
  /*! \brief Locally allocated memory block for this blob */
  std::shared_ptr<BlobMemory> memory_;
};
/*!
 * \brief Access a TBlob's data on the CPU within the scope of this object
 * Overloaded () operator returns the CPU-bound TBlob
 * RAII will copy the data back to the GPU (if it was a GPU blob)
 */
class CAccessAsCPU {
 public:
  CAccessAsCPU(const RunContext& run_ctx, const TBlob& src, bool copy_back_result = true)
      : run_ctx_(run_ctx), src_(src), copy_back_result_(copy_back_result) {
#if MXNET_USE_CUDA
    if (run_ctx.ctx.dev_type == Context::kCPU) {
      blob_ = src;
    } else {
      Context cpu_ctx, gpu_ctx = run_ctx.ctx;
      cpu_ctx.dev_type = Context::kCPU;
      cpu_ctx.dev_id = 0;
      NDArray on_cpu(src.shape_, cpu_ctx, false, src_.type_flag_);
      on_cpu.CheckAndAlloc();
      blob_ = on_cpu.data();
      run_ctx.get_stream<gpu>()->Wait();
      mxnet::ndarray::Copy<gpu, cpu>(src, &blob_, cpu_ctx, gpu_ctx, run_ctx);
      run_ctx.get_stream<gpu>()->Wait();
      on_cpu_ = on_cpu;
    }
#else
    blob_ = src;
#endif
  }
  ~CAccessAsCPU() {
#if MXNET_USE_CUDA
    if (copy_back_result_) {
      // Copy the (possibly modified) CPU data back to the GPU
      if (run_ctx_.ctx.dev_type == Context::kGPU) {
        Context cpu_ctx, gpu_ctx = run_ctx_.ctx;
        cpu_ctx.dev_type = Context::kCPU;
        cpu_ctx.dev_id = 0;
        run_ctx_.get_stream<gpu>()->Wait();
        mxnet::ndarray::Copy<cpu, gpu>(blob_, &src_, gpu_ctx, cpu_ctx, run_ctx_);
        run_ctx_.get_stream<gpu>()->Wait();
      }
    }
#endif
  }
  inline const TBlob& operator()() const {
    return blob_;
  }

 private:
  const RunContext run_ctx_;
  TBlob src_;
  const bool copy_back_result_;
  NDArray on_cpu_;
  TBlob blob_;
};
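// Usage sketch for CAccessAsCPU (illustrative only; `run_ctx` and `blob` are
// assumed to be a valid RunContext and a float32 TBlob supplied by the caller):
//
//   {
//     CAccessAsCPU cpu_view(run_ctx, blob, false);  // false: read-only, skip the copy back
//     const TBlob& host = cpu_view();               // CPU-resident view of the data
//     const float first = host.dptr<float>()[0];
//   }  // with copy_back_result == true, the destructor would copy the data back to the GPU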
/*!
 * \brief Access data blob as if on the CPU via a callback
 * \tparam CallbackFunction Type of callback function to call with CPU-data NDArray
 * \param src Source NDArray (on GPU or CPU)
 * \param run_ctx Run context
 * \param cb Callback function to call with CPU-data NDArray
 */
template <typename CallbackFunction>
inline void AccessAsCPU(const NDArray& src, const RunContext& run_ctx, CallbackFunction cb) {
#if MXNET_USE_CUDA
  if (src.ctx().dev_type == Context::kCPU) {
    cb(src);
  } else {
    Context cpu_ctx, gpu_ctx = src.ctx();
    cpu_ctx.dev_type = Context::kCPU;
    cpu_ctx.dev_id = 0;
    NDArray on_cpu(src.shape(), cpu_ctx, false, src.dtype());
    on_cpu.CheckAndAlloc();
    TBlob tmp1 = on_cpu.data();
    run_ctx.get_stream<gpu>()->Wait();
    mxnet::ndarray::Copy<gpu, cpu>(src.data(), &tmp1, cpu_ctx, gpu_ctx, run_ctx);
    run_ctx.get_stream<gpu>()->Wait();
    cb(on_cpu);
    TBlob tmp2 = src.data();
    mxnet::ndarray::Copy<cpu, gpu>(on_cpu.data(), &tmp2, gpu_ctx, cpu_ctx, run_ctx);
    run_ctx.get_stream<gpu>()->Wait();
  }
#else
  cb(src);
#endif
}
/*!
 * \brief Access data blob as if on the CPU via a callback
 * \tparam CallbackFunction Type of callback function to call with CPU-data TBlob
 * \param src Source TBlob (on GPU or CPU)
 * \param run_ctx Run context
 * \param cb Callback function to call with CPU-data TBlob
 */
template <typename CallbackFunction>
inline void AccessAsCPU(const TBlob& src, const RunContext& run_ctx, CallbackFunction cb) {
#if MXNET_USE_CUDA
  if (run_ctx.ctx.dev_type == Context::kCPU) {
    cb(src);
  } else {
    cb(CAccessAsCPU(run_ctx, src, true)());
  }
#else
  cb(src);
#endif
}
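// Usage sketch for AccessAsCPU (illustrative only; `run_ctx` and `blob` are
// assumed to be a valid RunContext and a float32 TBlob supplied by the caller).
// The callback always receives CPU-resident data, whichever device `blob` lives on:
//
//   AccessAsCPU(blob, run_ctx, [](const TBlob& cpu_blob) {
//     float* p = cpu_blob.dptr<float>();
//     for (size_t i = 0, n = cpu_blob.Size(); i < n; ++i) {
//       p[i] *= 2.0f;  // for a GPU blob, the change is copied back after the callback returns
//     }
//   });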
constexpr const size_t MPRINT_PRECISION = 5;

template <typename DType>
inline void fill(const RunContext& run_ctx, const TBlob& _blob, const DType val) {
  AccessAsCPU(_blob, run_ctx, [val](const TBlob& blob) {
    MSHADOW_TYPE_SWITCH(blob.type_flag_, DTypeX, {
      DTypeX* p1 = blob.dptr<DTypeX>();
      for (size_t i = 0, n = blob.Size(); i < n; ++i) {
        *p1++ = val;
      }
    });
  });
}
template <typename DType>
inline void try_fill(const RunContext& run_ctx, const TBlob* blob, const DType val) {
  if (blob) {
    fill(run_ctx, *blob, val);
}
}
template <typename DType, typename Stream>
inline void dump(Stream* os, const TBlob& blob, const char* suffix = "f") {
  DType* p1 = blob.dptr<DType>();
  for (size_t i = 0, n = blob.Size(); i < n; ++i) {
    if (i) {
      *os << ", ";
    }
    const DType val = *p1++;
    std::stringstream stream;
    stream << val;
    std::string ss = stream.str();
    if (suffix && *suffix == 'f') {
      if (std::find(ss.begin(), ss.end(), '.') == ss.end()) {
        ss += ".0";
      }
    }
    *os << ss << suffix;
  }
}
/*! \brief Return the size of the given axis, or 1 if axis >= shape.ndim() */
inline index_t getMult(const mxnet::TShape& shape, const index_t axis) {
  return axis < shape.ndim() ? shape[axis] : 1;
}
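/*! \brief Row-major offset for indices such as bn, channel, depth, row, column.
 *  Omitted trailing indices count as 0; e.g. shape (2, 3, 4) with indices {1, 2, 3}
 *  gives ((1 * 3) + 2) * 4 + 3 = 23. */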
inline index_t offset(const mxnet::TShape& shape, const std::vector<size_t>& indices) {
  const size_t dim = shape.ndim();
  CHECK_LE(indices.size(), dim);
  size_t offset = 0;
  for (size_t i = 0; i < dim; ++i) {
    offset *= shape[i];
    if (indices.size() > i) {
      CHECK_LT(indices[i], shape[i]);
      offset += indices[i];
    }
  }
  return offset;
}
/*! \brief Return reference to data at position indexes */
template <typename DType>
inline const DType& data_at(const TBlob* blob, const std::vector<size_t>& indices) {
  return blob->dptr<DType>()[offset(blob->shape_, indices)];
}
/*! \brief Return mutable reference to data at position indexes (for setting a value) */
template <typename DType>
inline DType& data_ref(const TBlob* blob, const std::vector<size_t>& indices) {
* move testing code around a little
* lint fix
* Use AccReal in some places to avoid fp16 problems
* Fix minor invstd problem in cuda version
* remove unused scale param
* add permutation unit test, handle cudnn doesn't like 3D
* .
* lint
* .
* Remove mkl_off
* lint fix and time cudnn when enabled
2017-05-15 20:27:28 -07:00
  return blob->dptr<DType>()[offset(blob->shape_, indices)];
}

inline std::string repeatedStr(const char* s,
                               const signed int count,
                               const bool trailSpace = false) {
  if (count <= 0) {
    return std::string();
  } else if (count == 1) {
    std::stringstream str;
    str << s << " ";
    return str.str();
  } else {
    std::stringstream str;
    for (int x = 0; x < count; ++x) {
      str << s;
    }
    if (trailSpace) {
      str << " ";
    }
    return str.str();
  }
}
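
The behaviour of `repeatedStr` can be sketched in isolation as follows. This is a minimal standalone copy, not the header's own code: the names `repeatedStrSketch` and `checkRepeatedStrSketch` are hypothetical, and the single-space string literals are assumed from the listing above. Note that a count of 1 appends a space even when `trailSpace` is false, mirroring the original.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Standalone copy of the helper, renamed (hypothetical name) so that it
// cannot clash with the real repeatedStr.
inline std::string repeatedStrSketch(const char* s,
                                     const int count,
                                     const bool trailSpace = false) {
  if (count <= 0) {
    return std::string();
  }
  if (count == 1) {
    // Mirrors the original: one repetition always gets a trailing space,
    // regardless of trailSpace.
    return std::string(s) + " ";
  }
  std::ostringstream str;
  for (int x = 0; x < count; ++x) {
    str << s;
  }
  if (trailSpace) {
    str << " ";
  }
  return str.str();
}

// Small self-check (hypothetical helper) covering each branch.
inline void checkRepeatedStrSketch() {
  assert(repeatedStrSketch("ab", 0).empty());
  assert(repeatedStrSketch("ab", 1) == "ab ");
  assert(repeatedStrSketch("ab", 3) == "ababab");
  assert(repeatedStrSketch("ab", 3, true) == "ababab ");
}
```

Callers that need predictable spacing for a single repetition may want to trim the result, since that branch ignores `trailSpace`.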

/*! \brief Pretty print a shape with optional label */
template <typename StreamType>
inline StreamType& print_shape(StreamType* _os,
                               const std::string& label,
                               const mxnet::TShape& shape,
                               const bool add_endl = true) {
  if (!label.empty()) {
    *_os << label << ": ";
  }
  *_os << "(";
  for (size_t i = 0, n = shape.ndim(); i < n; ++i) {
    if (i) {
      *_os << ", ";
    }
    *_os << shape[i];
  }
  *_os << ")";
  if (add_endl) {
    *_os << std::endl;
  } else {
    *_os << " ";
  }
  return *_os << std::flush;
}
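
A usage sketch for the shape printer, under stated assumptions: `ShapeSketch` (a `std::vector<int64_t>`) stands in for `mxnet::TShape`, the names `printShapeSketch` and `formatShapeSketch` are hypothetical, and the separator strings are assumed from the listing above. The stream is passed as `std::ostream*` because the function returns the result of `operator<<`, which is a `std::ostream&`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for mxnet::TShape (assumption): a plain vector of extents.
using ShapeSketch = std::vector<int64_t>;

template <typename StreamType>
inline StreamType& printShapeSketch(StreamType* os,
                                    const std::string& label,
                                    const ShapeSketch& shape,
                                    const bool add_endl = true) {
  if (!label.empty()) {
    *os << label << ": ";
  }
  *os << "(";
  for (std::size_t i = 0, n = shape.size(); i < n; ++i) {
    if (i) {
      *os << ", ";  // separator between extents, not before the first
    }
    *os << shape[i];
  }
  *os << ")";
  if (add_endl) {
    *os << std::endl;
  } else {
    *os << " ";
  }
  return *os << std::flush;
}

// Hypothetical convenience wrapper: format into a string without a newline.
inline std::string formatShapeSketch(const std::string& label,
                                     const ShapeSketch& shape) {
  std::ostringstream oss;
  std::ostream* os = &oss;  // deduce StreamType as std::ostream
  printShapeSketch(os, label, shape, false);
  return oss.str();
}
```

With `add_endl` set to false the output ends in a single space, which lets several shapes be printed on one line.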

/*! \brief Pretty print a 1D, 2D, or 3D blob */
template <typename DType, typename StreamType>
inline StreamType& print_blob_(const RunContext& ctx,
                               StreamType* _os,
                               const TBlob& blob,
                               const bool doChannels = true,
                               const bool doBatches = true,
                               const bool add_endl = true) {
#if MXNET_USE_CUDA
  if (blob.dev_mask() == gpu::kDevMask) {
    return print_blob_<DType>(
        ctx, _os, CAccessAsCPU(ctx, blob, false)(), doChannels, doBatches, add_endl);
  }
#endif  // MXNET_USE_CUDA
  StreamType& os = *_os;
  const size_t dim = static_cast<size_t>(blob.ndim());
  if (dim == 1) {
    // probably a 1d tensor (mshadow::Tensor is deprecated)
    TBlob changed(blob.dptr<DType>(), mxnet::TShape(3, -1), blob.dev_mask(), blob.dev_id());
    changed.shape_[0] = 1;
    changed.shape_[1] = 1;
    changed.shape_[2] = blob.shape_[0];
    return print_blob_<DType>(ctx, &os, changed, false, false, add_endl);
  } else if (dim == 2) {
    // probably a 2d tensor (mshadow::Tensor is deprecated)
    TBlob changed(blob.dptr<DType>(), mxnet::TShape(4, -1), blob.dev_mask(), blob.dev_id());
    changed.shape_[0] = 1;
    changed.shape_[1] = 1;
    changed.shape_[2] = blob.shape_[0];
    changed.shape_[3] = blob.shape_[1];
    return print_blob_<DType>(ctx, &os, changed, false, false, add_endl);
  }
  CHECK_GE(dim, 3U) << " Invalid dimension zero (0) ";
  const size_t batchSize = blob.size(0);
  size_t channels = 1;
  size_t depth = 1;
  size_t height = 1;
  size_t width = 1;
  if (dim > 1) {
    channels = blob.size(1);
    if (dim > 2) {
      if (dim == 3) {
        width = blob.size(2);
      } else if (dim == 4) {
        height = blob.size(2);
        width = blob.size(3);
      } else {
        depth = blob.size(2);
        if (dim > 3) {
          height = blob.size(3);
          if (dim > 4) {
            width = blob.size(4);
          }
        }
      }
    }
  }
  for (size_t r = 0; r < height; ++r) {
    for (size_t thisBatch = 0; thisBatch < batchSize; ++thisBatch) {
      if (doBatches) {
        std::stringstream ss;
        if (doBatches && !thisBatch) {
          os << " | ";
        }
        ss << " N " << thisBatch << " | ";
        const std::string nns = ss.str();
        if (!r) {
          os << nns;
        } else {
          os << repeatedStr(" ", nns.size());
        }
      }
      for (size_t thisChannel = 0; thisChannel < channels; ++thisChannel) {
        os << " [ ";
        for (size_t c = 0; c < width; ++c) {
          if (c) {
            os << " , ";
          }
          for (size_t dd = 0; dd < depth; ++dd) {
            DType val;
            switch (dim) {
              case 3:
                val = data_at<DType>(&blob, {thisBatch, thisChannel, c});
                break;
              case 4:
                val = data_at<DType>(&blob, {thisBatch, thisChannel, r, c});
                break;
              case 5:
                val = data_at<DType>(&blob, {thisBatch, thisChannel, dd, r, c});
                break;
              default:
                LOG(FATAL) << " Unsupported blob dimension " << dim;
                val = DType(0);
                break;
            }
            os << repeatedStr(" ( ", dd);
            os << std::fixed << std::setw(7) << std::setprecision(MPRINT_PRECISION) << std::right
               << val << " ";
            os << repeatedStr(" ) ", dd, true);
          }
        }
        os << " ] ";
        if (!doChannels) {
          break;
        }
      }
      if (!doBatches) {
        break;
      } else {
        os << " | " << std::flush;
      }
    }
    if (r < height - 1) {
      os << std::endl;
    }
}
  if (!height) {
    os << "[]";
    if (add_endl) {
      os << std::endl;
    }
  } else if (!add_endl) {
    os << " ";
  } else {
    os << std::endl;
  }
  os << std::flush;
  return os;
}

template <typename StreamType>
inline StreamType& print(const RunContext& ctx,
                         StreamType* _os,
                         const TBlob& blob,
                         const bool doChannels = true,
                         const bool doBatches  = true,
                         const bool add_endl   = true) {
  MSHADOW_TYPE_SWITCH(blob.type_flag_, DType, {
    print_blob_<DType>(ctx, _os, blob, doChannels, doBatches, add_endl);
  });
  return *_os;
}

template <typename StreamType>
inline StreamType& print(const RunContext& ctx,
                         StreamType* _os,
                         const std::string& label,
                         const TBlob& blob,
                         const bool doChannels = true,
                         bool doBatches        = true,
                         const bool add_endl   = true) {
  if (!label.empty()) {
    *_os << label << ": ";
  }
  return print(ctx, _os, blob, doChannels, doBatches, add_endl);
}

template <typename StreamType>
inline StreamType& print(const RunContext& ctx,
                         StreamType* _os,
                         const std::string& label,
                         const NDArray& arr) {
  if (!label.empty()) {
    *_os << label << ": ";
  }
  switch (arr.storage_type()) {
    case kRowSparseStorage: {
      // data
      const mxnet::TShape& shape = arr.shape();
      print_shape(_os, "[row_sparse] main shape", shape, false);
      const mxnet::TShape& storage_shape = arr.storage_shape();
      const bool is_one_row              = storage_shape[0] < 2;
      print_shape(_os, "storage shape", storage_shape, false);
      print(ctx, _os, arr.data(), true, true, !is_one_row);

      // indices
      const mxnet::TShape& indices_shape = arr.aux_shape(rowsparse::kIdx);
      print_shape(_os, "indices shape", indices_shape, false);
      print(ctx, _os, arr.aux_data(rowsparse::kIdx), true, true, false) << std::endl;
      break;
    }
    case kCSRStorage: {
      // data
      const mxnet::TShape& shape = arr.shape();
      print_shape(_os, "[CSR] main shape", shape, false);
      const mxnet::TShape& storage_shape = arr.storage_shape();
      const bool is_one_row              = storage_shape[0] < 2;
      print_shape(_os, "storage shape", storage_shape, false);
      print(ctx, _os, arr.data(), true, true, !is_one_row);

      // row ptrs
      const mxnet::TShape& ind_ptr_shape = arr.aux_shape(csr::kIndPtr);
      print_shape(_os, "row ptrs shape", ind_ptr_shape, false);
      print(ctx, _os, arr.aux_data(csr::kIndPtr), true, true, false) << std::endl;

      // col indices
      const mxnet::TShape& indices_shape = arr.aux_shape(csr::kIdx);
      print_shape(_os, "col indices shape", indices_shape, false);
      print(ctx, _os, arr.aux_data(csr::kIdx), true, true, false) << std::endl;
      break;
    }
    case kDefaultStorage: {
      // data
      const mxnet::TShape& shape = arr.shape();
      const bool is_one_row      = shape[0] < 2;
      print_shape(_os, "[dense] main shape", shape, !is_one_row);
      print(ctx, _os, arr.data(), true, true, !is_one_row) << std::endl;
      break;
    }
    default:
      CHECK(false) << "Unsupported storage type: " << arr.storage_type();
      break;
  }
  return *_os << std::flush;
}

inline void print(const RunContext& ctx,
                  const std::string& label,
                  const std::string& var,
                  const std::vector<NDArray>& arrays) {
  std::cout << label << std::endl;
  for (size_t x = 0, n = arrays.size(); x < n; ++x) {
    std::stringstream ss;
    ss << var << "[" << x << "]";
    test::print(ctx, &std::cout, ss.str(), arrays[x]);
  }
}

inline void print(const RunContext& ctx,
                  const std::string& label,
                  const std::string& var,
                  const std::vector<TBlob>& arrays) {
  std::cout << label << std::endl;
  for (size_t x = 0, n = arrays.size(); x < n; ++x) {
    std::stringstream ss;
    ss << var << "[" << x << "]";
    test::print(ctx, &std::cout, ss.str(), arrays[x], true, true, false);
  }
}

inline std::string demangle(const char* name) {
#if defined(__GLIBCXX__) || defined(_LIBCPP_VERSION)
  int status = -4;  // some arbitrary value to eliminate the compiler warning
  std::unique_ptr<char, void (*)(void*)> res{abi::__cxa_demangle(name, nullptr, nullptr, &status),
                                             &std::free};
  return status ? name : res.get();
#else
  return name;
#endif
}

template <typename T>
inline std::string type_name() {
  return demangle(typeid(T).name());
}
# define PRINT_NDARRAYS(__ctx$, __var) test::print(__ctx$, __FUNCTION__, #__var, __var)
# define PRINT_OP_AND_ARRAYS(__ctx$, __op, __var) \
test::print(__ctx$, \
__FUNCTION__, \
static_cast<std::stringstream*>( \
&(std::stringstream() << #__var << "<" << type_name<__op>() << ">")) \
->str(), \
__var)
# define PRINT_OP2_AND_ARRAYS(__ctx$, __op1, __op2, __var) \
test::print(__ctx$, \
__FUNCTION__, \
static_cast<std::stringstream*>( \
&(std::stringstream() << #__var << "<" << type_name<__op1>() << ", " \
<< type_name<__op2>() << ">")) \
->str(), \
__var)
/*! \brief Fill blob with some pattern defined by the getNextData() callback
 * Pattern fill in the defined order (important for analysis):
 *  1D: batch item -> channel -> col
 *  2D: batch item -> channel -> row -> col
 *  3D: batch item -> channel -> depth -> row -> col
 */
template <typename GetNextData>
static inline void patternFill(const RunContext& run_ctx,
                               const TBlob* _blob,
                               GetNextData getNextData) {
  AccessAsCPU(*_blob, run_ctx, [getNextData](const TBlob& blob) {
    const size_t dim = static_cast<size_t>(blob.ndim());
    CHECK_LE(dim, 5U) << "Will need to handle above 3 dimensions (another for loop)";
    const size_t num             = blob.size(0);
    const size_t channels        = dim > 1 ? blob.size(1) : 1;
    const size_t depth           = dim > 2 ? blob.size(2) : 1;
    const size_t height          = dim > 3 ? blob.size(3) : 1;
    const size_t width           = dim > 4 ? blob.size(4) : 1;
    const size_t numberOfIndexes = blob.shape_.Size();
    for (size_t n = 0; n < num; ++n) {
      if (dim > 1) {
        for (size_t ch = 0; ch < channels; ++ch) {
          if (dim > 2) {
            for (size_t d = 0; d < depth; ++d) {
              if (dim > 3) {
                for (size_t row = 0; row < height; ++row) {
                  if (dim > 4) {
                    for (size_t col = 0; col < width; ++col) {
                      if (dim == 5) {
                        const size_t idx = test::offset(blob.shape_, {n, ch, d, row, col});
                        CHECK_LT(idx, numberOfIndexes);
                        MSHADOW_TYPE_SWITCH(blob.type_flag_, ThisDataType, {
                          ThisDataType& f = blob.dptr<ThisDataType>()[idx];
                          f               = getNextData();
                        });
                      } else {
                        CHECK(dim <= 5) << "Unimplemented dimension: " << dim;
                      }
                    }
                  } else {
                    const size_t idx = test::offset(blob.shape_, {n, ch, d, row});
                    CHECK_LT(idx, numberOfIndexes);
                    MSHADOW_TYPE_SWITCH(blob.type_flag_, ThisDataType, {
                      ThisDataType& f = blob.dptr<ThisDataType>()[idx];
                      f               = getNextData();
                    });
                  }
                }
              } else {
                const size_t idx = test::offset(blob.shape_, {n, ch, d});
                CHECK_LT(idx, numberOfIndexes);
                MSHADOW_TYPE_SWITCH(blob.type_flag_, ThisDataType, {
                  ThisDataType& f = blob.dptr<ThisDataType>()[idx];
                  f = getNextData();
                });
              }
            }
          } else {
            const size_t idx = test::offset(blob.shape_, {n, ch});
            CHECK_LT(idx, numberOfIndexes);
            MSHADOW_TYPE_SWITCH(blob.type_flag_, ThisDataType, {
              ThisDataType& f = blob.dptr<ThisDataType>()[idx];
              f = getNextData();
            });
          }
        }
      } else {
        const size_t idx = test::offset(blob.shape_, {n});
        CHECK_LT(idx, numberOfIndexes);
        MSHADOW_TYPE_SWITCH(blob.type_flag_, ThisDataType, {
          ThisDataType& f = blob.dptr<ThisDataType>()[idx];
          f = getNextData();
        });
      }
    }
  });
}
/*! \brief Return a random number within a given range (inclusive) */
template <class ScalarType>
inline ScalarType rangedRand(const ScalarType min, const ScalarType max) {
  // Note: num_bins is derived from max alone, so for integral ScalarType the
  // result spans [min, min + max]; the range is [min, max] only when min == 0.
  uint64_t num_bins = static_cast<uint64_t>(max + 1), num_rand = static_cast<uint64_t>(RAND_MAX),
           bin_size = num_rand / num_bins, defect = num_rand % num_bins;
  ScalarType x;
  do {
    x = std::rand();
  } while (num_rand - defect <= static_cast<uint64_t>(x));
  return static_cast<ScalarType>(x / bin_size + min);
}
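
// The rejection-sampling idea used by rangedRand can be sketched on its own.
// The block below is a minimal illustration, not the code used by the test
// framework: rangedRandSketch is a hypothetical name, it is restricted to int,
// and it assumes the bins should be sized from (hi - lo + 1) so that the
// result stays within [lo, hi] even when lo is non-zero.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: uniform integer in [lo, hi] via rejection sampling.
inline int rangedRandSketch(const int lo, const int hi) {
  assert(lo <= hi);
  // Number of distinct outputs requested.
  const uint64_t num_bins = static_cast<uint64_t>(static_cast<int64_t>(hi) - lo) + 1;
  // Number of distinct values std::rand() can return: 0..RAND_MAX.
  const uint64_t num_rand = static_cast<uint64_t>(RAND_MAX) + 1;
  assert(num_bins <= num_rand);  // otherwise bin_size would be zero
  const uint64_t bin_size = num_rand / num_bins;
  // Values at or above this limit would make the last bin larger than the
  // others, so they are rejected and redrawn.
  const uint64_t limit = bin_size * num_bins;
  uint64_t x;
  do {
    x = static_cast<uint64_t>(std::rand());
  } while (x >= limit);
  return static_cast<int>(lo + static_cast<int64_t>(x / bin_size));
}
```

// Discarding the tail of the std::rand() range keeps every bin the same size,
// which avoids the bias that a plain `std::rand() % n` would introduce.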
/*!
 * \brief Deterministically compare mxnet::TShape objects as less-than,
 *        for use as a key in sorted STL containers such as std::map and std::set
 * \param s1 First shape
 * \param s2 Second shape
 * \return true if s1 is less than s2
 */
inline bool operator<(const mxnet::TShape& s1, const mxnet::TShape& s2) {
  if (s1.Size() == s2.Size()) {
    if (s1.ndim() == s2.ndim()) {
      for (size_t i = 0, n = s1.ndim(); i < n; ++i) {
        if (s1[i] == s2[i]) {
          continue;
        }
        return s1[i] < s2[i];
      }
      return false;
    }
    return s1.ndim() < s2.ndim();
  }
  return s1.Size() < s2.Size();
}
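
// The ordering above compares total size first, then number of dimensions,
// then the dimensions element by element. The sketch below reproduces that
// order on a stand-in type so it can be tried without MXNet. ShapeSketch,
// ShapeSketchLess and countDistinctShapes are hypothetical names introduced
// for illustration only; ShapeSketch is not mxnet::TShape.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for a shape: a plain list of dimension sizes.
struct ShapeSketch {
  std::vector<int64_t> dims;
  // Total number of elements (product of all dimensions).
  size_t Size() const {
    size_t total = 1;
    for (const int64_t d : dims) {
      total *= static_cast<size_t>(d);
    }
    return total;
  }
  size_t ndim() const { return dims.size(); }
};

// Same three-level order: Size(), then ndim(), then lexicographic on dims.
struct ShapeSketchLess {
  bool operator()(const ShapeSketch& a, const ShapeSketch& b) const {
    if (a.Size() != b.Size()) {
      return a.Size() < b.Size();
    }
    if (a.ndim() != b.ndim()) {
      return a.ndim() < b.ndim();
    }
    return a.dims < b.dims;  // std::vector compares lexicographically
  }
};

// A sorted container treats two keys as equal when neither is less than the
// other, so duplicates collapse into one entry.
inline size_t countDistinctShapes(const std::vector<ShapeSketch>& shapes) {
  std::set<ShapeSketch, ShapeSketchLess> unique(shapes.begin(), shapes.end());
  return unique.size();
}
```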
/*!
 * \brief Deterministically compare a vector of mxnet::TShape objects as less-than,
 *        for use as a key in sorted STL containers such as std::map and std::set
 * \param v1 First vector of shapes
 * \param v2 Second vector of shapes
 * \return true if v1 is less than v2
 */
inline bool operator<(const std::vector<mxnet::TShape>& v1,
                      const std::vector<mxnet::TShape>& v2) {
  if (v1.size() == v2.size()) {
    for (size_t i = 0, n = v1.size(); i < n; ++i) {
      if (v1[i] == v2[i]) {
        continue;
      }
      return v1[i] < v2[i];
    }
    return false;
  }
  return v1.size() < v2.size();
}
/*!
 * \brief std::less-style compare structure for comparing vectors of shapes
 *        in stl sorted containers
 */
struct less_shapevect {
  bool operator()(const std::vector<mxnet::TShape>& v1,
                  const std::vector<mxnet::TShape>& v2) const {
    if (v1.size() == v2.size()) {
      for (size_t i = 0, n = v1.size(); i < n; ++i) {
        if (v1[i] == v2[i]) {
          continue;
        }
        return v1[i] < v2[i];
      }
      return false;
    }
    return v1.size() < v2.size();
  }
};
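
// A comparator struct like less_shapevect is passed as the third template
// argument of a sorted container. The sketch below shows that pattern with a
// stand-in shape type so it compiles without MXNet. DimsSketch,
// LessDimsVectSketch, TimingTableSketch and makeTimingTableSketch are
// hypothetical names, and the stored strings are made-up labels.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a shape: a plain list of dimension sizes.
using DimsSketch = std::vector<int64_t>;

// Shorter vectors sort first; equal-length vectors are compared element by
// element, using std::vector's built-in ordering for each stand-in shape.
struct LessDimsVectSketch {
  bool operator()(const std::vector<DimsSketch>& v1,
                  const std::vector<DimsSketch>& v2) const {
    if (v1.size() != v2.size()) {
      return v1.size() < v2.size();
    }
    for (size_t i = 0, n = v1.size(); i < n; ++i) {
      if (v1[i] == v2[i]) {
        continue;
      }
      return v1[i] < v2[i];
    }
    return false;
  }
};

// A map keyed by a list of input shapes, ordered by the comparator above.
using TimingTableSketch =
    std::map<std::vector<DimsSketch>, std::string, LessDimsVectSketch>;

inline TimingTableSketch makeTimingTableSketch() {
  const std::vector<DimsSketch> single = {{1, 3, 28, 28}};
  const std::vector<DimsSketch> pair = {{1, 3, 28, 28}, {3}};
  TimingTableSketch table;
  table[single] = "single input";
  table[pair] = "input plus per-channel parameter";
  table[single] = "single input (updated)";  // same key, so the entry is overwritten
  return table;
}
```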
inline std::string pretty_num(uint64_t val) {
  if (!test::csv) {
    std::string res, s = std::to_string(val);
    size_t ctr = 0;
    for (int i = static_cast<int>(s.size()) - 1; i >= 0; --i, ++ctr) {
      if (ctr && (ctr % 3) == 0) {
        res += ",";
      }
      res.push_back(s[i]);
    }
    std::reverse(res.begin(), res.end());
    return res;
  } else {
    return std::to_string(val);
  }
}
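
// pretty_num groups digits in threes by building the string in reverse. The
// same formatting can be sketched by inserting separators from the right.
// groupThousandsSketch is a hypothetical name, and the sketch leaves out the
// test::csv switch, so it always inserts separators.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: format val with a comma between groups of three digits.
inline std::string groupThousandsSketch(uint64_t val) {
  std::string digits = std::to_string(val);
  // Walk leftwards from the end in steps of three; stop before position 0 so
  // that no separator is placed in front of the leading digit.
  for (int pos = static_cast<int>(digits.size()) - 3; pos > 0; pos -= 3) {
    digits.insert(static_cast<size_t>(pos), ",");
  }
  return digits;
}
```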
/*! \brief Change a value during the scope of this declaration */
template <typename T>
* lint
* ..
* python problem fixed
* python problem fixed
* Merge branch 'batchnorm' into batchnorm_pr
# Conflicts:
# src/operator/batch_norm.cc
# src/operator/batch_norm.cu
# tests/cpp/operator/batchnorm_test.cc
* lint fix
* lint fix
* lint fix
* lint fix
* lint fix
* Fix visual c++ compile problem
* .
* .
* All unit tests pass again
* lint fix
* fix strange compile errors in CUDNN batchnorm header
* FInish using flags instead of bools
* lint
* Fix timing pass count for forward pass
* Fix R script install roxygen problem
* code formatting, addition of doc strings is causing IDE to add spaces before the calls
* removed commented
* cr comments
* Change back to compilable code
* For CPU mode, store as invstd
* move testing code around a little
* lint fix
* Use AccReal in some places to avoid fp16 problems
* Fix minor invstd problem in cuda version
* remove unused scale param
* add permutation unit test, handle cudnn doesn't like 3D
* .
* lint
* .
* Remove mkl_off
* lint fix and time cudnn when enabled
2017-05-15 20:27:28 -07:00
struct ScopeSet {
2021-11-19 09:27:00 +01:00
  // Save the current value of *var (not the pointer itself) so the destructor can restore it.
  inline ScopeSet(T* var, const T tempValue) : var_(*var), saveValue_(*var) {
    *var = tempValue;
  }
  inline ~ScopeSet() {
    var_ = saveValue_;
  }
  T& var_;
2021-11-19 09:27:00 +01:00
  T saveValue_;
};
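// Usage sketch (illustrative only). It assumes ScopeSet is declared as
// template <typename T>; `verbose` and `RunQuietly` are hypothetical names that do
// not exist in this header. The guard overrides a variable for the lifetime of a
// scope and writes the saved value back when it is destroyed:
//
//   bool verbose = true;
//   {
//     ScopeSet<bool> guard(&verbose, false);  // verbose is false inside this block
//     RunQuietly();
//   }
//   // verbose is true again here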
2022-07-19 17:00:19 +02:00
static inline void AssertEqual(const std::vector<NDArray*>& in_arrs,
                               const std::vector<NDArray*>& out_arrs,
                               float rtol = 1e-5,
                               float atol = 1e-8,
                               bool test_first_only = false) {
Multithreaded Inference Support (#16654)
* Add cached op threadsafe version with corresponding C APIs, CPP Package changes, CI changes and tests
* Fix download cmd in runtime_functions
* Add CI changes
* Add stage
Fix indentation
* Fix lint
* Change to DEFAULT for C API
* Fix mxnet_unit_tests path
* export correct LD_LIBRARY_PATH
* Add cpp include dirs
* Build test with USE_CPP_PACKAGE
* Add cached op threadsafe version with corresponding C APIs, CPP Package changes, CI changes and tests
* Fix download cmd in runtime_functions
* Merge
* change mkldnn lib name
* Add static_alloc, static_Shape support
* Address review comments
* Make GetCachedOpThreadSafeState similar to cached_op
* Address review comments: comments for locking strategy
* multithreaded inference tutorial
* [Estimator] handle composite metrics in estimator (#16676)
* handle composite metrics in estimator
* fix composite metric case in handlers
* remove unused import
* [Estimator] refactor estimator to allow overriding evaluate/fit of a batch (#16678)
* refactor estimator to allow overriding evaluate/fit of a batch
* add doc to explain call structure and how to override
* fix and doc
* Pointwise fusion for GPU (#15167)
* Beginning of RTC of pointwise ops
* Code generation from the given JSON
* add initial simple_partition_pass and use it for pointwise fusion
* fix the fusion, use a symbol.Copy() at the beginning of binding function, use the name of input nodes in the cuda code
* Fixes
* Adding support for attribute inference for backward nodes when fusing
* keep proper input ordering for fused Op
* instantiate the indexed_graph before starting the subgraph replacement, return a new graph to reset the indexed_graph
* Fuse backward
* fix ordering of subgraph node inputs using subgraph topological ordering instead of main graph topological ordering, add tvm.patch
* excluse forward node fusion during the fusion of the nodes in the backward graph
* Dealing with fused backward nodes inferattr
* use subgraph.indexed_graph() instead of main for _FusedOpHelper nodes node_id, invert control_deps loop to modify topology of subgraph before calling its indexed_graph(), check that all node of the first DFSVisit are actually in the subgraph
* Adding support for other reqs in codegen
* Fix
* Cleaning
* Change the TVM submodule
* More cleaning
* Making linter happy
* Do fusion only if default context is GPU
* Fixes for tests
Add powerscalar and rpowerscalar, fix return type of zero and one
Cleaning, fixing lint
Go back to proper TVM submodule
* Fix the TVM commit
* Fix lint
* Guard fusion with MXNET_USE_CUDA
* Fix
* Fix clang-tidy
* Add erf and erfinv backward
* Gluon support for fusion
* Cleaning
* Cleaning and allow shape/type change in FusedOp
* Fixing Gluon bugs
* Fixing after rebase
* Fixing race condition and guarding against races when using NVRTC
* Cleaning and renaming FusedOp to _FusedOp
* Going easy on Windows compiler
* Disable fusion on Windows for now
* Refactor InferAttr and InferShapeAttr
* Added slice and half2 support to FusedOp
* Fix lint errors
* Added multiple types support for vector loading/storing
* add slice fusion when it's at the beginning of subgraphs
* Removed constant ndim assumption in fused op
* Fix memory alignment issue in slice for FusedOp
* Fixes
* Fix lint errors
* Do not include cuda_fp16.h
* Refactor fused op op lists
* Make linter happy
* Changes from review
* Fixes after rebase
* Expand FusedOp support for slice
* Fix for fp16 _zeros and _ones
* Fix
* Moving aux functions to unnamed namespace and detail namespace -> fusion
namespace
* Disabling fusion if it alters topological order of inputs
* Print code only when env variable is set
* Fix
* Fix lint and 2 tests that specify the same names for multiple inputs
* Fixes from review and disabling fusion of slice with non-default step
* Add amp_cast to fusion, fixes
* Add amp_multicast and its backward to the list of support ops
* Apply wording suggestions from code review
Co-Authored-By: Aaron Markham <markhama@amazon.com>
* Apply wording suggestions from code review
Co-Authored-By: Aaron Markham <markhama@amazon.com>
* Make clearer comment
* Adding punctuation and capitalization to \brief descriptions
* Fix
* Fix
* Add backward_cast to fusion
* Adding unittests for fusion. Fix for erfinv_grad
* Adding slice ops and add_n to tests
* Fixes from review
* Setting inplace option
* Fix lint
* Storing double in half
* Retrigger CI
* Slight relaxing of the relative tolerance in the test
* Move the env variable check to the end
* Fix a race condition between InferShape and scheduled Forward
* Fix flakey test_fusion test involving fp32 erfinv op.
* Fix from review
* Added broadcast_like and slice_like to fused op
* Minor fix and cleanup
* Added negative axis support in slice_axis, temporarily disabled fusion of slice_like and broadcast_like
* Added axes support to slice_like
* Added axis support to broadcast_like
* Add fast_load_slice function to fused op code
* Added runtime switch for choosing fast and slow slice kernel
* Fix lint and warning
* Going easy on Windows compiler (again)
* Fix slice_like
* Debug broadcast_like fusion
* Fix lint
* Fix lint
* Trigger CI
* Get rid of the initializer list
* Fix backward calls with different gradient type
* avoid cycle when adding node specific for inputs of subgraph for pointwise fusion
* Fix lint
* Add namespace to the fusion implementations
* Set launch bounds on the fused kernel
* Fix NumPy tests
* Test showcasing an issue fixed in PR #16553
* Cast scalarts to FP32 and perform (a*1.0/b) instead of (a/b)
Fix lint errors
Fix lint
* Fix a bug in cycle detection for inputs only op in pointwise fusion
* Add comments to simple_partition_pass.h file
* fix install dir (#16690)
* [numpy] add numpy operator : append (#16564)
* add operator : append ; fix op concatenate when axis = None
* pylint disable
remove mistake
disable pylint
* Initializer.__eq__ (#16680)
* fix binary dependencies in CD and nightly (#16693)
* [MKL-DNN] Add mxnet mkldnn cmake tutorial (#16688)
* add mxnet mkldnn cmake instruction
* imporve doc
* OMP->OpenMP
* Revert "[MKLDNN]Fix reorder2default (#16602)" (#16697)
This reverts commit dd4eaf5c23046d07a4578a219e2dd3622e5620fa.
* [Estimator] refactor estimator and clarify docs (#16694)
* refactor estimator and clarify docs
* fix info message and test
* clean up after releasing logging handler
* Eliminate common expressions (#15657)
* Eliminate common expressions from a graph
* Guarding against optimizing out stateful ops and ops that require
resource
* Fix lint
* Added THasDeterministicOutput to multiple ops
* DDebug eliminate common expr
* Added test
* Expose get_optimized_symbol
* Fix
* Fix 2
* Add doc to the Python call
* Add env var MXNET_ELIMINATE_COMMON_EXPR, default true
* Add comments, improve readability of eliminate_common_expr_pass.cc
* Expand testing
* Lower priority of THasDeterministicOutput attr for equal Node test
* Change mx.gpu() to mx.cpu() in tests
* Skip CSE test on Windows (as env variable setting during test does not work there)
* Add missing import sys
* Add missing import logging
* Backport of #16711, #16737, #16408 to 1.6 branch (#16763)
* support mixed-precision true_divide (#16711)
* [MKLDNN] use dim_t instead of int in slice/transpose operators (#16737)
* use dim_t instead of int
* fix same issue in pooling
* rebase code
* trigger CI
* Add MXNet Ops for fast multihead attention (#16408)
* add MXNet Ops for fast multihead attention
* add cutlass as 3rdparty dependency
* add cutlass to compilation flags
* remove all cutlass stuff
* add better error message and description and remove cutlass from compilation flags
* change credit for the approach since the code have changed
* fix typos
* correct another typo
* Add all the cuda/cublas helper functions
* remove tests using kAddTo
* only use cublasStridedBatchedGemm if CUDA >= 9.1
* add equivalent mxnet code in description of mha ops
* remove a wrong copy-paste
* add _contrib for namespace and add GPU only on description
* add warning in bwd_ignore_zero_init description, also test with fp32
* add error return if bwd_ignore_zero_init is used without MXNET_EXEC_ENABLE_ADDTO
* remove std::move for clang
* remove bwd_ignore_zero_init flag
* remove bwd_ignore_zero_init in test_operator_gpu.py
* fix typo
* fix another typo
* Removed unrelated test
* Add example and documentation for multi threaded inference
* Add LICENSE
* Add get_model.py
* Add license for README
* Refactor cached op and cached op threadsafe
* Add limitation
* Add tests for naive engine
* Add latest test changes
* Thread Safety tests in NaiveEngine mode
* Thread Safety tests update
* Update thread safety tests, add unsupported use cases
* Changes to doc and refactor
* Fix todo owner, indentation and mx_float->float
* Refactor cached op code, remove num_threads arg from example
* Fix lint
* Fix warning
* Add back cython, required for unix-gpu build
* Fix for windows
* Add bulking support for thread safe cached op version
* Add support for subgraph testing
* import mxnet before calling get_backend_symbol
* Fix symbol json name
* Refactor DynamicForward
* Add comments
* Add DMLC_ATTRIBUTE_UNUSED
* Fix use_naive_run issue
* Fix lint
* Revert unittest_cpp to old test since it doesnt test thread safety
* Fix doc
Co-authored-by: Sheng Zha <szha@users.noreply.github.com>
Co-authored-by: Przemyslaw Tredak <ptrendx@gmail.com>
Co-authored-by: Tao Lv <tao.a.lv@intel.com>
Co-authored-by: JiangZhaoh <54654391+JiangZhaoh@users.noreply.github.com>
Co-authored-by: Leonard Lausen <leonard@lausen.nl>
Co-authored-by: Xinyu Chen <xinyu1.chen@intel.com>
Co-authored-by: Zhennan Qin <zhennan.qin@intel.com>
2020-02-01 09:36:59 -08:00
  for (size_t j = 0; j < in_arrs.size(); ++j) {
    // When test_first_only is set, compare only the first pair of arrays.
    if (test_first_only && j == 1) {
      return;
    }
    NDArray tmp1 = *in_arrs[j];
    NDArray tmp2 = *out_arrs[j];
    // Copy GPU arrays to the CPU and wait for the copies to finish before reading them.
    if (tmp1.ctx().dev_type == mxnet::Context::kGPU) {
      tmp1 = tmp1.Copy(mxnet::Context::CPU(0));
      tmp2 = tmp2.Copy(mxnet::Context::CPU(0));
      tmp1.WaitToRead();
      tmp2.WaitToRead();
    }
2021-03-15 17:32:37 +01:00
# if MXNET_USE_ONEDNN == 1
* trigger CI
* Add MXNet Ops for fast multihead attention (#16408)
* add MXNet Ops for fast multihead attention
* add cutlass as 3rdparty dependency
* add cutlass to compilation flags
* remove all cutlass stuff
* add better error message and description and remove cutlass from compilation flags
* change credit for the approach since the code have changed
* fix typos
* correct another typo
* Add all the cuda/cublas helper functions
* remove tests using kAddTo
* only use cublasStridedBatchedGemm if CUDA >= 9.1
* add equivalent mxnet code in description of mha ops
* remove a wrong copy-paste
* add _contrib for namespace and add GPU only on description
* add warning in bwd_ignore_zero_init description, also test with fp32
* add error return if bwd_ignore_zero_init is used without MXNET_EXEC_ENABLE_ADDTO
* remove std::move for clang
* remove bwd_ignore_zero_init flag
* remove bwd_ignore_zero_init in test_operator_gpu.py
* fix typo
* fix another typo
* Removed unrelated test
* Add example and documentation for multi threaded inference
* Add LICENSE
* Add get_model.py
* Add license for README
* Refactor cached op and cached op threadsafe
* Add limitation
* Add tests for naive engine
* Add latest test changes
* Thread Safety tests in NaiveEngine mode
* Thread Safety tests update
* Update thread safety tests, add unsupported use cases
* Changes to doc and refactor
* Fix todo owner, indentation and mx_float->float
* Refactor cached op code, remove num_threads arg from example
* Fix lint
* Fix warning
* Add back cython, required for unix-gpu build
* Fix for windows
* Add bulking support for thread safe cached op version
* Add support for subgraph testing
* import mxnet before calling get_backend_symbol
* Fix symbol json name
* Refactor DynamicForward
* Add comments
* Add DMLC_ATTRIBUTE_UNUSED
* Fix use_naive_run issue
* Fix lint
* Revert unittest_cpp to old test since it doesnt test thread safety
* Fix doc
Co-authored-by: Sheng Zha <szha@users.noreply.github.com>
Co-authored-by: Przemyslaw Tredak <ptrendx@gmail.com>
Co-authored-by: Tao Lv <tao.a.lv@intel.com>
Co-authored-by: JiangZhaoh <54654391+JiangZhaoh@users.noreply.github.com>
Co-authored-by: Leonard Lausen <leonard@lausen.nl>
Co-authored-by: Xinyu Chen <xinyu1.chen@intel.com>
Co-authored-by: Zhennan Qin <zhennan.qin@intel.com>
2020-02-01 09:36:59 -08:00
tmp1 = tmp1.Reorder2Default();
tmp2 = tmp2.Reorder2Default();
#endif
EXPECT_EQ(tmp1.shape().Size(), tmp2.shape().Size());
2021-11-19 09:27:00 +01:00
TBlob blob1 = tmp1.data();
TBlob blob2 = tmp2.data();
mshadow::default_real_t* d1 = static_cast<mshadow::default_real_t*>(blob1.dptr_);
mshadow::default_real_t* d2 = static_cast<mshadow::default_real_t*>(blob2.dptr_);
2020-02-01 09:36:59 -08:00
// Size() returns size_t, so use an unsigned index to avoid a sign-compare warning.
for (size_t i = 0; i < tmp1.shape().Size(); i++) {
float abs_err = fabs((d1[i]) - (d2[i]));
2020-02-19 10:49:32 +08:00
ASSERT_LE(abs_err, (atol + rtol * fabs(d2[i])))
<< "index: " << i << ", " << d1[i] << " vs " << d2[i];
2020-02-01 09:36:59 -08:00
}
}
}
2017-05-15 20:27:28 -07:00
} // namespace test
} // namespace mxnet
2018-02-26 10:09:45 -08:00
#if defined(_MSC_VER)
inline void usleep(__int64 usec) {
HANDLE timer;
LARGE_INTEGER ft;
// Convert to 100 nanosecond interval, negative value indicates relative time
2021-11-19 09:27:00 +01:00
ft.QuadPart = -(10 * usec);
2018-02-26 10:09:45 -08:00
timer = CreateWaitableTimer(NULL, TRUE, NULL);
SetWaitableTimer(timer, &ft, 0, NULL, NULL, 0);
WaitForSingleObject(timer, INFINITE);
CloseHandle(timer);
}
#endif  // _MSC_VER
2017-07-12 10:04:40 -07:00
#endif  // TEST_UTIL_H_