apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more

/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
Batch Norm rewrite without mshadow, 1D, 2D, 3D, float16, float32, float64 as well as operator gtest framework (#5936), 2017-05-15
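The commit above mentions using an `AccReal` accumulation type to avoid fp16 precision problems, and storing the inverse standard deviation ("For CPU mode, store as invstd"). A minimal sketch of that idea, assuming illustrative names rather than MXNet's actual templates:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch: accumulate batch-norm statistics in a wider type (AccReal) than
// the element type (DType), mirroring the fp16-safety idea from the log.
template <typename DType, typename AccReal>
void mean_invstd(const std::vector<DType>& x, AccReal eps,
                 AccReal* mean, AccReal* invstd) {
  AccReal sum = 0, sq = 0;
  for (DType v : x) {  // widen each element before accumulating
    sum += static_cast<AccReal>(v);
    sq  += static_cast<AccReal>(v) * static_cast<AccReal>(v);
  }
  const AccReal n = static_cast<AccReal>(x.size());
  *mean = sum / n;
  const AccReal var = sq / n - (*mean) * (*mean);
  // Cache 1/sqrt(var + eps) rather than the variance itself.
  *invstd = AccReal(1) / std::sqrt(var + eps);
}
```

With `DType = mshadow::half::half_t` and `AccReal = float` the same pattern gives fp16 storage with fp32 accumulation.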
/*!
* \file test_util.h
* \brief unit test performance analysis functions
* \author Chris Olivier
*/
#ifndef TEST_UTIL_H_
#define TEST_UTIL_H_
#include <gtest/gtest.h>
#include <mxnet/storage.h>
Sparse operators for unary and binary elemwise NDArray operators. (#7577), 2017-09-13
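For the sparse elemwise operators referenced above, a zero-preserving unary op (one with f(0) == 0, such as relu or negation) only needs to touch the stored values; the index structure can be reused unchanged. A toy sketch of that dispatch, with illustrative names rather than MXNet's `NDArray` storage types:

```cpp
#include <cassert>
#include <vector>

// Toy row-sparse container: only the non-zero rows' values are stored
// (1-D values per row to keep the example small).
struct ToyRowSparse {
  std::vector<int> row_idx;   // indices of stored rows
  std::vector<float> values;  // one value per stored row
};

// Apply a zero-preserving unary op over stored values only; implicit
// zeros map to zero, so the row index array is copied as-is.
template <typename UnaryOp>
ToyRowSparse apply_unary(const ToyRowSparse& in, UnaryOp op) {
  ToyRowSparse out;
  out.row_idx = in.row_idx;
  out.values.reserve(in.values.size());
  for (float v : in.values) out.values.push_back(op(v));
  return out;
}
```

Ops that do not preserve zeros (e.g. adding a nonzero scalar) cannot take this path and must fall back to dense storage, which is the fallback behavior the commit log describes.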
#include <mxnet/ndarray.h>
#include <string>
#include <vector>
#include <sstream>
#include <random>
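The file comment describes "unit test performance analysis functions", and the commit log mentions adjusting timing pass counts. The actual helpers are not shown in this chunk; a minimal sketch of a multi-pass timing utility in the same spirit (illustrative only, not MXNet's actual API):

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Run a callable `passes` times and return the average wall time per pass
// in milliseconds. Averaging over many passes smooths out scheduler noise,
// which matters for the short operator kernels these tests measure.
template <typename F>
double time_many(F&& fn, std::size_t passes) {
  using clock = std::chrono::steady_clock;
  const auto start = clock::now();
  for (std::size_t i = 0; i < passes; ++i) fn();
  const std::chrono::duration<double, std::milli> elapsed =
      clock::now() - start;
  return elapsed.count() / static_cast<double>(passes);
}
```

Using `steady_clock` rather than `system_clock` avoids skew from wall-clock adjustments during a timing run.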
Refactor operators and add MKLDNN (#9677)
Fix make lint complain. Disable mkldnn avg pooling for now. Fix a compile warning. Fix compile error when MKLDNN is disabled. OP primitive cache: use memory as signature for MKLDNN storage type Remove MKLDNN array in python. Disable Clang tests in Jenkins. Use mklml dockers to test mkldnn. Update MKLDNN repo to zhengda's mkldnn repo. Update MKLDNN repo to ashok's. Fix a bug in fallback. Change avg pooling algorithm to pooling_avg_include_padding Fix a code style in mkldnn pooling. Temp fix a bug in FC. Revert "Disable Clang tests in Jenkins." This reverts commit b4efa8f89592d30a27f9c30e2237e9420ac6749a. Rebase and Refactor deconv (#20) * rebase to Da,Zheng refactor branch Jan.14, add signature for mkldnn Deconv and modify classMKLDNNDeconvForward * fix make lint complains A simple way of caching BN inference. cache BN forward for both training and inference. Fix some minor problems in BN. Fix a bug in caching BN. force to build with avx2 in Jenkins. Remove the remaining MKLDNNStorageType Some minor updates in NDArray. a lot of updates to address comments. minor changes. * Use NNVM interface. Use NNVM interface for upsampling. Use NNVM interface for convolution. Use NNVM interface for deconvolution. Use NNVM interface for FullyConnected. Move NNVM interface to batch norm. Use NNVM interface for depthwise convolution. Use NNVM interface for softmax activation. Use NNVM interface for pooling. use NNVM interface for dropout. Use NNVM interface for activation. Use NNVM interface for CuDNN batch norm. Use NNVM interface for CuDNN pooling. Use NNVM interface for CuDNN softmax activation. Use NNVM interface for CuDNN activation. Use NNVM interface for CuDNN convolution. Use NNVM interface for CuDNN deconvolution. Move concat to nn/ Use NNVM interface for concat. Fix headers in concat. Move lrn to nn/. Use NNVM interface for LRN. Fix a compilation error in convolution. Fix a compilation error in activation. Fix coding style. Fix coding style for make lint. 
use enums in batch norm. Use CoreOpRunner for refactored Ops. Make FullyConnected stateless. Make upsampling stateless. Make pooling stateless. Make batchnorm stateless. Make SoftmaxActivation stateless. Fix a code style problem. pass amalgamation test for batch norm. pass amalgamation test for dropout. Get convolution ops from a function. Fix compilation errors for GPU. Fix thread local in diff platforms. Avoid using thread_local for non-CuDNN conv/deconv. Remove TODO in deconv. Fix a bug in batch norm. Fix a bug in fully connected. Don't set #inputs for backward convolution. Revert "Make pooling stateless." * revert modification in test_executor. * Fix a bug in FlattenStorageType. * Remove BN debug. * Remove remaining MXNET_USE_MKL2017 * Remove unused code in pooling. * Fixing bugs in gtests. * Fix lint errors. * a lot of minor updates to address comments. * Fix coding style in MKLDNN Pooling (#22) * revert the code change in the previous code refactor. * Fix a bug in pooling. * LRN coding style changes (#21) * LRN coding style change * Add const for local variables * Add req for LRN forward * rebase code * align API interface * revert modification in test_executor. * cast storage with MKLDNN properly. * Minor updates to address comments. * some minor updates. * Switch to the master branch of MKLDNN. * Minor updates to address comments. * Update activation.cc * Fix a bug in convert NDArray. * Add gluon model zoo tests. * Update GPU tests on model zoo. * Avoid using mobilenet for GPU tests with gluon models. mobilenet can't pass the test even without MKLDNN. * Update GPU tests on gluon. * change cmake to compile MKLDNN. * update cmake for MKLDNN. * Implement align myself. * Switch to intel/mkl-dnn. * Fix errors in align unittest. * Add unit test for LRN. * fix a compilation error. * use storage_type_assign to determine storage type. * avoid global pooling in mkldnn. There is a bug in global pooling in mkldnn. * compare all MKLDNN ops with native impls. 
add MXNET_MKLDNN_DEBUG to control the test. * Fix a bug in testing correctness. * print the name of buggy operator. * undo some modifications. * Fix a bug on reshaped array. * avoid testing outputs with NullOp. * turn on MKLDNN tests in Jenkins. * print each operator in MKLDNN tests. * rename test_gluon_model_zoo.py * Create hashcode for operator parameters properly. * Add USE_MKL2017 back. * Print warning messages. * move batchnorm tests to nnvm interface. * Delete batchnorm v1 tests. * Get inputs and outputs in batchnorm tests. * disable batchnorm tests for now. * Fix GPU tests on gluon model zoo. * Fix lint complains in tests. * Remove simd from openmp instructions in BatchNorm (#24) * Remove warnings. * Fix MKLDNN 1st compile failure issue (#23) * Fix compilation errors. * Remove ARCH_OPT in Jenkins. * Revert "avoid global pooling in mkldnn." This reverts commit f6efd342e64968cb848c9193d80e929968b8052c. * Move to the latest MKLDNN. This fixes the bug in global pooling. * WIP unit tests (#25) * WIP unit tests * some backward items initialized * Make more C++ unit tests work for batch norm (#28) * WIP unit tests * some backward items initialized * some backward items initialized * some backward items initialized * first unit test working * Working on types * backward types working for fp16 on first unit test * backward types working for fp16 on first unit test * backward types working for fp16 on first unit test * . * . * some tests working * fix input data * hangle gpu<->cpu for setting values * gpu working * gpu working * CAccessAsCPU class * Fix varying type in AccessAsCPU * starting to add channel axis tests * TestChannelAxisSimple * TestChannelAxisSimple * run bidirectional * run bidirectional * run bidirectional * CLEANUP * CLEANUP * .. * noaxis * .. * lint * revert * revert * Fix lint complains. * Fix a minor problem in Makefile. * fix GPU pooling. * Disable modelzoo inference tests. * update accuracy checks for MKLDNN. 
* Fix MKLDNN pooling for global pooling. * Fix Jenkins. * Fix a bug in Jenkins. * Fix Jenkins
2018-02-15 14:44:34 -08:00
#include "../../../src/ndarray/ndarray_function.h"
#if MXNET_USE_VTUNE
#include <ittnotify.h>
#endif
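The `ittnotify.h` header above comes from Intel's ITT (Instrumentation and Tracing Technology) API, which is how code annotates regions for VTune. As a minimal sketch of the pattern this guard enables (the domain and task names below are illustrative, not MXNet's actual instrumentation), a hot region can be bracketed with an ITT task so it shows up labeled in a VTune trace, while compiling to a plain loop when `MXNET_USE_VTUNE` is not defined:

```cpp
#if MXNET_USE_VTUNE
#include <ittnotify.h>
#endif

// Sum a small array, marking the region as an ITT task so it appears in a
// VTune trace when MXNET_USE_VTUNE is defined; otherwise just a plain loop.
// Domain/task names ("mxnet.test", "sum") are hypothetical examples.
inline long SumWithIttMarkers(const int* data, int n) {
#if MXNET_USE_VTUNE
  static __itt_domain* domain = __itt_domain_create("mxnet.test");
  static __itt_string_handle* task = __itt_string_handle_create("sum");
  __itt_task_begin(domain, __itt_null, __itt_null, task);
#endif
  long sum = 0;
  for (int i = 0; i < n; ++i) sum += data[i];
#if MXNET_USE_VTUNE
  __itt_task_end(domain);
#endif
  return sum;
}
```

When the macro is undefined the preprocessor evaluates `#if MXNET_USE_VTUNE` as 0, so the ITT calls vanish and no link dependency on the ITT library is introduced.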
namespace mxnet {
namespace test {
extern bool unitTestsWithCuda;
extern bool debug_output;
extern bool quick_test;
// Flags controlling the operator test framework (performance-timing runs, CSV output).
extern bool performance_run;
extern bool csv;
// Force the thread-safety tests to run on the CPU.
extern bool thread_safety_force_cpu;
/*! \brief Number of bytes needed to hold a tensor of the given shape and element type */
template <typename DType>
inline size_t shapeMemorySize(const mxnet::TShape& shape) {
  return shape.Size() * sizeof(DType);
}
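The byte count is simply the element count (the product of all dimensions) times the element width. A minimal self-contained sketch of the same computation, using a plain `std::vector<int>` as an illustrative stand-in for `mxnet::TShape` (the name `shape_memory_bytes` is hypothetical):

```cpp
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Illustrative stand-in for shapeMemorySize: dimensions as a plain int list.
template <typename DType>
inline size_t shape_memory_bytes(const std::vector<int>& shape) {
  // Element count is the product of all dimensions (1 for an empty shape).
  const size_t elems = std::accumulate(shape.begin(), shape.end(),
                                       static_cast<size_t>(1),
                                       std::multiplies<size_t>());
  return elems * sizeof(DType);
}
```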
/*! \brief Owns a Storage allocation (CPU or GPU); frees it on destruction */
class BlobMemory {
 public:
  explicit inline BlobMemory(const bool isGPU) : isGPU_(isGPU) {
    this->handle_.dptr = nullptr;
  }
  inline ~BlobMemory() {
    Free();
  }
  void* Alloc(const size_t size) {
    CHECK_GT(size, 0U);  // You've probably made a mistake
    mxnet::Context context = isGPU_ ? mxnet::Context::GPU(0) : mxnet::Context{};
    Storage* storage = mxnet::Storage::Get();
    handle_ = storage->Alloc(size, context);
    return handle_.dptr;
  }
  void Free() {
    mxnet::Storage::Get()->DirectFree(handle_);
    handle_.dptr = nullptr;
    handle_.size = 0;
  }
  size_t Size() const {
    return handle_.size;
  }

 private:
  const bool isGPU_;
  Storage::Handle handle_;
};
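`BlobMemory` is a standard RAII holder: `Alloc` acquires from the MXNet `Storage` manager, `Free` releases, and the destructor guarantees the release happens. A minimal sketch of the same pattern with plain `new[]`/`delete[]` in place of the `Storage` API (the class name `RawMemory` is illustrative, not MXNet code):

```cpp
#include <cstddef>

// Illustrative RAII holder: owns a raw buffer, frees it on destruction.
class RawMemory {
 public:
  RawMemory() = default;
  ~RawMemory() { Free(); }
  void* Alloc(size_t size) {
    Free();  // drop any previous allocation first
    data_ = new char[size];
    size_ = size;
    return data_;
  }
  void Free() {
    delete[] data_;
    data_ = nullptr;
    size_ = 0;
  }
  size_t Size() const { return size_; }

 private:
  char* data_ = nullptr;
  size_t size_ = 0;
};
```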
/*! \brief A TBlob that owns its own memory through a shared BlobMemory block */
class StandaloneBlob : public TBlob {
 public:
  inline StandaloneBlob(const mxnet::TShape& shape, const bool isGPU, const int dtype)
      : TBlob(nullptr, shape, isGPU ? gpu::kDevMask : cpu::kDevMask, dtype),
        memory_(std::make_shared<BlobMemory>(isGPU)) {
    MSHADOW_TYPE_SWITCH(
        dtype, DType, { this->dptr_ = memory_->Alloc(shapeMemorySize<DType>(shape)); });
  }
  inline ~StandaloneBlob() {
    this->dptr_ = nullptr;
  }
  inline size_t MemorySize() const {
    return memory_->Size();
  }

 private:
  /*! \brief Locally allocated memory block for this blob */
  std::shared_ptr<BlobMemory> memory_;
};
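Because `StandaloneBlob` holds its `BlobMemory` through a `std::shared_ptr`, copies of the blob share one allocation, which is released only when the last copy is destroyed. A self-contained sketch of that ownership scheme (the `SharedBlob` type is illustrative, not MXNet code):

```cpp
#include <cstddef>
#include <memory>

// Illustrative: a blob whose backing buffer is shared between copies, so
// the buffer is released only when the last copy goes out of scope
// (mirrors StandaloneBlob's std::shared_ptr<BlobMemory> member).
struct SharedBlob {
  explicit SharedBlob(size_t bytes)
      : memory_(new char[bytes], std::default_delete<char[]>()),
        bytes_(bytes) {}
  size_t MemorySize() const { return bytes_; }
  long use_count() const { return memory_.use_count(); }

  std::shared_ptr<char> memory_;  // shared by all copies of this blob
  size_t bytes_;
};
```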
/*!
 * \brief Access a TBlob's data on the CPU within the scope of this object.
 * The overloaded () operator returns the CPU-bound TBlob.
 * On destruction (RAII), the data is copied back to the GPU if the source was a GPU blob.
 */
class CAccessAsCPU {
 public:
  CAccessAsCPU(const RunContext& run_ctx, const TBlob& src, bool copy_back_result = true)
      : run_ctx_(run_ctx), src_(src), copy_back_result_(copy_back_result) {
    // [blame] Refactor operators and add MKLDNN (#9677), 2018-02-15
#if MXNET_USE_CUDA
    if (run_ctx.ctx.dev_type == Context::kCPU) {
      // Source already lives on the CPU: reference it directly, no copy needed.
      blob_ = src;
    } else {
      // Source lives on the GPU: stage the data in a CPU-side NDArray.
      Context cpu_ctx, gpu_ctx = run_ctx.ctx;
      cpu_ctx.dev_type = Context::kCPU;
      cpu_ctx.dev_id = 0;
      NDArray on_cpu(src.shape_, cpu_ctx, false, src_.type_flag_);
      on_cpu.CheckAndAlloc();
      blob_ = on_cpu.data();
      // Wait for pending GPU work, copy device -> host (from_ctx = gpu_ctx,
      // to_ctx = cpu_ctx), then wait again so the host copy is complete.
      run_ctx.get_stream<gpu>()->Wait();
      mxnet::ndarray::Copy<gpu, cpu>(src, &blob_, gpu_ctx, cpu_ctx, run_ctx);
      run_ctx.get_stream<gpu>()->Wait();
      // Keep the backing NDArray alive so blob_ stays valid for our lifetime.
      on_cpu_ = on_cpu;
    }
#else
    blob_ = src;
#endif
  }
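
  // Usage sketch (an assumption for illustration, not part of the source shown
  // here; the accessor/member names below are hypothetical). CAccessAsCPU is an
  // RAII guard: construct it around a possibly-GPU TBlob, operate on the
  // CPU-visible blob while the guard is alive, and when copy_back_result is
  // true the destructor copies the result back to the device:
  //
  //   {
  //     CAccessAsCPU access(run_ctx, some_gpu_blob, /* copy_back_result = */ true);
  //     DoHostSideChecks(access.blob_);  // hypothetical host-side helper
  //   }  // destructor runs here; modifications are copied back to the GPU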
  ~CAccessAsCPU() {
#if MXNET_USE_CUDA
    if (copy_back_result_) {
      // Copy the (possibly modified) CPU staging data back to the GPU source.
      if (run_ctx_.ctx.dev_type == Context::kGPU) {
        Context cpu_ctx, gpu_ctx = run_ctx_.ctx;
        cpu_ctx.dev_type = Context::kCPU;
        cpu_ctx.dev_id = 0;
use enums in batch norm. Use CoreOpRunner for refactored Ops. Make FullyConnected stateless. Make upsampling stateless. Make pooling stateless. Make batchnorm stateless. Make SoftmaxActivation stateless. Fix a code style problem. pass amalgamation test for batch norm. pass amalgamation test for dropout. Get convolution ops from a function. Fix compilation errors for GPU. Fix thread local in diff platforms. Avoid using thread_local for non-CuDNN conv/deconv. Remove TODO in deconv. Fix a bug in batch norm. Fix a bug in fully connected. Don't set #inputs for backward convolution. Revert "Make pooling stateless." * revert modification in test_executor. * Fix a bug in FlattenStorageType. * Remove BN debug. * Remove remaining MXNET_USE_MKL2017 * Remove unused code in pooling. * Fixing bugs in gtests. * Fix lint errors. * a lot of minor updates to address comments. * Fix coding style in MKLDNN Pooling (#22) * revert the code change in the previous code refactor. * Fix a bug in pooling. * LRN coding style changes (#21) * LRN coding style change * Add const for local variables * Add req for LRN forward * rebase code * align API interface * revert modification in test_executor. * cast storage with MKLDNN properly. * Minor updates to address comments. * some minor updates. * Switch to the master branch of MKLDNN. * Minor updates to address comments. * Update activation.cc * Fix a bug in convert NDArray. * Add gluon model zoo tests. * Update GPU tests on model zoo. * Avoid using mobilenet for GPU tests with gluon models. mobilenet can't pass the test even without MKLDNN. * Update GPU tests on gluon. * change cmake to compile MKLDNN. * update cmake for MKLDNN. * Implement align myself. * Switch to intel/mkl-dnn. * Fix errors in align unittest. * Add unit test for LRN. * fix a compilation error. * use storage_type_assign to determine storage type. * avoid global pooling in mkldnn. There is a bug in global pooling in mkldnn. * compare all MKLDNN ops with native impls. 
add MXNET_MKLDNN_DEBUG to control the test. * Fix a bug in testing correctness. * print the name of buggy operator. * undo some modifications. * Fix a bug on reshaped array. * avoid testing outputs with NullOp. * turn on MKLDNN tests in Jenkins. * print each operator in MKLDNN tests. * rename test_gluon_model_zoo.py * Create hashcode for operator parameters properly. * Add USE_MKL2017 back. * Print warning messages. * move batchnorm tests to nnvm interface. * Delete batchnorm v1 tests. * Get inputs and outputs in batchnorm tests. * disable batchnorm tests for now. * Fix GPU tests on gluon model zoo. * Fix lint complains in tests. * Remove simd from openmp instructions in BatchNorm (#24) * Remove warnings. * Fix MKLDNN 1st compile failure issue (#23) * Fix compilation errors. * Remove ARCH_OPT in Jenkins. * Revert "avoid global pooling in mkldnn." This reverts commit f6efd342e64968cb848c9193d80e929968b8052c. * Move to the latest MKLDNN. This fixes the bug in global pooling. * WIP unit tests (#25) * WIP unit tests * some backward items initialized * Make more C++ unit tests work for batch norm (#28) * WIP unit tests * some backward items initialized * some backward items initialized * some backward items initialized * first unit test working * Working on types * backward types working for fp16 on first unit test * backward types working for fp16 on first unit test * backward types working for fp16 on first unit test * . * . * some tests working * fix input data * hangle gpu<->cpu for setting values * gpu working * gpu working * CAccessAsCPU class * Fix varying type in AccessAsCPU * starting to add channel axis tests * TestChannelAxisSimple * TestChannelAxisSimple * run bidirectional * run bidirectional * run bidirectional * CLEANUP * CLEANUP * .. * noaxis * .. * lint * revert * revert * Fix lint complains. * Fix a minor problem in Makefile. * fix GPU pooling. * Disable modelzoo inference tests. * update accuracy checks for MKLDNN. 
* Fix MKLDNN pooling for global pooling. * Fix Jenkins. * Fix a bug in Jenkins. * Fix Jenkins
2018-02-15 14:44:34 -08:00
        // Wait for pending GPU work, copy the (possibly modified) CPU-side blob
        // back to the source array on the GPU, then wait for the copy to finish.
        run_ctx_.get_stream<gpu>()->Wait();
        mxnet::ndarray::Copy<cpu, gpu>(blob_, &src_, gpu_ctx, cpu_ctx, run_ctx_);
        run_ctx_.get_stream<gpu>()->Wait();
      }
    }
#endif
  }
  inline const TBlob& operator()() const {
    return blob_;
  }
 private:
  const RunContext run_ctx_;
  TBlob src_;
  const bool copy_back_result_;
  NDArray on_cpu_;
  TBlob blob_;
};
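The class above follows an RAII staging pattern: the constructor copies device data into a host-side buffer, `operator()` exposes that buffer, and the destructor optionally copies any modifications back to the device. A minimal self-contained sketch of the same pattern, using a hypothetical `ScopedHostView` class with a plain `std::vector<float>` standing in for GPU memory (the real implementation uses `mxnet::ndarray::Copy<gpu, cpu>` / `Copy<cpu, gpu>` bracketed by stream `Wait()` calls):

```cpp
#include <vector>

// Hypothetical stand-in for CAccessAsCPU: the "device" buffer is modeled as a
// plain std::vector<float> so the pattern can be shown without MXNet types.
class ScopedHostView {
 public:
  // Copy-in: stage the device data into a host-side buffer
  // (models Copy<gpu, cpu> plus stream synchronization in the constructor).
  ScopedHostView(std::vector<float>* device_buf, bool copy_back)
      : device_buf_(device_buf),
        copy_back_(copy_back),
        host_copy_(*device_buf) {}

  // Copy-out: write host-side modifications back to the device on scope exit
  // (models Copy<cpu, gpu> in the destructor, guarded by copy_back_result_).
  ~ScopedHostView() {
    if (copy_back_) {
      *device_buf_ = host_copy_;
    }
  }

  // Like CAccessAsCPU::operator(): expose the host-resident blob.
  std::vector<float>& operator()() { return host_copy_; }

 private:
  std::vector<float>* device_buf_;  // non-owning pointer to "device" memory
  const bool copy_back_;
  std::vector<float> host_copy_;    // host staging buffer
};
```

Typical use mirrors `AccessAsCPU`: open a scope, mutate the host view, and let the destructor propagate the change back to the device copy.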
/*!
 * \brief Access an NDArray's data blob as if it resided on the CPU, via a callback
 * \tparam CallbackFunction Type of the callback to invoke with the CPU-resident NDArray
 * \param src Source NDArray (on GPU or CPU)
 * \param run_ctx Run context
 * \param cb Callback function to invoke with the CPU-resident NDArray
 */
template <typename CallbackFunction>
inline void AccessAsCPU(const NDArray& src, const RunContext& run_ctx, CallbackFunction cb) {
#if MXNET_USE_CUDA
* Fix MKLDNN pooling for global pooling. * Fix Jenkins. * Fix a bug in Jenkins. * Fix Jenkins
2018-02-15 14:44:34 -08:00
if (src.ctx().dev_type == Context::kCPU) {
  // Data already lives on the CPU: invoke the callback directly.
  cb(src);
} else {
Refactor operators and add MKLDNN (#9677) * Remove MKL code. * Integrate MKLDNN. Update MXNet for MKLDNN. Enable MKLDNN Relu. Fix a compilation error. Change Makefile for MKLDNN. Remove infer storage in convolution. Update MXNet for MKLDNN. Support MKLDNN storage type in python. Update activation. Add MKLDNN base classes. Implement MKLDNN fully connected. Add MKLDNN convolution. Update MKLDNN interface in NDArray. MKLDNN convolution handle CreateMKLDNNData failure. Add another GetMKLDNNData in NDArray. Have mkldnn to define the data format. Create output MKLDNN memory explicitly for FC. Fix a bug in NDArray. Fix a bug in GetWeightDesc. Convert data layout if necessary in FC. remove unnecessary print in MKLDNN convolution. Add MKLDNN deconvolution. Add MKLDNNStream to manage primitives and memories. Use MKLDNNStream to register memory in NDArray. Use MKLDNNStream to manage resources in operators. Handle kAddTo in MKLDNN operators. Fix a bug in deconvolution. Fix bugs in NDArray. Revert "Fix bugs in NDArray." This reverts commit f5624a4aa9f9b9f9fe31f5e6cfa7a9752838fc4e. Fix a bug in NDArray. Fix a bug in NDArray. Reorder MKLDNN memory to default format in SetTBlob. Disable MKLDNN correctly. Fix a bug in activation. Reshape of NDArray supports MKLDNN. Fix a memory ref bug in NDArray. Reshape NDArray in MKLDNN FullyConnected. Fix data format conversion. Create MKLDNN NDArray in python. Support Slice for MKLDNN NDArray. Reduce the overhead of summing the result to the output array. Avoid unnecessary memory copy in NDArray. Fix a bug in data reordering. Fix a bug in NDArray. Don't hard code MKLDNN type. Support dilation in MKLDNN convolution. Fix a bug in sum results. Rewrite GetMKLDNNData. Add prepare_mkldnn.sh Enable MKLDNN activation. Fix a bug on FullyConnected. Handle 3 dims for MKLDNN NDArray. Fix a bug in MKLDNN FC. Support MKLDNN storage in KV store. Fix a bug in executor for non-default NDArray. Fix a link error in cast_storage.cc. 
Remove unnecessary function def Fall back to def storage if the type isn't supported by MKLDNN. Use NDArray for MKLDNN in python. Reshape output of MKLDNN convolution. Fix a bug in NDArray. Support more operations in MKLDNN NDArray. Fix a bug in deconvolution. Fix bugs in MKLDNN deconvolution. We still need to compute bias correctly. Have elemwise binary ops to fall to default for MKLDNN. Limit the cases that MKLDNN operations are called. Force the layout of mkldnn::memory from NDArray. Add MKLDNN softmax. Fix output storage type of MKLDNN softmax. Add MKLDNN sum. Fix a bug in elemwise sum. Fix a bug in MKLDNN softmax. Fix a bug in imperative. Clean up dispatch modes. Remove redundant code. MKLDNN Pooling Op integration MKLDNN Pooling Op integration add missing file fix mkldnn pooling op workspace issue handle workspace in MKLDNN pooling correctly. Use a non-MKLDNN op for testing. Allow to share arguments and their gradients between executors. Avoid using MKLDNN pooling when it's not supported. Support MKLDNN properly. Choose MKLDNN softmax more carefully. Fix a bug in MKLDNN pooling. Fall back if MKLDNN pooling isn't supported. Fix a bug in Slice of NDArray. Use int32 for workspace memory. Exclude MKLDNN act with tanh. Have two Reshape functions in NDArray. Copy data for NDArray with diff shapes. Add MKLDNN copy. Add MKLDNN version of elemwise_add. Add MKLDNN version of Flatten. add mkldnn surport for concat simplify MKLDNN Flatten. Enalbe MKLDNN deconvolution with bias. Fix a bug in CuDNN deconvolution. avoid using MKLDNNStorage when it's not defined. Remove ./cudnn_lrn-inl.h Fix for make lint. add mkldnn surport for concat fix the coding style for pr of mkldnn concat Only add input data for MKLDNN concat backward Remove unnecessary TODO. remove unnecessary __repr__ in MKLNDArray. better condition check for readability. Use macro when including mkldnn.hpp. Revert "Use CoreOpRunner for refactored Ops." This reverts commit a28586fc25950cc006cb317e26e0d17541ef0586. 
Fix a bug in test core. Limit MKLDNN ops being used. Fix complains from "make pylint" Move ContainStorage to common/utils.h Limit MKLDNN concat being used. Add license. Fix amalgamation Fix compilation error in mkldnn_ops-inl.h Fix a bug in deconvolution. Fix a bug in pooling. MKLDNN ops allocates temp mem. Fix a bug in pooling. Allocate align memory from temp space. Have parameter gradients stored in the default storage. Handle all cases in CopyFrom. Ensure NDArray returns memory with right memory descriptors. use auto to define memory in the operator. Use raw pointer for mkldnn memory. Move more code to mkldnn_base.cc Fix a compilation error. Address review comments. fix a bug in activation backward. Miss a macro in mkldnn_base.cc Fix a bug in data iterator in examples. Avoid memory allocation in ReshapeMKLDNN. Avoid memory allocation in storage cast. Fix a bug in cast storage. Handle sliced MKLDNN NDArray. Use memcpy if NDArray uses default format. Revert "Limit MKLDNN ops being used." This reverts commit 75e2ae570d03483868ec4ed8ed46015c7fa6c6fb. Enable mkldnn act backward has the same input layout. Fix a bug in mkldnn activation. Use MKLDNN sum in more cases. Improve perf of reorder. Avoid memory reorder in conv and deconv. Avoid unnecessary storage cast in fallback path. Revert "Use MKLDNN sum in more cases." This reverts commit 7a21ebca8bbe17fde49c3b1ca3f31b835a33afb8. Handle sliced ndarray in more cases. Fix a complain from make lint. Update Jenkins to test MKLDNN. debug compiling mkldnn. Use MKLDNN sum in more cases. Add mkldnn as a submodule. Compile with mkldnn in 3rdparty. Fix some coding styles. write the path to mkldnn lib in libmxnet.so. use rpath with $ORIGIN. Pack all lib files in Jenkins. pack and unpack mxnet with MKLDNN. Update Jenkinsfile Update Jenkinsfile Add mkldnn batch normalization Fix bugs in BN. Avoid memory allocation in MKLDNNCopy. only use MKLDNN BatchNorm for special cases. MKLDNN BatchNorm doesn't work well on the default layout. 
Add MKL-DNN based LRN Code Style Changes Fix a bug in BN. Fix a bug in LRN. Handle non-default storage in memory plan. Fix coding style. Fix a compilation error without mkldnn. Fix some coding styles for batch norm Improve forward of convolution. Add openmp and simd support to BN operator Retrieve MKLDNN Conv primitive based on signature. Retrieve Act primitive based on its signature. Fix a bug in pooling. Diable some MKLDNN activation and pooling. Cast MKLDNN storage with diff data type. Check if it's a view of NDArray. Reshaped and sliced arrays share the same chunks. Implement caching MKLDNN Act correctly. Fix a bug in check_consistency. Fix a potential bug when destroying NDArray. Fix bugs when allocating mem in NDArray. Fix coding style. Add micro when using mkldnn in ndarray. Fix a compilation error. Fix a bug in concat. Remove MKLDNNStorage. handle diff layouts in CopyFromToDnsImpl. Fallback correctly. Force weight grad to use default layout. Reorder weight arrays in (de)conv for faster inference. Avoid caching TBlob from NDArray. This commit may add some overhead of managing NDArray for each fallback. Fix a bug in Flatten. handle ndarray with def layout in mkldnn BN correctly. Align to page when mkldnn is enabled. Use default mem alloc for mkldnn. Reuse NDArrays. Support WriteInplace for sum. fix complains from "make lint". Avoid reallocation in NDArray. Handle weight arrays with special MKLDNN layouts. Remove unnecessary GetWeights. Fix compilation error without MKLDNN. Fix a bug in (de)conv for weight arrays. Fix a minor bug in MKLDNN conv. Fix a bug in MKLDNNOpSignature. Reimplement fallback for MKLDNN ops. Fix a bug in FallbackExecutor. Add params in hashcode. Invalidate data in outputs to accelerate. Fix a minor bug. Update mkldnn_base-inl.h Add primitive caching for Pooling forward computation Add hashcode in pooling parameters. Support NDArray copy with types unsupported by MKLDNN. Avoid using MKLDNN concat for negative dimension. 
Fix make lint complain. Disable mkldnn avg pooling for now. Fix a compile warning. Fix compile error when MKLDNN is disabled. OP primitive cache: use memory as signature for MKLDNN storage type Remove MKLDNN array in python. Disable Clang tests in Jenkins. Use mklml dockers to test mkldnn. Update MKLDNN repo to zhengda's mkldnn repo. Update MKLDNN repo to ashok's. Fix a bug in fallback. Change avg pooling algorithm to pooling_avg_include_padding Fix a code style in mkldnn pooling. Temp fix a bug in FC. Revert "Disable Clang tests in Jenkins." This reverts commit b4efa8f89592d30a27f9c30e2237e9420ac6749a. Rebase and Refactor deconv (#20) * rebase to Da,Zheng refactor branch Jan.14, add signature for mkldnn Deconv and modify classMKLDNNDeconvForward * fix make lint complains A simple way of caching BN inference. cache BN forward for both training and inference. Fix some minor problems in BN. Fix a bug in caching BN. force to build with avx2 in Jenkins. Remove the remaining MKLDNNStorageType Some minor updates in NDArray. a lot of updates to address comments. minor changes. * Use NNVM interface. Use NNVM interface for upsampling. Use NNVM interface for convolution. Use NNVM interface for deconvolution. Use NNVM interface for FullyConnected. Move NNVM interface to batch norm. Use NNVM interface for depthwise convolution. Use NNVM interface for softmax activation. Use NNVM interface for pooling. use NNVM interface for dropout. Use NNVM interface for activation. Use NNVM interface for CuDNN batch norm. Use NNVM interface for CuDNN pooling. Use NNVM interface for CuDNN softmax activation. Use NNVM interface for CuDNN activation. Use NNVM interface for CuDNN convolution. Use NNVM interface for CuDNN deconvolution. Move concat to nn/ Use NNVM interface for concat. Fix headers in concat. Move lrn to nn/. Use NNVM interface for LRN. Fix a compilation error in convolution. Fix a compilation error in activation. Fix coding style. Fix coding style for make lint. 
use enums in batch norm. Use CoreOpRunner for refactored Ops. Make FullyConnected stateless. Make upsampling stateless. Make pooling stateless. Make batchnorm stateless. Make SoftmaxActivation stateless. Fix a code style problem. pass amalgamation test for batch norm. pass amalgamation test for dropout. Get convolution ops from a function. Fix compilation errors for GPU. Fix thread local in diff platforms. Avoid using thread_local for non-CuDNN conv/deconv. Remove TODO in deconv. Fix a bug in batch norm. Fix a bug in fully connected. Don't set #inputs for backward convolution. Revert "Make pooling stateless." * revert modification in test_executor. * Fix a bug in FlattenStorageType. * Remove BN debug. * Remove remaining MXNET_USE_MKL2017 * Remove unused code in pooling. * Fixing bugs in gtests. * Fix lint errors. * a lot of minor updates to address comments. * Fix coding style in MKLDNN Pooling (#22) * revert the code change in the previous code refactor. * Fix a bug in pooling. * LRN coding style changes (#21) * LRN coding style change * Add const for local variables * Add req for LRN forward * rebase code * align API interface * revert modification in test_executor. * cast storage with MKLDNN properly. * Minor updates to address comments. * some minor updates. * Switch to the master branch of MKLDNN. * Minor updates to address comments. * Update activation.cc * Fix a bug in convert NDArray. * Add gluon model zoo tests. * Update GPU tests on model zoo. * Avoid using mobilenet for GPU tests with gluon models. mobilenet can't pass the test even without MKLDNN. * Update GPU tests on gluon. * change cmake to compile MKLDNN. * update cmake for MKLDNN. * Implement align myself. * Switch to intel/mkl-dnn. * Fix errors in align unittest. * Add unit test for LRN. * fix a compilation error. * use storage_type_assign to determine storage type. * avoid global pooling in mkldnn. There is a bug in global pooling in mkldnn. * compare all MKLDNN ops with native impls. 
add MXNET_MKLDNN_DEBUG to control the test. * Fix a bug in testing correctness. * print the name of buggy operator. * undo some modifications. * Fix a bug on reshaped array. * avoid testing outputs with NullOp. * turn on MKLDNN tests in Jenkins. * print each operator in MKLDNN tests. * rename test_gluon_model_zoo.py * Create hashcode for operator parameters properly. * Add USE_MKL2017 back. * Print warning messages. * move batchnorm tests to nnvm interface. * Delete batchnorm v1 tests. * Get inputs and outputs in batchnorm tests. * disable batchnorm tests for now. * Fix GPU tests on gluon model zoo. * Fix lint complains in tests. * Remove simd from openmp instructions in BatchNorm (#24) * Remove warnings. * Fix MKLDNN 1st compile failure issue (#23) * Fix compilation errors. * Remove ARCH_OPT in Jenkins. * Revert "avoid global pooling in mkldnn." This reverts commit f6efd342e64968cb848c9193d80e929968b8052c. * Move to the latest MKLDNN. This fixes the bug in global pooling. * WIP unit tests (#25) * WIP unit tests * some backward items initialized * Make more C++ unit tests work for batch norm (#28) * WIP unit tests * some backward items initialized * some backward items initialized * some backward items initialized * first unit test working * Working on types * backward types working for fp16 on first unit test * backward types working for fp16 on first unit test * backward types working for fp16 on first unit test * . * . * some tests working * fix input data * hangle gpu<->cpu for setting values * gpu working * gpu working * CAccessAsCPU class * Fix varying type in AccessAsCPU * starting to add channel axis tests * TestChannelAxisSimple * TestChannelAxisSimple * run bidirectional * run bidirectional * run bidirectional * CLEANUP * CLEANUP * .. * noaxis * .. * lint * revert * revert * Fix lint complains. * Fix a minor problem in Makefile. * fix GPU pooling. * Disable modelzoo inference tests. * update accuracy checks for MKLDNN. 
* Fix MKLDNN pooling for global pooling. * Fix Jenkins. * Fix a bug in Jenkins. * Fix Jenkins
2018-02-15 14:44:34 -08:00
  // Build a CPU context on device 0; keep the source's GPU context for the copies.
  Context cpu_ctx, gpu_ctx = src.ctx();
  cpu_ctx.dev_type = Context::kCPU;
  cpu_ctx.dev_id = 0;
  // Allocate a host-side NDArray with the same shape and dtype as the source.
  NDArray on_cpu(src.shape(), cpu_ctx, false, src.dtype());
  on_cpu.CheckAndAlloc();
  TBlob tmp1 = on_cpu.data();
  // Ensure all pending GPU work on src has finished before copying device -> host.
  run_ctx.get_stream<gpu>()->Wait();
  mxnet::ndarray::Copy<gpu, cpu>(src.data(), &tmp1, cpu_ctx, gpu_ctx, run_ctx);
  run_ctx.get_stream<gpu>()->Wait();
  // Run the callback on the host-side copy; it may mutate the data.
  cb(on_cpu);
  // Copy any mutations made by the callback back to the GPU array.
  TBlob tmp2 = src.data();
  mxnet::ndarray::Copy<cpu, gpu>(on_cpu.data(), &tmp2, gpu_ctx, cpu_ctx, run_ctx);
  run_ctx.get_stream<gpu>()->Wait();
}
#else
cb(src);
#endif
}
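The copy-in, call-back, copy-out idiom above can be illustrated with a minimal self-contained sketch. Everything here is hypothetical (`FakeBlob`, `AccessAsHost`, a plain `std::vector` standing in for device memory) — it only mirrors the control flow of `AccessAsCPU`, not the real MXNet API:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a blob whose data may live off-host ("GPU").
struct FakeBlob {
  std::vector<float> device_data;  // pretend this buffer is device memory
  bool on_device = true;
};

// Mirrors the AccessAsCPU idiom: copy device -> host, run the callback on
// the host copy, then write any mutations back to the device buffer.
// When the data is already host-visible, call the callback directly.
template <typename CallbackFunction>
void AccessAsHost(FakeBlob* src, CallbackFunction cb) {
  if (!src->on_device) {
    cb(src->device_data);  // no copy needed on the CPU path
    return;
  }
  std::vector<float> host_copy = src->device_data;  // device -> host
  cb(host_copy);                                    // mutate on the host
  src->device_data = host_copy;                     // host -> device
}
```

A caller passes a lambda that sees only host data, e.g. `AccessAsHost(&blob, [](std::vector<float>& v) { for (auto& x : v) x *= 2.f; });` — in the real helper the two copies are bracketed by stream `Wait()` calls so the callback never races with in-flight GPU work.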
/*!
 * \brief Access a data blob as if on the CPU via a callback
 * \tparam CallbackFunction Type of the callback to call with the CPU-accessible data
 * \param src Source TBlob (on GPU or CPU)
 * \param run_ctx Run context
 * \param cb Callback function to call with the CPU-accessible TBlob
 */
template <typename CallbackFunction>
inline void AccessAsCPU(const TBlob& src, const RunContext& run_ctx, CallbackFunction cb) {
Fix make lint complain. Disable mkldnn avg pooling for now. Fix a compile warning. Fix compile error when MKLDNN is disabled. OP primitive cache: use memory as signature for MKLDNN storage type Remove MKLDNN array in python. Disable Clang tests in Jenkins. Use mklml dockers to test mkldnn. Update MKLDNN repo to zhengda's mkldnn repo. Update MKLDNN repo to ashok's. Fix a bug in fallback. Change avg pooling algorithm to pooling_avg_include_padding Fix a code style in mkldnn pooling. Temp fix a bug in FC. Revert "Disable Clang tests in Jenkins." This reverts commit b4efa8f89592d30a27f9c30e2237e9420ac6749a. Rebase and Refactor deconv (#20) * rebase to Da,Zheng refactor branch Jan.14, add signature for mkldnn Deconv and modify classMKLDNNDeconvForward * fix make lint complains A simple way of caching BN inference. cache BN forward for both training and inference. Fix some minor problems in BN. Fix a bug in caching BN. force to build with avx2 in Jenkins. Remove the remaining MKLDNNStorageType Some minor updates in NDArray. a lot of updates to address comments. minor changes. * Use NNVM interface. Use NNVM interface for upsampling. Use NNVM interface for convolution. Use NNVM interface for deconvolution. Use NNVM interface for FullyConnected. Move NNVM interface to batch norm. Use NNVM interface for depthwise convolution. Use NNVM interface for softmax activation. Use NNVM interface for pooling. use NNVM interface for dropout. Use NNVM interface for activation. Use NNVM interface for CuDNN batch norm. Use NNVM interface for CuDNN pooling. Use NNVM interface for CuDNN softmax activation. Use NNVM interface for CuDNN activation. Use NNVM interface for CuDNN convolution. Use NNVM interface for CuDNN deconvolution. Move concat to nn/ Use NNVM interface for concat. Fix headers in concat. Move lrn to nn/. Use NNVM interface for LRN. Fix a compilation error in convolution. Fix a compilation error in activation. Fix coding style. Fix coding style for make lint. 
use enums in batch norm. Use CoreOpRunner for refactored Ops. Make FullyConnected stateless. Make upsampling stateless. Make pooling stateless. Make batchnorm stateless. Make SoftmaxActivation stateless. Fix a code style problem. pass amalgamation test for batch norm. pass amalgamation test for dropout. Get convolution ops from a function. Fix compilation errors for GPU. Fix thread local in diff platforms. Avoid using thread_local for non-CuDNN conv/deconv. Remove TODO in deconv. Fix a bug in batch norm. Fix a bug in fully connected. Don't set #inputs for backward convolution. Revert "Make pooling stateless." * revert modification in test_executor. * Fix a bug in FlattenStorageType. * Remove BN debug. * Remove remaining MXNET_USE_MKL2017 * Remove unused code in pooling. * Fixing bugs in gtests. * Fix lint errors. * a lot of minor updates to address comments. * Fix coding style in MKLDNN Pooling (#22) * revert the code change in the previous code refactor. * Fix a bug in pooling. * LRN coding style changes (#21) * LRN coding style change * Add const for local variables * Add req for LRN forward * rebase code * align API interface * revert modification in test_executor. * cast storage with MKLDNN properly. * Minor updates to address comments. * some minor updates. * Switch to the master branch of MKLDNN. * Minor updates to address comments. * Update activation.cc * Fix a bug in convert NDArray. * Add gluon model zoo tests. * Update GPU tests on model zoo. * Avoid using mobilenet for GPU tests with gluon models. mobilenet can't pass the test even without MKLDNN. * Update GPU tests on gluon. * change cmake to compile MKLDNN. * update cmake for MKLDNN. * Implement align myself. * Switch to intel/mkl-dnn. * Fix errors in align unittest. * Add unit test for LRN. * fix a compilation error. * use storage_type_assign to determine storage type. * avoid global pooling in mkldnn. There is a bug in global pooling in mkldnn. * compare all MKLDNN ops with native impls. 
add MXNET_MKLDNN_DEBUG to control the test. * Fix a bug in testing correctness. * print the name of buggy operator. * undo some modifications. * Fix a bug on reshaped array. * avoid testing outputs with NullOp. * turn on MKLDNN tests in Jenkins. * print each operator in MKLDNN tests. * rename test_gluon_model_zoo.py * Create hashcode for operator parameters properly. * Add USE_MKL2017 back. * Print warning messages. * move batchnorm tests to nnvm interface. * Delete batchnorm v1 tests. * Get inputs and outputs in batchnorm tests. * disable batchnorm tests for now. * Fix GPU tests on gluon model zoo. * Fix lint complains in tests. * Remove simd from openmp instructions in BatchNorm (#24) * Remove warnings. * Fix MKLDNN 1st compile failure issue (#23) * Fix compilation errors. * Remove ARCH_OPT in Jenkins. * Revert "avoid global pooling in mkldnn." This reverts commit f6efd342e64968cb848c9193d80e929968b8052c. * Move to the latest MKLDNN. This fixes the bug in global pooling. * WIP unit tests (#25) * WIP unit tests * some backward items initialized * Make more C++ unit tests work for batch norm (#28) * WIP unit tests * some backward items initialized * some backward items initialized * some backward items initialized * first unit test working * Working on types * backward types working for fp16 on first unit test * backward types working for fp16 on first unit test * backward types working for fp16 on first unit test * . * . * some tests working * fix input data * hangle gpu<->cpu for setting values * gpu working * gpu working * CAccessAsCPU class * Fix varying type in AccessAsCPU * starting to add channel axis tests * TestChannelAxisSimple * TestChannelAxisSimple * run bidirectional * run bidirectional * run bidirectional * CLEANUP * CLEANUP * .. * noaxis * .. * lint * revert * revert * Fix lint complains. * Fix a minor problem in Makefile. * fix GPU pooling. * Disable modelzoo inference tests. * update accuracy checks for MKLDNN. 
* Fix MKLDNN pooling for global pooling. * Fix Jenkins. * Fix a bug in Jenkins. * Fix Jenkins
2018-02-15 14:44:34 -08:00
#if MXNET_USE_CUDA
  if (run_ctx.ctx.dev_type == Context::kCPU) {
    cb(src);
  } else {
    cb(CAccessAsCPU(run_ctx, src, true)());
Batch Norm rewrite without mshadow, 1D, 2D, 3D, float16, float32, float64 as well as operator gtest framework (#5936) * Batch Norm rewrite without mshadow as well as operator gtest framework * performance testing * lint fixes * use CUDNN for this test * remove superfluous omp define * Fix file names in comments * build, run, clean gtest works (although a test is failing) * CR comments * Adjust timing tests for more strenuous sample * Remove temp resource allocation * DeviceTensor3 added, forEachFast not yet converted * DeviceTensor3 version working * DeviceTensor3 working * . * Fix for use_global_stats * fixed bug with testing suite for double (Float64) * python unit tests working for batchnorm * python unit tests * Update documentation for mxnet.initializer.Mixed (#5937) * Update documentation for SVMOutput. (#5931) * Update documentation for SVMOutput. * Update doc for SVMOutput - fix formatting. * Adding install instruction for Ubuntu-CPU-Python (#5885) * edit ndarray API docs (#5806) * edit docs in broadcast_reduce_op * edit docs in broadcast_reduce_op * minor change * lint fix * fix * mx.nd.ones * mx.nd.repeat * mx.nd.reverse * add example in repeat * optimizer update * fix nanprod * fix optimizer_op api doc * fix reduce_op api doc * fix nd.ones api doc * mx.nd.repeat doc change * Update broadcast_reduce_op.h * Symbol docs fixes (#5930) * symbol docs minor formatting changes * deepcopy, infer_shape, infer_shape_partial docs modified * Few more small fixes * arithmetic functions fixes * some more modifications * changes after review * small change * grad function note added * More API Doc Edits (#5886) * edit activation doc * doc l2_normalization * edit MakeLoss doc * edit blockgrad doc * blockgrad fileline fix * edit MakeLoss doc cont. 
* doc change 'tensor' to 'multidimensional array' * l2normalization doc improve * makeloss doc improve, blockgrad doc improve * fix doc in activation, l2_normalization, make_loss * fix minor grammar * use .describe to avoid build failure. * Update documentation for mxnet.image.imdecode (#5957) * Update documentation for mxnet.image.imdecode * Update documentation for mxnet.image.imdecode (clarify that we need OpenCV and not the CV2 Python library) * Fix script by adding path to Dockerfile (#5958) * Clean install script * Add test for pip installations * Remove debug statements & comments * Make test runnable as script and from framework * Fix path to Dockerfiles * Putting failing cases at the end * Update doc for Custom operator. (#5875) * Update doc for Custom operator. * Update doc for Custom operator. * Fix formating in doc for Custom operator. * Fix formating in doc for Custom operator. * Minor change to ndarray.Custom documentation. * Minor edit in doc for Custom operator. * Minor change to doc for Custom operator. Data is 'NDArray-or-Symbol'. * Minor formatting change for Custom operator documentation. * For Custom operator doc, move example into ndarray_doc.py. 
* Minor change in Custom operator documentation * Improve the doc of pick + Update dmlc-core (#5946) * Add PickParam to fix the docstring and the initial value for axis * Update dmlc-core * Update dmlc-core * Image docs modified (#5973) * imageIter doc modified * edited imageiter * ADD missing Libri_sample.json, FIX minor bugs in speech_recognition example (#5962) * [KVStore] Add support for other data types (#5818) * Fix kvstore type * Fix lint * Parse inputs to DataDesc * Make module support dtype * Fix lint * Add default dtype in Comm * Fix lint * Revert rename * [cpp-package] Add C++ basic tutorial and build instruction (#5971) * Add C++ basic tutorial and build instruction * Remove binaries * Fix lint * Avoid sign-compare * Update documentation for mxnet.metric.np (#5977) * Getting rid of identity (#5935) * Activation ops (#5938) * [Ops] Add op: 'relu' * Add op: 'sigmoid' * Introduce 'kernel_launch_op' * Add tests and describe; move it to elemwise_unary_op * Fix GPU version * Convert caffe AbsVal to mx.symbol.abs in caffe converter (#5984) * Correction to LSTMCell docstring (#5986) * [Module] fix input_grads order (#5980) * fix input_grads order + update dmlc-core * set label to be optional * update env_var doc (#5964) * Adjusting make, Callback removed * batch norm gpu testing * Batch Norm rewrite without mshadow as well as operator gtest framework * performance testing * lint fixes * use CUDNN for this test * remove superfluous omp define * Fix file names in comments * build, run, clean gtest works (although a test is failing) * CR comments * Adjust timing tests for more strenuous sample * Remove temp resource allocation * rearrange source into cc and cu files * lint fixes * Trigger build * Use latest mshadow * temporarily revert channel position parameter field * Add more tests for batchnorm * Add more tests for batchnorm * test_operator_gpu working for all types * Compiles after AccReal * Compiles after AccReal * All tests working * All tests working * 
build, run, clean gtest works (although a test is failing) * vc++ requires explicit int type for omp for loop * Repair cpp-package * signed/unsigned fixed in cuda file * lint fixes in tests and cpp-package directories * more lint * use IsWriting() helper * Fall-through for unsupported MKL shapes/types * Fall-through for unsupported MKL shapes/types * cleaner mkl_off approach * Warning only whem MKL is requested * Warning only whem MKL is requested * lint * .. * python problem fixed * python problem fixed * Merge branch 'batchnorm' into batchnorm_pr # Conflicts: # src/operator/batch_norm.cc # src/operator/batch_norm.cu # tests/cpp/operator/batchnorm_test.cc * lint fix * lint fix * lint fix * lint fix * lint fix * Fix visual c++ compile problem * . * . * All unit tests pass again * lint fix * fix strange compile errors in CUDNN batchnorm header * FInish using flags instead of bools * lint * Fix timing pass count for forward pass * Fix R script install roxygen problem * code formatting, addition of doc strings is causing IDE to add spaces before the calls * removed commented * cr comments * Change back to compilable code * For CPU mode, store as invstd * move testing code around a little * lint fix * Use AccReal in some places to avoid fp16 problems * Fix minor invstd problem in cuda version * remove unused scale param * add permutation unit test, handle cudnn doesn't like 3D * . * lint * . * Remove mkl_off * lint fix and time cudnn when enabled
2017-05-15 20:27:28 -07:00
  }
Refactor operators and add MKLDNN (#9677) * Remove MKL code. * Integrate MKLDNN. Update MXNet for MKLDNN. Enable MKLDNN Relu. Fix a compilation error. Change Makefile for MKLDNN. Remove infer storage in convolution. Update MXNet for MKLDNN. Support MKLDNN storage type in python. Update activation. Add MKLDNN base classes. Implement MKLDNN fully connected. Add MKLDNN convolution. Update MKLDNN interface in NDArray. MKLDNN convolution handle CreateMKLDNNData failure. Add another GetMKLDNNData in NDArray. Have mkldnn to define the data format. Create output MKLDNN memory explicitly for FC. Fix a bug in NDArray. Fix a bug in GetWeightDesc. Convert data layout if necessary in FC. remove unnecessary print in MKLDNN convolution. Add MKLDNN deconvolution. Add MKLDNNStream to manage primitives and memories. Use MKLDNNStream to register memory in NDArray. Use MKLDNNStream to manage resources in operators. Handle kAddTo in MKLDNN operators. Fix a bug in deconvolution. Fix bugs in NDArray. Revert "Fix bugs in NDArray." This reverts commit f5624a4aa9f9b9f9fe31f5e6cfa7a9752838fc4e. Fix a bug in NDArray. Fix a bug in NDArray. Reorder MKLDNN memory to default format in SetTBlob. Disable MKLDNN correctly. Fix a bug in activation. Reshape of NDArray supports MKLDNN. Fix a memory ref bug in NDArray. Reshape NDArray in MKLDNN FullyConnected. Fix data format conversion. Create MKLDNN NDArray in python. Support Slice for MKLDNN NDArray. Reduce the overhead of summing the result to the output array. Avoid unnecessary memory copy in NDArray. Fix a bug in data reordering. Fix a bug in NDArray. Don't hard code MKLDNN type. Support dilation in MKLDNN convolution. Fix a bug in sum results. Rewrite GetMKLDNNData. Add prepare_mkldnn.sh Enable MKLDNN activation. Fix a bug on FullyConnected. Handle 3 dims for MKLDNN NDArray. Fix a bug in MKLDNN FC. Support MKLDNN storage in KV store. Fix a bug in executor for non-default NDArray. Fix a link error in cast_storage.cc. 
Remove unnecessary function def Fall back to def storage if the type isn't supported by MKLDNN. Use NDArray for MKLDNN in python. Reshape output of MKLDNN convolution. Fix a bug in NDArray. Support more operations in MKLDNN NDArray. Fix a bug in deconvolution. Fix bugs in MKLDNN deconvolution. We still need to compute bias correctly. Have elemwise binary ops to fall to default for MKLDNN. Limit the cases that MKLDNN operations are called. Force the layout of mkldnn::memory from NDArray. Add MKLDNN softmax. Fix output storage type of MKLDNN softmax. Add MKLDNN sum. Fix a bug in elemwise sum. Fix a bug in MKLDNN softmax. Fix a bug in imperative. Clean up dispatch modes. Remove redundant code. MKLDNN Pooling Op integration MKLDNN Pooling Op integration add missing file fix mkldnn pooling op workspace issue handle workspace in MKLDNN pooling correctly. Use a non-MKLDNN op for testing. Allow to share arguments and their gradients between executors. Avoid using MKLDNN pooling when it's not supported. Support MKLDNN properly. Choose MKLDNN softmax more carefully. Fix a bug in MKLDNN pooling. Fall back if MKLDNN pooling isn't supported. Fix a bug in Slice of NDArray. Use int32 for workspace memory. Exclude MKLDNN act with tanh. Have two Reshape functions in NDArray. Copy data for NDArray with diff shapes. Add MKLDNN copy. Add MKLDNN version of elemwise_add. Add MKLDNN version of Flatten. add mkldnn surport for concat simplify MKLDNN Flatten. Enalbe MKLDNN deconvolution with bias. Fix a bug in CuDNN deconvolution. avoid using MKLDNNStorage when it's not defined. Remove ./cudnn_lrn-inl.h Fix for make lint. add mkldnn surport for concat fix the coding style for pr of mkldnn concat Only add input data for MKLDNN concat backward Remove unnecessary TODO. remove unnecessary __repr__ in MKLNDArray. better condition check for readability. Use macro when including mkldnn.hpp. Revert "Use CoreOpRunner for refactored Ops." This reverts commit a28586fc25950cc006cb317e26e0d17541ef0586. 
Fix a bug in test core. Limit MKLDNN ops being used. Fix complains from "make pylint" Move ContainStorage to common/utils.h Limit MKLDNN concat being used. Add license. Fix amalgamation Fix compilation error in mkldnn_ops-inl.h Fix a bug in deconvolution. Fix a bug in pooling. MKLDNN ops allocates temp mem. Fix a bug in pooling. Allocate align memory from temp space. Have parameter gradients stored in the default storage. Handle all cases in CopyFrom. Ensure NDArray returns memory with right memory descriptors. use auto to define memory in the operator. Use raw pointer for mkldnn memory. Move more code to mkldnn_base.cc Fix a compilation error. Address review comments. fix a bug in activation backward. Miss a macro in mkldnn_base.cc Fix a bug in data iterator in examples. Avoid memory allocation in ReshapeMKLDNN. Avoid memory allocation in storage cast. Fix a bug in cast storage. Handle sliced MKLDNN NDArray. Use memcpy if NDArray uses default format. Revert "Limit MKLDNN ops being used." This reverts commit 75e2ae570d03483868ec4ed8ed46015c7fa6c6fb. Enable mkldnn act backward has the same input layout. Fix a bug in mkldnn activation. Use MKLDNN sum in more cases. Improve perf of reorder. Avoid memory reorder in conv and deconv. Avoid unnecessary storage cast in fallback path. Revert "Use MKLDNN sum in more cases." This reverts commit 7a21ebca8bbe17fde49c3b1ca3f31b835a33afb8. Handle sliced ndarray in more cases. Fix a complain from make lint. Update Jenkins to test MKLDNN. debug compiling mkldnn. Use MKLDNN sum in more cases. Add mkldnn as a submodule. Compile with mkldnn in 3rdparty. Fix some coding styles. write the path to mkldnn lib in libmxnet.so. use rpath with $ORIGIN. Pack all lib files in Jenkins. pack and unpack mxnet with MKLDNN. Update Jenkinsfile Update Jenkinsfile Add mkldnn batch normalization Fix bugs in BN. Avoid memory allocation in MKLDNNCopy. only use MKLDNN BatchNorm for special cases. MKLDNN BatchNorm doesn't work well on the default layout. 
Add MKL-DNN based LRN Code Style Changes Fix a bug in BN. Fix a bug in LRN. Handle non-default storage in memory plan. Fix coding style. Fix a compilation error without mkldnn. Fix some coding styles for batch norm Improve forward of convolution. Add openmp and simd support to BN operator Retrieve MKLDNN Conv primitive based on signature. Retrieve Act primitive based on its signature. Fix a bug in pooling. Diable some MKLDNN activation and pooling. Cast MKLDNN storage with diff data type. Check if it's a view of NDArray. Reshaped and sliced arrays share the same chunks. Implement caching MKLDNN Act correctly. Fix a bug in check_consistency. Fix a potential bug when destroying NDArray. Fix bugs when allocating mem in NDArray. Fix coding style. Add micro when using mkldnn in ndarray. Fix a compilation error. Fix a bug in concat. Remove MKLDNNStorage. handle diff layouts in CopyFromToDnsImpl. Fallback correctly. Force weight grad to use default layout. Reorder weight arrays in (de)conv for faster inference. Avoid caching TBlob from NDArray. This commit may add some overhead of managing NDArray for each fallback. Fix a bug in Flatten. handle ndarray with def layout in mkldnn BN correctly. Align to page when mkldnn is enabled. Use default mem alloc for mkldnn. Reuse NDArrays. Support WriteInplace for sum. fix complains from "make lint". Avoid reallocation in NDArray. Handle weight arrays with special MKLDNN layouts. Remove unnecessary GetWeights. Fix compilation error without MKLDNN. Fix a bug in (de)conv for weight arrays. Fix a minor bug in MKLDNN conv. Fix a bug in MKLDNNOpSignature. Reimplement fallback for MKLDNN ops. Fix a bug in FallbackExecutor. Add params in hashcode. Invalidate data in outputs to accelerate. Fix a minor bug. Update mkldnn_base-inl.h Add primitive caching for Pooling forward computation Add hashcode in pooling parameters. Support NDArray copy with types unsupported by MKLDNN. Avoid using MKLDNN concat for negative dimension. 
Fix make lint complain. Disable mkldnn avg pooling for now. Fix a compile warning. Fix compile error when MKLDNN is disabled. OP primitive cache: use memory as signature for MKLDNN storage type Remove MKLDNN array in python. Disable Clang tests in Jenkins. Use mklml dockers to test mkldnn. Update MKLDNN repo to zhengda's mkldnn repo. Update MKLDNN repo to ashok's. Fix a bug in fallback. Change avg pooling algorithm to pooling_avg_include_padding Fix a code style in mkldnn pooling. Temp fix a bug in FC. Revert "Disable Clang tests in Jenkins." This reverts commit b4efa8f89592d30a27f9c30e2237e9420ac6749a. Rebase and Refactor deconv (#20) * rebase to Da,Zheng refactor branch Jan.14, add signature for mkldnn Deconv and modify classMKLDNNDeconvForward * fix make lint complains A simple way of caching BN inference. cache BN forward for both training and inference. Fix some minor problems in BN. Fix a bug in caching BN. force to build with avx2 in Jenkins. Remove the remaining MKLDNNStorageType Some minor updates in NDArray. a lot of updates to address comments. minor changes. * Use NNVM interface. Use NNVM interface for upsampling. Use NNVM interface for convolution. Use NNVM interface for deconvolution. Use NNVM interface for FullyConnected. Move NNVM interface to batch norm. Use NNVM interface for depthwise convolution. Use NNVM interface for softmax activation. Use NNVM interface for pooling. use NNVM interface for dropout. Use NNVM interface for activation. Use NNVM interface for CuDNN batch norm. Use NNVM interface for CuDNN pooling. Use NNVM interface for CuDNN softmax activation. Use NNVM interface for CuDNN activation. Use NNVM interface for CuDNN convolution. Use NNVM interface for CuDNN deconvolution. Move concat to nn/ Use NNVM interface for concat. Fix headers in concat. Move lrn to nn/. Use NNVM interface for LRN. Fix a compilation error in convolution. Fix a compilation error in activation. Fix coding style. Fix coding style for make lint. 
use enums in batch norm. Use CoreOpRunner for refactored Ops. Make FullyConnected stateless. Make upsampling stateless. Make pooling stateless. Make batchnorm stateless. Make SoftmaxActivation stateless. Fix a code style problem. pass amalgamation test for batch norm. pass amalgamation test for dropout. Get convolution ops from a function. Fix compilation errors for GPU. Fix thread local in diff platforms. Avoid using thread_local for non-CuDNN conv/deconv. Remove TODO in deconv. Fix a bug in batch norm. Fix a bug in fully connected. Don't set #inputs for backward convolution. Revert "Make pooling stateless." * revert modification in test_executor. * Fix a bug in FlattenStorageType. * Remove BN debug. * Remove remaining MXNET_USE_MKL2017 * Remove unused code in pooling. * Fixing bugs in gtests. * Fix lint errors. * a lot of minor updates to address comments. * Fix coding style in MKLDNN Pooling (#22) * revert the code change in the previous code refactor. * Fix a bug in pooling. * LRN coding style changes (#21) * LRN coding style change * Add const for local variables * Add req for LRN forward * rebase code * align API interface * revert modification in test_executor. * cast storage with MKLDNN properly. * Minor updates to address comments. * some minor updates. * Switch to the master branch of MKLDNN. * Minor updates to address comments. * Update activation.cc * Fix a bug in convert NDArray. * Add gluon model zoo tests. * Update GPU tests on model zoo. * Avoid using mobilenet for GPU tests with gluon models. mobilenet can't pass the test even without MKLDNN. * Update GPU tests on gluon. * change cmake to compile MKLDNN. * update cmake for MKLDNN. * Implement align myself. * Switch to intel/mkl-dnn. * Fix errors in align unittest. * Add unit test for LRN. * fix a compilation error. * use storage_type_assign to determine storage type. * avoid global pooling in mkldnn. There is a bug in global pooling in mkldnn. * compare all MKLDNN ops with native impls. 
add MXNET_MKLDNN_DEBUG to control the test. * Fix a bug in testing correctness. * print the name of buggy operator. * undo some modifications. * Fix a bug on reshaped array. * avoid testing outputs with NullOp. * turn on MKLDNN tests in Jenkins. * print each operator in MKLDNN tests. * rename test_gluon_model_zoo.py * Create hashcode for operator parameters properly. * Add USE_MKL2017 back. * Print warning messages. * move batchnorm tests to nnvm interface. * Delete batchnorm v1 tests. * Get inputs and outputs in batchnorm tests. * disable batchnorm tests for now. * Fix GPU tests on gluon model zoo. * Fix lint complains in tests. * Remove simd from openmp instructions in BatchNorm (#24) * Remove warnings. * Fix MKLDNN 1st compile failure issue (#23) * Fix compilation errors. * Remove ARCH_OPT in Jenkins. * Revert "avoid global pooling in mkldnn." This reverts commit f6efd342e64968cb848c9193d80e929968b8052c. * Move to the latest MKLDNN. This fixes the bug in global pooling. * WIP unit tests (#25) * WIP unit tests * some backward items initialized * Make more C++ unit tests work for batch norm (#28) * WIP unit tests * some backward items initialized * some backward items initialized * some backward items initialized * first unit test working * Working on types * backward types working for fp16 on first unit test * backward types working for fp16 on first unit test * backward types working for fp16 on first unit test * . * . * some tests working * fix input data * hangle gpu<->cpu for setting values * gpu working * gpu working * CAccessAsCPU class * Fix varying type in AccessAsCPU * starting to add channel axis tests * TestChannelAxisSimple * TestChannelAxisSimple * run bidirectional * run bidirectional * run bidirectional * CLEANUP * CLEANUP * .. * noaxis * .. * lint * revert * revert * Fix lint complains. * Fix a minor problem in Makefile. * fix GPU pooling. * Disable modelzoo inference tests. * update accuracy checks for MKLDNN. 
* Fix MKLDNN pooling for global pooling. * Fix Jenkins. * Fix a bug in Jenkins. * Fix Jenkins
2018-02-15 14:44:34 -08:00
#else
  cb(src);
#endif
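The guarded snippet above always hands the callback CPU-readable memory: on a CPU context the source blob is passed through directly, while on a GPU context it is first staged into a host-side copy via `CAccessAsCPU`. A minimal self-contained sketch of that pattern follows; `Blob`, `DevType`, `AccessAsCPU`, and `ForEachAsCPU` are illustrative stand-ins, not the real MXNet types.

```cpp
// Sketch of the "run callback on CPU-visible data" dispatch pattern.
// All names here are hypothetical stand-ins for the MXNet internals.
#include <cassert>
#include <functional>
#include <vector>

enum class DevType { kCPU, kGPU };

struct Blob {
  DevType dev;
  std::vector<float> data;  // host-visible storage for this sketch
};

// Stand-in for CAccessAsCPU: produce a host-resident copy of a device blob.
Blob AccessAsCPU(const Blob& src) {
  Blob host = src;          // in real code this would be a device->host copy
  host.dev = DevType::kCPU;
  return host;
}

// Invoke `cb` with a blob that is guaranteed to be CPU-resident.
void ForEachAsCPU(const Blob& src, const std::function<void(const Blob&)>& cb) {
  if (src.dev == DevType::kCPU) {
    cb(src);                // already on the host: no copy needed
  } else {
    cb(AccessAsCPU(src));   // device blob: stage through a host copy first
  }
}
```

In the real file the GPU branch is additionally compiled out entirely when `MXNET_USE_CUDA` is not defined, so CPU-only builds pay no dispatch cost at all.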
}
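The `#else cb(src); #endif` fragment above is the CPU branch of a helper that hands a blob to a callback, invoking it directly when no device copy is needed. A minimal self-contained sketch of that conditional-compilation pattern follows; `USE_GPU_PATH`, `access_as_cpu`, and `copy_to_host` are hypothetical names for illustration, not MXNet API.

```cpp
#include <cassert>

// Hypothetical build flag standing in for a GPU-enabled configuration.
#define USE_GPU_PATH 0

// In a CPU-only build the data is already host-visible, so the helper can
// invoke the callback on the source directly; a GPU build would first stage
// the data on the host.
template <typename Blob, typename Callback>
void access_as_cpu(const Blob& src, Callback cb) {
#if USE_GPU_PATH
  Blob host_copy = copy_to_host(src);  // hypothetical device-to-host copy
  cb(host_copy);
#else
  cb(src);  // CPU build: call back on the original data
#endif
}
```

The real helper also has to copy any mutations back to the device after the callback returns, which is why it is a class (`AccessAsCPU`) rather than a free function in the source above.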
Refactor operators and add MKLDNN (#9677) * Remove MKL code. * Integrate MKLDNN. Update MXNet for MKLDNN. Enable MKLDNN Relu. Fix a compilation error. Change Makefile for MKLDNN. Remove infer storage in convolution. Update MXNet for MKLDNN. Support MKLDNN storage type in python. Update activation. Add MKLDNN base classes. Implement MKLDNN fully connected. Add MKLDNN convolution. Update MKLDNN interface in NDArray. MKLDNN convolution handle CreateMKLDNNData failure. Add another GetMKLDNNData in NDArray. Have mkldnn to define the data format. Create output MKLDNN memory explicitly for FC. Fix a bug in NDArray. Fix a bug in GetWeightDesc. Convert data layout if necessary in FC. remove unnecessary print in MKLDNN convolution. Add MKLDNN deconvolution. Add MKLDNNStream to manage primitives and memories. Use MKLDNNStream to register memory in NDArray. Use MKLDNNStream to manage resources in operators. Handle kAddTo in MKLDNN operators. Fix a bug in deconvolution. Fix bugs in NDArray. Revert "Fix bugs in NDArray." This reverts commit f5624a4aa9f9b9f9fe31f5e6cfa7a9752838fc4e. Fix a bug in NDArray. Fix a bug in NDArray. Reorder MKLDNN memory to default format in SetTBlob. Disable MKLDNN correctly. Fix a bug in activation. Reshape of NDArray supports MKLDNN. Fix a memory ref bug in NDArray. Reshape NDArray in MKLDNN FullyConnected. Fix data format conversion. Create MKLDNN NDArray in python. Support Slice for MKLDNN NDArray. Reduce the overhead of summing the result to the output array. Avoid unnecessary memory copy in NDArray. Fix a bug in data reordering. Fix a bug in NDArray. Don't hard code MKLDNN type. Support dilation in MKLDNN convolution. Fix a bug in sum results. Rewrite GetMKLDNNData. Add prepare_mkldnn.sh Enable MKLDNN activation. Fix a bug on FullyConnected. Handle 3 dims for MKLDNN NDArray. Fix a bug in MKLDNN FC. Support MKLDNN storage in KV store. Fix a bug in executor for non-default NDArray. Fix a link error in cast_storage.cc. 
Remove unnecessary function def Fall back to def storage if the type isn't supported by MKLDNN. Use NDArray for MKLDNN in python. Reshape output of MKLDNN convolution. Fix a bug in NDArray. Support more operations in MKLDNN NDArray. Fix a bug in deconvolution. Fix bugs in MKLDNN deconvolution. We still need to compute bias correctly. Have elemwise binary ops to fall to default for MKLDNN. Limit the cases that MKLDNN operations are called. Force the layout of mkldnn::memory from NDArray. Add MKLDNN softmax. Fix output storage type of MKLDNN softmax. Add MKLDNN sum. Fix a bug in elemwise sum. Fix a bug in MKLDNN softmax. Fix a bug in imperative. Clean up dispatch modes. Remove redundant code. MKLDNN Pooling Op integration MKLDNN Pooling Op integration add missing file fix mkldnn pooling op workspace issue handle workspace in MKLDNN pooling correctly. Use a non-MKLDNN op for testing. Allow to share arguments and their gradients between executors. Avoid using MKLDNN pooling when it's not supported. Support MKLDNN properly. Choose MKLDNN softmax more carefully. Fix a bug in MKLDNN pooling. Fall back if MKLDNN pooling isn't supported. Fix a bug in Slice of NDArray. Use int32 for workspace memory. Exclude MKLDNN act with tanh. Have two Reshape functions in NDArray. Copy data for NDArray with diff shapes. Add MKLDNN copy. Add MKLDNN version of elemwise_add. Add MKLDNN version of Flatten. add mkldnn surport for concat simplify MKLDNN Flatten. Enalbe MKLDNN deconvolution with bias. Fix a bug in CuDNN deconvolution. avoid using MKLDNNStorage when it's not defined. Remove ./cudnn_lrn-inl.h Fix for make lint. add mkldnn surport for concat fix the coding style for pr of mkldnn concat Only add input data for MKLDNN concat backward Remove unnecessary TODO. remove unnecessary __repr__ in MKLNDArray. better condition check for readability. Use macro when including mkldnn.hpp. Revert "Use CoreOpRunner for refactored Ops." This reverts commit a28586fc25950cc006cb317e26e0d17541ef0586. 
Fix a bug in test core. Limit MKLDNN ops being used. Fix complains from "make pylint" Move ContainStorage to common/utils.h Limit MKLDNN concat being used. Add license. Fix amalgamation Fix compilation error in mkldnn_ops-inl.h Fix a bug in deconvolution. Fix a bug in pooling. MKLDNN ops allocates temp mem. Fix a bug in pooling. Allocate align memory from temp space. Have parameter gradients stored in the default storage. Handle all cases in CopyFrom. Ensure NDArray returns memory with right memory descriptors. use auto to define memory in the operator. Use raw pointer for mkldnn memory. Move more code to mkldnn_base.cc Fix a compilation error. Address review comments. fix a bug in activation backward. Miss a macro in mkldnn_base.cc Fix a bug in data iterator in examples. Avoid memory allocation in ReshapeMKLDNN. Avoid memory allocation in storage cast. Fix a bug in cast storage. Handle sliced MKLDNN NDArray. Use memcpy if NDArray uses default format. Revert "Limit MKLDNN ops being used." This reverts commit 75e2ae570d03483868ec4ed8ed46015c7fa6c6fb. Enable mkldnn act backward has the same input layout. Fix a bug in mkldnn activation. Use MKLDNN sum in more cases. Improve perf of reorder. Avoid memory reorder in conv and deconv. Avoid unnecessary storage cast in fallback path. Revert "Use MKLDNN sum in more cases." This reverts commit 7a21ebca8bbe17fde49c3b1ca3f31b835a33afb8. Handle sliced ndarray in more cases. Fix a complain from make lint. Update Jenkins to test MKLDNN. debug compiling mkldnn. Use MKLDNN sum in more cases. Add mkldnn as a submodule. Compile with mkldnn in 3rdparty. Fix some coding styles. write the path to mkldnn lib in libmxnet.so. use rpath with $ORIGIN. Pack all lib files in Jenkins. pack and unpack mxnet with MKLDNN. Update Jenkinsfile Update Jenkinsfile Add mkldnn batch normalization Fix bugs in BN. Avoid memory allocation in MKLDNNCopy. only use MKLDNN BatchNorm for special cases. MKLDNN BatchNorm doesn't work well on the default layout. 
Add MKL-DNN based LRN Code Style Changes Fix a bug in BN. Fix a bug in LRN. Handle non-default storage in memory plan. Fix coding style. Fix a compilation error without mkldnn. Fix some coding styles for batch norm Improve forward of convolution. Add openmp and simd support to BN operator Retrieve MKLDNN Conv primitive based on signature. Retrieve Act primitive based on its signature. Fix a bug in pooling. Diable some MKLDNN activation and pooling. Cast MKLDNN storage with diff data type. Check if it's a view of NDArray. Reshaped and sliced arrays share the same chunks. Implement caching MKLDNN Act correctly. Fix a bug in check_consistency. Fix a potential bug when destroying NDArray. Fix bugs when allocating mem in NDArray. Fix coding style. Add micro when using mkldnn in ndarray. Fix a compilation error. Fix a bug in concat. Remove MKLDNNStorage. handle diff layouts in CopyFromToDnsImpl. Fallback correctly. Force weight grad to use default layout. Reorder weight arrays in (de)conv for faster inference. Avoid caching TBlob from NDArray. This commit may add some overhead of managing NDArray for each fallback. Fix a bug in Flatten. handle ndarray with def layout in mkldnn BN correctly. Align to page when mkldnn is enabled. Use default mem alloc for mkldnn. Reuse NDArrays. Support WriteInplace for sum. fix complains from "make lint". Avoid reallocation in NDArray. Handle weight arrays with special MKLDNN layouts. Remove unnecessary GetWeights. Fix compilation error without MKLDNN. Fix a bug in (de)conv for weight arrays. Fix a minor bug in MKLDNN conv. Fix a bug in MKLDNNOpSignature. Reimplement fallback for MKLDNN ops. Fix a bug in FallbackExecutor. Add params in hashcode. Invalidate data in outputs to accelerate. Fix a minor bug. Update mkldnn_base-inl.h Add primitive caching for Pooling forward computation Add hashcode in pooling parameters. Support NDArray copy with types unsupported by MKLDNN. Avoid using MKLDNN concat for negative dimension. 
Fix make lint complain. Disable mkldnn avg pooling for now. Fix a compile warning. Fix compile error when MKLDNN is disabled. OP primitive cache: use memory as signature for MKLDNN storage type Remove MKLDNN array in python. Disable Clang tests in Jenkins. Use mklml dockers to test mkldnn. Update MKLDNN repo to zhengda's mkldnn repo. Update MKLDNN repo to ashok's. Fix a bug in fallback. Change avg pooling algorithm to pooling_avg_include_padding Fix a code style in mkldnn pooling. Temp fix a bug in FC. Revert "Disable Clang tests in Jenkins." This reverts commit b4efa8f89592d30a27f9c30e2237e9420ac6749a. Rebase and Refactor deconv (#20) * rebase to Da,Zheng refactor branch Jan.14, add signature for mkldnn Deconv and modify classMKLDNNDeconvForward * fix make lint complains A simple way of caching BN inference. cache BN forward for both training and inference. Fix some minor problems in BN. Fix a bug in caching BN. force to build with avx2 in Jenkins. Remove the remaining MKLDNNStorageType Some minor updates in NDArray. a lot of updates to address comments. minor changes. * Use NNVM interface. Use NNVM interface for upsampling. Use NNVM interface for convolution. Use NNVM interface for deconvolution. Use NNVM interface for FullyConnected. Move NNVM interface to batch norm. Use NNVM interface for depthwise convolution. Use NNVM interface for softmax activation. Use NNVM interface for pooling. use NNVM interface for dropout. Use NNVM interface for activation. Use NNVM interface for CuDNN batch norm. Use NNVM interface for CuDNN pooling. Use NNVM interface for CuDNN softmax activation. Use NNVM interface for CuDNN activation. Use NNVM interface for CuDNN convolution. Use NNVM interface for CuDNN deconvolution. Move concat to nn/ Use NNVM interface for concat. Fix headers in concat. Move lrn to nn/. Use NNVM interface for LRN. Fix a compilation error in convolution. Fix a compilation error in activation. Fix coding style. Fix coding style for make lint. 
use enums in batch norm. Use CoreOpRunner for refactored Ops. Make FullyConnected stateless. Make upsampling stateless. Make pooling stateless. Make batchnorm stateless. Make SoftmaxActivation stateless. Fix a code style problem. pass amalgamation test for batch norm. pass amalgamation test for dropout. Get convolution ops from a function. Fix compilation errors for GPU. Fix thread local in diff platforms. Avoid using thread_local for non-CuDNN conv/deconv. Remove TODO in deconv. Fix a bug in batch norm. Fix a bug in fully connected. Don't set #inputs for backward convolution. Revert "Make pooling stateless." * revert modification in test_executor. * Fix a bug in FlattenStorageType. * Remove BN debug. * Remove remaining MXNET_USE_MKL2017 * Remove unused code in pooling. * Fixing bugs in gtests. * Fix lint errors. * a lot of minor updates to address comments. * Fix coding style in MKLDNN Pooling (#22) * revert the code change in the previous code refactor. * Fix a bug in pooling. * LRN coding style changes (#21) * LRN coding style change * Add const for local variables * Add req for LRN forward * rebase code * align API interface * revert modification in test_executor. * cast storage with MKLDNN properly. * Minor updates to address comments. * some minor updates. * Switch to the master branch of MKLDNN. * Minor updates to address comments. * Update activation.cc * Fix a bug in convert NDArray. * Add gluon model zoo tests. * Update GPU tests on model zoo. * Avoid using mobilenet for GPU tests with gluon models. mobilenet can't pass the test even without MKLDNN. * Update GPU tests on gluon. * change cmake to compile MKLDNN. * update cmake for MKLDNN. * Implement align myself. * Switch to intel/mkl-dnn. * Fix errors in align unittest. * Add unit test for LRN. * fix a compilation error. * use storage_type_assign to determine storage type. * avoid global pooling in mkldnn. There is a bug in global pooling in mkldnn. * compare all MKLDNN ops with native impls. 
add MXNET_MKLDNN_DEBUG to control the test. * Fix a bug in testing correctness. * print the name of buggy operator. * undo some modifications. * Fix a bug on reshaped array. * avoid testing outputs with NullOp. * turn on MKLDNN tests in Jenkins. * print each operator in MKLDNN tests. * rename test_gluon_model_zoo.py * Create hashcode for operator parameters properly. * Add USE_MKL2017 back. * Print warning messages. * move batchnorm tests to nnvm interface. * Delete batchnorm v1 tests. * Get inputs and outputs in batchnorm tests. * disable batchnorm tests for now. * Fix GPU tests on gluon model zoo. * Fix lint complains in tests. * Remove simd from openmp instructions in BatchNorm (#24) * Remove warnings. * Fix MKLDNN 1st compile failure issue (#23) * Fix compilation errors. * Remove ARCH_OPT in Jenkins. * Revert "avoid global pooling in mkldnn." This reverts commit f6efd342e64968cb848c9193d80e929968b8052c. * Move to the latest MKLDNN. This fixes the bug in global pooling. * WIP unit tests (#25) * WIP unit tests * some backward items initialized * Make more C++ unit tests work for batch norm (#28) * WIP unit tests * some backward items initialized * some backward items initialized * some backward items initialized * first unit test working * Working on types * backward types working for fp16 on first unit test * backward types working for fp16 on first unit test * backward types working for fp16 on first unit test * . * . * some tests working * fix input data * hangle gpu<->cpu for setting values * gpu working * gpu working * CAccessAsCPU class * Fix varying type in AccessAsCPU * starting to add channel axis tests * TestChannelAxisSimple * TestChannelAxisSimple * run bidirectional * run bidirectional * run bidirectional * CLEANUP * CLEANUP * .. * noaxis * .. * lint * revert * revert * Fix lint complains. * Fix a minor problem in Makefile. * fix GPU pooling. * Disable modelzoo inference tests. * update accuracy checks for MKLDNN. 
* Fix MKLDNN pooling for global pooling. * Fix Jenkins. * Fix a bug in Jenkins. * Fix Jenkins
2018-02-15 14:44:34 -08:00
constexpr const size_t MPRINT_PRECISION = 5;

template <typename DType>
inline void fill(const RunContext& run_ctx, const TBlob& _blob, const DType val) {
  AccessAsCPU(_blob, run_ctx, [val](const TBlob& blob) {
    // Dispatch on the blob's runtime type flag and fill every element with 'val'.
    MSHADOW_TYPE_SWITCH(blob.type_flag_, DTypeX, {
      DTypeX* p1 = blob.dptr<DTypeX>();
      for (size_t i = 0, n = blob.Size(); i < n; ++i) {
        *p1++ = val;
      }
    });
  });
}
// Variant of the fill helper above that takes the blob by pointer.
template <typename DType>
inline void try_fill(const RunContext& run_ctx, const TBlob* blob, const DType val) {
2018-02-15 14:44:34 -08:00
if (blob) {
fill(run_ctx, *blob, val);
}
}
/*! \brief Stream a blob's values as comma-separated numeric literals, each with an optional type suffix */
template <typename DType, typename Stream>
inline void dump(Stream* os, const TBlob& blob, const char* suffix = "f") {
DType* p1 = blob.dptr<DType>();
for (size_t i = 0, n = blob.Size(); i < n; ++i) {
if (i) {
*os << ", ";
}
const DType val = *p1++;
std::stringstream stream;
stream << val;
std::string ss = stream.str();
if (suffix && *suffix == 'f') {
if (std::find(ss.begin(), ss.end(), '.') == ss.end()) {
ss += ".0";
}
}
*os << ss << suffix;
}
}
/*! \brief Return the axis multiplier (the size of the given axis), or 1 if the axis is beyond the shape's dimensionality */
inline index_t getMult(const mxnet::TShape& shape, const index_t axis) {
return axis < shape.ndim() ? shape[axis] : 1;
}
/*! \brief offset, given indices such as bn, channel, depth, row, column */
inline index_t offset(const mxnet::TShape& shape, const std::vector<size_t>& indices) {
const size_t dim = shape.ndim();
CHECK_LE(indices.size(), dim);
size_t offset = 0;
for (size_t i = 0; i < dim; ++i) {
offset *= shape[i];
if (indices.size() > i) {
CHECK_LT(indices[i], shape[i]);
offset += indices[i];
}
}
return offset;
}
/*! \brief Return reference to data at position indexes */
template <typename DType>
inline const DType& data_at(const TBlob* blob, const std::vector<size_t>& indices) {
2017-05-15 20:27:28 -07:00
return blob->dptr<DType>()[offset(blob->shape_, indices)];
}
/*! \brief Get a mutable reference to data at the given position indices */
template <typename DType>
inline DType& data_ref(const TBlob* blob, const std::vector<size_t>& indices) {
return blob->dptr<DType>()[offset(blob->shape_, indices)];
}
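Both accessors above delegate to an `offset()` helper that is not part of this excerpt. A minimal standalone sketch of the row-major linear-index computation it is assumed to perform (the name `row_major_offset` is hypothetical, chosen here to avoid clashing with the real helper):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the offset() helper used by data_at()/data_ref():
// folds multidimensional indices into a single row-major linear index.
inline size_t row_major_offset(const std::vector<size_t>& shape,
                               const std::vector<size_t>& indices) {
  size_t off = 0;
  for (size_t i = 0; i < indices.size(); ++i) {
    off = off * shape[i] + indices[i];  // accumulate dimension by dimension
  }
  return off;
}
```

For a blob of shape (2, 3, 4), index (1, 2, 3) maps to ((1 * 3) + 2) * 4 + 3 = 23, i.e. the last element.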
inline std::string repeatedStr(const char* s,
const signed int count,
const bool trailSpace = false) {
  if (count <= 0) {
    return std::string();
  }
  std::stringstream str;
  for (int x = 0; x < count; ++x) {
    str << s;
  }
  // Append a trailing space only when requested, regardless of count
  // (the previous count == 1 special case added one unconditionally).
  if (trailSpace) {
    str << " ";
  }
  return str.str();
}
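A quick standalone check of the repetition helper; the function is copied locally (with the single-repeat case folded into the general loop) so the snippet compiles outside the test framework:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Local copy of repeatedStr for a self-contained demonstration:
// repeats `s` `count` times, optionally with one trailing space.
inline std::string repeatedStr(const char* s, const signed int count,
                               const bool trailSpace = false) {
  if (count <= 0) {
    return std::string();
  }
  std::stringstream str;
  for (int x = 0; x < count; ++x) {
    str << s;
  }
  if (trailSpace) {
    str << " ";
  }
  return str.str();
}
```

For example, `repeatedStr("ab", 3)` yields `"ababab"`, and a non-positive count yields the empty string.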
Sparse operators for unary and binary elemwise NDArray operators. (#7577) * Elemwise operators * Further merge fixups * Further merge fixups * add flatten option to fc (#7548) * add last_axis option to fc * update per comments * clean up * Updating the LICENSE and NOTICE Files (#7563) * add resnet50_v2 pretrained (#7564) * Set dmlc/master nnvm commit point * Fix submodule points * . * remove superfluous template parameter * Remove Binary and BinaryScalar from ElemwiseBinary and BinaryScalar class member names * Change some naming conventions from Launch to Compute * More naming normalization * add license header * Allocation and optimization * lint fixes * lint * Change LaunchXXX to ComputeXXX in member functions * amalgamation doesn't like this MXNET_DESCRIBE with R prefix (raw string). In addition, its usage wasn't consistent anyway (only 2 of 5 places used it). * MXNET_DESCRIBE is problematic * Trigger build * CR comments * Move CsrCsrOp and RspRspOp into inline file to be included with more discretion (causing long compiles and possibly out of heap space errors) * lint * fix indents * minor code path optimization * Move some code around to try to get rid of heap error in weak MSVC compiler * reduce complexity for MSVC heap problem * reduce complexity for MSVC heap problem * lint * Remove CUDA portion for non-CUDA builds * remove template parameter * build test * Fix msvc build out of hash space problem * worked on separate build machine, reverting re-add * Fix after merge * revert * change DCHECK_XX to CHECK_XX * remove superfluous checks * signed/unsigned mismatch fix * signed/unsigned mismatch * signed/unsigned mismatch * bypass KernelEx * MSVC OMP * MSVC OMP * lint * lint * turn back on KernelEx * Remove kernel optimization, svae for optimzations story * Fix compile error for caffe plugin * GPU fix, simplify combine mshadow_to_op and op_with_req * revert DCHECK removals * lint * Fix failing perl unit test * Revert "Fix failing perl unit test" This reverts 
commit ee956c1bddd1d3e5ce12d5faccb22c8d63bd30b4. * Fix numeric_grad for fp64 (lapack tests) * fix conflict * fix strange conflict problem * Don't download every build * lint * Revert "Don't download every build" This reverts commit e24e74b4b2160d23a2c2b2a8124cc2b23715e169. * ,. * Trigger build * CI is being ridiculous * . * Removed sparse namespace for _minimum, _maximum and _hypot * CR comments * Trigger another try at the build * CR comments * Trigger build * Trigger * ...
2017-09-13 12:34:48 -07:00
/*! \brief Pretty print a shape with optional label */
template <typename StreamType>
inline StreamType& print_shape(StreamType* _os,
const std::string& label,
const mxnet::TShape& shape,
const bool add_endl = true) {
if (!label.empty()) {
*_os << label << ": ";
}
*_os << "(";
for (size_t i = 0, n = shape.ndim(); i < n; ++i) {
if (i) {
*_os << ", ";
}
*_os << shape[i];
}
*_os << ")";
if (add_endl) {
*_os << std::endl;
} else {
*_os << " ";
}
return *_os << std::flush;
}
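`print_shape` writes an optional label followed by a parenthesized, comma-separated dimension list, e.g. `data: (2, 3, 4)`. A sketch of the same formatting with `std::vector<int>` standing in for `mxnet::TShape`, so it compiles outside MXNet (the name `format_shape` is hypothetical):

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Mirrors print_shape()'s output format, returning the result as a string
// instead of streaming it, with std::vector<int> in place of mxnet::TShape.
inline std::string format_shape(const std::string& label,
                                const std::vector<int>& shape) {
  std::ostringstream os;
  if (!label.empty()) {
    os << label << ": ";
  }
  os << "(";
  for (size_t i = 0; i < shape.size(); ++i) {
    if (i) {
      os << ", ";
    }
    os << shape[i];
  }
  os << ")";
  return os.str();
}
```

So `format_shape("data", {2, 3, 4})` produces `"data: (2, 3, 4)"`, and an empty label omits the prefix entirely.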
/*! \brief Pretty print a 1D, 2D, or 3D blob */
template <typename DType, typename StreamType>
inline StreamType& print_blob_(const RunContext& ctx,
StreamType* _os,
const TBlob& blob,
const bool doChannels = true,
const bool doBatches = true,
const bool add_endl = true) {
#if MXNET_USE_CUDA
if (blob.dev_mask() == gpu::kDevMask) {
return print_blob_<DType>(
ctx, _os, CAccessAsCPU(ctx, blob, false)(), doChannels, doBatches, add_endl);
}
#endif // MXNET_USE_CUDA
StreamType& os = *_os;
const size_t dim = static_cast<size_t>(blob.ndim());
if (dim == 1) {
// 1D blob: view it as a 3D (1 x 1 x N) blob and reuse the 3D printing path
// (mshadow::Tensor is deprecated)
TBlob changed(blob.dptr<DType>(), mxnet::TShape(3, -1), blob.dev_mask(), blob.dev_id());
changed.shape_[0] = 1;
changed.shape_[1] = 1;
changed.shape_[2] = blob.shape_[0];
return print_blob_<DType>(ctx, &os, changed, false, false, add_endl);
} else if (dim == 2) {
// 2D blob: view it as a 4D blob and reuse the 4D printing path
// (mshadow::Tensor is deprecated)
TBlob changed(blob.dptr<DType>(), mxnet::TShape(4, -1), blob.dev_mask(), blob.dev_id());
2017-05-15 20:27:28 -07:00
    changed.shape_[0] = 1;
    changed.shape_[1] = 1;
    changed.shape_[2] = blob.shape_[0];
    changed.shape_[3] = blob.shape_[1];
    return print_blob_<DType>(ctx, &os, changed, false, false, add_endl);
  }
  CHECK_GE(dim, 3U) << "Invalid blob dimension (expected at least 3 dimensions)";
  const size_t batchSize = blob.size(0);
  size_t channels = 1;
  size_t depth = 1;
  size_t height = 1;
  size_t width = 1;
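The branch above handles low-rank blobs by padding their shape to 4D as (1, 1, d0, d1) and recursing, so a single printer covers every rank. A minimal standalone sketch of that padding, assuming a plain shape vector in place of the blob's `shape_` field (the `pad_to_4d` helper is hypothetical, not part of this file):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper mirroring the padding above: a 1D or 2D shape is
// promoted to 4D as (1, 1, d0, d1); missing trailing extents stay 1.
inline std::vector<size_t> pad_to_4d(const std::vector<size_t>& shape) {
  std::vector<size_t> padded(4, 1);
  if (!shape.empty()) padded[2] = shape[0];
  if (shape.size() > 1) padded[3] = shape[1];
  return padded;
}
```

With this normalization, the recursive call never re-enters the padding branch, since the padded shape always has four dimensions.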
  if (dim > 1) {
    channels = blob.size(1);
    if (dim > 2) {
      if (dim == 3) {
        width = blob.size(2);
      } else if (dim == 4) {
        height = blob.size(2);
        width = blob.size(3);
      } else {
        depth = blob.size(2);
        if (dim > 3) {
          height = blob.size(3);
          if (dim > 4) {
            width = blob.size(4);
          }
        }
      }
    }
  }
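The branching above maps blob rank onto a fixed (N, C, D, H, W) interpretation: 3D blobs are read as (N, C, W), 4D as (N, C, H, W), 5D as (N, C, D, H, W), with unused extents left at 1 so the print loops below degenerate cleanly. A minimal standalone sketch of that mapping (the `Dims` struct and `unpack_dims` helper are hypothetical illustrations, not part of this file):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical mirror of the rank-unpacking logic above: extents the
// blob does not have default to 1, matching the printer's loop bounds.
struct Dims { size_t n = 1, c = 1, d = 1, h = 1, w = 1; };

inline Dims unpack_dims(const std::vector<size_t>& shape) {
  const size_t dim = shape.size();
  Dims out;
  out.n = shape[0];
  if (dim > 1) {
    out.c = shape[1];
    if (dim == 3) {           // (N, C, W): 1D spatial data
      out.w = shape[2];
    } else if (dim == 4) {    // (N, C, H, W): 2D spatial data
      out.h = shape[2];
      out.w = shape[3];
    } else if (dim >= 5) {    // (N, C, D, H, W): 3D spatial data
      out.d = shape[2];
      out.h = shape[3];
      out.w = shape[4];
    }
  }
  return out;
}
```

Note the asymmetry: a rank-3 blob fills `width` (not `depth`), which is why the 1D batch-norm case prints as a single row per channel.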
  for (size_t r = 0; r < height; ++r) {
    for (size_t thisBatch = 0; thisBatch < batchSize; ++thisBatch) {
      if (doBatches) {
        std::stringstream ss;
        if (!thisBatch) {
          os << "|";
        }
        ss << "N" << thisBatch << "| ";
        const std::string nns = ss.str();
        if (!r) {
          os << nns;
        } else {
          os << repeatedStr(" ", nns.size());
        }
      }
      for (size_t thisChannel = 0; thisChannel < channels; ++thisChannel) {
        os << "[";
for (size_t c = 0; c < width; ++c) {
if (c) {
os << ", ";
}
for (size_t dd = 0; dd < depth; ++dd) {
DType val;
switch (dim) {
case 3:
val = data_at<DType>(&blob, {thisBatch, thisChannel, c});
break;
case 4:
val = data_at<DType>(&blob, {thisBatch, thisChannel, r, c});
break;
case 5:
val = data_at<DType>(&blob, {thisBatch, thisChannel, dd, r, c});
break;
default:
LOG(FATAL) << "Unsupported blob dimension " << dim;
val = DType(0);
break;
}
os << repeatedStr("(", dd);
os << std::fixed << std::setw(7) << std::setprecision(MPRINT_PRECISION) << std::right
<< val << " ";
os << repeatedStr(")", dd, true);
}
}
os << "] ";
if (!doChannels) {
break;
}
}
if (!doBatches) {
break;
} else {
os << " |" << std::flush;
}
}
if (r < height - 1) {
os << std::endl;
}
}
if (!height) {
  os << "[]";
  if (add_endl) {
    os << std::endl;
  }
} else if (!add_endl) {
  os << " ";
} else {
  os << std::endl;
  }
  os << std::flush;
  return os;
}
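/*!
 * \brief Type-erased TBlob print helper (test/debug utility). Uses
 *        MSHADOW_TYPE_SWITCH to dispatch on blob.type_flag_ and invoke
 *        print_blob_<DType> with the concrete element type
 *        (e.g. float16, float32, float64).
 */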
template <typename StreamType>
inline StreamType& print(const RunContext& ctx,
                         StreamType* _os,
                         const TBlob& blob,
                         const bool doChannels = true,
                         const bool doBatches = true,
                         const bool add_endl = true) {
  MSHADOW_TYPE_SWITCH(blob.type_flag_, DType, {
    print_blob_<DType>(ctx, _os, blob, doChannels, doBatches, add_endl);
  });
  return *_os;
}
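/*!
 * \brief Labeled variant of the TBlob print helper above: writes "<label>: "
 *        to the stream when the label is non-empty, then delegates to
 *        print(ctx, _os, blob, ...). Sketch of intended use in a test body
 *        (the stream and blob names are illustrative, not from this file):
 *          test::print(ctx, &std::cout, "fwd_output", blob);
 */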
template <typename StreamType>
inline StreamType& print(const RunContext& ctx,
                         StreamType* _os,
                         const std::string& label,
                         const TBlob& blob,
                         const bool doChannels = true,
                         const bool doBatches = true,
                         const bool add_endl = true) {
  if (!label.empty()) {
    *_os << label << ": ";
  }
  return print(ctx, _os, blob, doChannels, doBatches, add_endl);
}
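/*!
 * \brief NDArray variant of the print helper: emits an optional label, then
 *        dispatches on the array's storage type. For kRowSparseStorage it
 *        prints the main shape, the storage shape, the dense data blob, and
 *        the row indices (rowsparse::kIdx aux data).
 */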
template <typename StreamType>
inline StreamType& print(const RunContext& ctx,
                         StreamType* _os,
                         const std::string& label,
                         const NDArray& arr) {
  if (!label.empty()) {
    *_os << label << ": ";
  }
  switch (arr.storage_type()) {
    case kRowSparseStorage: {
      // data
      const mxnet::TShape& shape = arr.shape();
      print_shape(_os, "[row_sparse] main shape", shape, false);
      const mxnet::TShape& storage_shape = arr.storage_shape();
      const bool is_one_row = storage_shape[0] < 2;
      print_shape(_os, "storage shape", storage_shape, false);
      print(ctx, _os, arr.data(), true, true, !is_one_row);
// indices
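// Example row-sparse layout (hypothetical shapes, illustrative only):
// a 4x3 array whose only non-zero rows are 1 and 3 stores a dense data
// block of shape [2, 3] plus the row indices aux array (rowsparse::kIdx)
// holding [1, 3]; aux_shape(rowsparse::kIdx) is therefore [2].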
const mxnet::TShape& indices_shape = arr.aux_shape(rowsparse::kIdx);
print_shape(_os, "indices shape", indices_shape, false);
print(ctx, _os, arr.aux_data(rowsparse::kIdx), true, true, false) << std::endl;
break;
}
case kCSRStorage: {
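// Example CSR layout (hypothetical 3x4 matrix, illustrative only):
//   dense:        values   (storage data, [4]): [5, 8, 3, 6]
//   [5 0 0 0]     row ptrs (csr::kIndPtr, [4]): [0, 1, 3, 4]
//   [0 8 3 0]     col idx  (csr::kIdx,    [4]): [0, 1, 2, 3]
//   [0 0 0 6]
// Row r's values span [row_ptrs[r], row_ptrs[r+1]) in the data array.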
// data
const mxnet::TShape& shape = arr.shape();
print_shape(_os, "[CSR] main shape", shape, false);
const mxnet::TShape& storage_shape = arr.storage_shape();
const bool is_one_row = storage_shape[0] < 2;
print_shape(_os, "storage shape", storage_shape, false);
print(ctx, _os, arr.data(), true, true, !is_one_row);
// row ptrs
const mxnet::TShape& ind_ptr_shape = arr.aux_shape(csr::kIndPtr);
print_shape(_os, "row ptrs shape", ind_ptr_shape, false);
print(ctx, _os, arr.aux_data(csr::kIndPtr), true, true, false) << std::endl;
// col indices
const mxnet::TShape& indices_shape = arr.aux_shape(csr::kIdx);
print_shape(_os, "col indices shape", indices_shape, false);
print(ctx, _os, arr.aux_data(csr::kIdx), true, true, false) << std::endl;
break;
}
case kDefaultStorage: {
// data
const mxnet::TShape& shape = arr.shape();
const bool is_one_row = shape[0] < 2;
print_shape(_os, "[dense] main shape", shape, !is_one_row);
print(ctx, _os, arr.data(), true, true, !is_one_row) << std::endl;
break;
}
default:
CHECK(false) << "Unsupported storage type: " << arr.storage_type();
break;
}
return *_os << std::flush;
}
inline void print(const RunContext& ctx,
const std::string& label,
                  const std::string& var,
                  const std::vector<NDArray>& arrays) {
  std::cout << label << std::endl;
  for (size_t x = 0, n = arrays.size(); x < n; ++x) {
    std::stringstream ss;
    ss << var << "[" << x << "]";
    test::print(ctx, &std::cout, ss.str(), arrays[x]);
  }
}
inline void print(const RunContext& ctx,
                  const std::string& label,
                  const std::string& var,
                  const std::vector<TBlob>& arrays) {
  std::cout << label << std::endl;
  for (size_t x = 0, n = arrays.size(); x < n; ++x) {
    std::stringstream ss;
    ss << var << "[" << x << "]";
    test::print(ctx, &std::cout, ss.str(), arrays[x], true, true, false);
  }
}
inline std::string demangle(const char* name) {
#if defined(__GLIBCXX__) || defined(_LIBCPP_VERSION)
  int status = -4;  // some arbitrary value to eliminate the compiler warning
  std::unique_ptr<char, void (*)(void*)> res{abi::__cxa_demangle(name, nullptr, nullptr, &status),
                                             &std::free};
  return status ? name : res.get();
#else
  return name;
#endif
}
template <typename T>
inline std::string type_name() {
  return demangle(typeid(T).name());
}
#define PRINT_NDARRAYS(__ctx$, __var) test::print(__ctx$, __FUNCTION__, #__var, __var)
#define PRINT_OP_AND_ARRAYS(__ctx$, __op, __var) \
  test::print(__ctx$, \
              __FUNCTION__, \
              static_cast<std::stringstream*>( \
                  &(std::stringstream() << #__var << "<" << type_name<__op>() << ">")) \
                  ->str(), \
              __var)
#define PRINT_OP2_AND_ARRAYS(__ctx$, __op1, __op2, __var) \
  test::print(__ctx$, \
              __FUNCTION__, \
              static_cast<std::stringstream*>( \
                  &(std::stringstream() << #__var << "<" << type_name<__op1>() \
                                        << ", " << type_name<__op2>() << ">")) \
                  ->str(), \
              __var)
/*! \brief Fill blob with some pattern defined by the getNextData() callback
 * Pattern fill in the defined order (important for analysis):
 * 1D: batch item -> channel -> col
 * 2D: batch item -> channel -> row -> col
 * 3D: batch item -> channel -> depth -> row -> col
 */
template <typename GetNextData>
Fix make lint complain. Disable mkldnn avg pooling for now. Fix a compile warning. Fix compile error when MKLDNN is disabled. OP primitive cache: use memory as signature for MKLDNN storage type Remove MKLDNN array in python. Disable Clang tests in Jenkins. Use mklml dockers to test mkldnn. Update MKLDNN repo to zhengda's mkldnn repo. Update MKLDNN repo to ashok's. Fix a bug in fallback. Change avg pooling algorithm to pooling_avg_include_padding Fix a code style in mkldnn pooling. Temp fix a bug in FC. Revert "Disable Clang tests in Jenkins." This reverts commit b4efa8f89592d30a27f9c30e2237e9420ac6749a. Rebase and Refactor deconv (#20) * rebase to Da,Zheng refactor branch Jan.14, add signature for mkldnn Deconv and modify classMKLDNNDeconvForward * fix make lint complains A simple way of caching BN inference. cache BN forward for both training and inference. Fix some minor problems in BN. Fix a bug in caching BN. force to build with avx2 in Jenkins. Remove the remaining MKLDNNStorageType Some minor updates in NDArray. a lot of updates to address comments. minor changes. * Use NNVM interface. Use NNVM interface for upsampling. Use NNVM interface for convolution. Use NNVM interface for deconvolution. Use NNVM interface for FullyConnected. Move NNVM interface to batch norm. Use NNVM interface for depthwise convolution. Use NNVM interface for softmax activation. Use NNVM interface for pooling. use NNVM interface for dropout. Use NNVM interface for activation. Use NNVM interface for CuDNN batch norm. Use NNVM interface for CuDNN pooling. Use NNVM interface for CuDNN softmax activation. Use NNVM interface for CuDNN activation. Use NNVM interface for CuDNN convolution. Use NNVM interface for CuDNN deconvolution. Move concat to nn/ Use NNVM interface for concat. Fix headers in concat. Move lrn to nn/. Use NNVM interface for LRN. Fix a compilation error in convolution. Fix a compilation error in activation. Fix coding style. Fix coding style for make lint. 
use enums in batch norm. Use CoreOpRunner for refactored Ops. Make FullyConnected stateless. Make upsampling stateless. Make pooling stateless. Make batchnorm stateless. Make SoftmaxActivation stateless. Fix a code style problem. pass amalgamation test for batch norm. pass amalgamation test for dropout. Get convolution ops from a function. Fix compilation errors for GPU. Fix thread local in diff platforms. Avoid using thread_local for non-CuDNN conv/deconv. Remove TODO in deconv. Fix a bug in batch norm. Fix a bug in fully connected. Don't set #inputs for backward convolution. Revert "Make pooling stateless." * revert modification in test_executor. * Fix a bug in FlattenStorageType. * Remove BN debug. * Remove remaining MXNET_USE_MKL2017 * Remove unused code in pooling. * Fixing bugs in gtests. * Fix lint errors. * a lot of minor updates to address comments. * Fix coding style in MKLDNN Pooling (#22) * revert the code change in the previous code refactor. * Fix a bug in pooling. * LRN coding style changes (#21) * LRN coding style change * Add const for local variables * Add req for LRN forward * rebase code * align API interface * revert modification in test_executor. * cast storage with MKLDNN properly. * Minor updates to address comments. * some minor updates. * Switch to the master branch of MKLDNN. * Minor updates to address comments. * Update activation.cc * Fix a bug in convert NDArray. * Add gluon model zoo tests. * Update GPU tests on model zoo. * Avoid using mobilenet for GPU tests with gluon models. mobilenet can't pass the test even without MKLDNN. * Update GPU tests on gluon. * change cmake to compile MKLDNN. * update cmake for MKLDNN. * Implement align myself. * Switch to intel/mkl-dnn. * Fix errors in align unittest. * Add unit test for LRN. * fix a compilation error. * use storage_type_assign to determine storage type. * avoid global pooling in mkldnn. There is a bug in global pooling in mkldnn. * compare all MKLDNN ops with native impls. 
add MXNET_MKLDNN_DEBUG to control the test. * Fix a bug in testing correctness. * print the name of buggy operator. * undo some modifications. * Fix a bug on reshaped array. * avoid testing outputs with NullOp. * turn on MKLDNN tests in Jenkins. * print each operator in MKLDNN tests. * rename test_gluon_model_zoo.py * Create hashcode for operator parameters properly. * Add USE_MKL2017 back. * Print warning messages. * move batchnorm tests to nnvm interface. * Delete batchnorm v1 tests. * Get inputs and outputs in batchnorm tests. * disable batchnorm tests for now. * Fix GPU tests on gluon model zoo. * Fix lint complains in tests. * Remove simd from openmp instructions in BatchNorm (#24) * Remove warnings. * Fix MKLDNN 1st compile failure issue (#23) * Fix compilation errors. * Remove ARCH_OPT in Jenkins. * Revert "avoid global pooling in mkldnn." This reverts commit f6efd342e64968cb848c9193d80e929968b8052c. * Move to the latest MKLDNN. This fixes the bug in global pooling. * WIP unit tests (#25) * WIP unit tests * some backward items initialized * Make more C++ unit tests work for batch norm (#28) * WIP unit tests * some backward items initialized * some backward items initialized * some backward items initialized * first unit test working * Working on types * backward types working for fp16 on first unit test * backward types working for fp16 on first unit test * backward types working for fp16 on first unit test * . * . * some tests working * fix input data * hangle gpu<->cpu for setting values * gpu working * gpu working * CAccessAsCPU class * Fix varying type in AccessAsCPU * starting to add channel axis tests * TestChannelAxisSimple * TestChannelAxisSimple * run bidirectional * run bidirectional * run bidirectional * CLEANUP * CLEANUP * .. * noaxis * .. * lint * revert * revert * Fix lint complains. * Fix a minor problem in Makefile. * fix GPU pooling. * Disable modelzoo inference tests. * update accuracy checks for MKLDNN. 
* Fix MKLDNN pooling for global pooling. * Fix Jenkins. * Fix a bug in Jenkins. * Fix Jenkins
2018-02-15 14:44:34 -08:00
static inline void patternFill(const RunContext& run_ctx,
                               const TBlob* _blob,
                               GetNextData getNextData) {
  AccessAsCPU(*_blob, run_ctx, [getNextData](const TBlob& blob) {
    const size_t dim = static_cast<size_t>(blob.ndim());
    CHECK_LE(dim, 5U) << "Will need to handle above 5 dimensions (another for loop)";
    const size_t num = blob.size(0);
    const size_t channels = dim > 1 ? blob.size(1) : 1;
    const size_t depth = dim > 2 ? blob.size(2) : 1;
    const size_t height = dim > 3 ? blob.size(3) : 1;
    const size_t width = dim > 4 ? blob.size(4) : 1;
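The size computation above normalizes any 1D–5D blob into a fixed (N, C, D, H, W) tuple, so a single set of nested fill loops can handle every supported rank. A minimal standalone sketch of that normalization, using a hypothetical `Shape` struct in place of `TBlob` (`CanonicalShape` and `NCDHW` are illustrative names, not part of MXNet):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the shape interface of mxnet::TBlob
// (ndim() / size(i)), so the sketch compiles on its own.
struct Shape {
  std::vector<std::size_t> dims;
  int ndim() const { return static_cast<int>(dims.size()); }
  std::size_t size(int i) const { return dims[static_cast<std::size_t>(i)]; }
};

// The (N, C, D, H, W) view the fill loops iterate over: trailing axes
// that a lower-rank blob lacks default to extent 1.
struct NCDHW {
  std::size_t num, channels, depth, height, width;
};

inline NCDHW CanonicalShape(const Shape& blob) {
  const std::size_t dim = static_cast<std::size_t>(blob.ndim());
  return NCDHW{blob.size(0),
               dim > 1 ? blob.size(1) : 1,
               dim > 2 ? blob.size(2) : 1,
               dim > 3 ? blob.size(3) : 1,
               dim > 4 ? blob.size(4) : 1};
}
```

A 2D blob of shape {8, 3}, for example, is treated as 8×3×1×1×1, which is why the per-dimension loops only need to be written once for the full 5D case.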
add MXNET_MKLDNN_DEBUG to control the test. * Fix a bug in testing correctness. * print the name of buggy operator. * undo some modifications. * Fix a bug on reshaped array. * avoid testing outputs with NullOp. * turn on MKLDNN tests in Jenkins. * print each operator in MKLDNN tests. * rename test_gluon_model_zoo.py * Create hashcode for operator parameters properly. * Add USE_MKL2017 back. * Print warning messages. * move batchnorm tests to nnvm interface. * Delete batchnorm v1 tests. * Get inputs and outputs in batchnorm tests. * disable batchnorm tests for now. * Fix GPU tests on gluon model zoo. * Fix lint complains in tests. * Remove simd from openmp instructions in BatchNorm (#24) * Remove warnings. * Fix MKLDNN 1st compile failure issue (#23) * Fix compilation errors. * Remove ARCH_OPT in Jenkins. * Revert "avoid global pooling in mkldnn." This reverts commit f6efd342e64968cb848c9193d80e929968b8052c. * Move to the latest MKLDNN. This fixes the bug in global pooling. * WIP unit tests (#25) * WIP unit tests * some backward items initialized * Make more C++ unit tests work for batch norm (#28) * WIP unit tests * some backward items initialized * some backward items initialized * some backward items initialized * first unit test working * Working on types * backward types working for fp16 on first unit test * backward types working for fp16 on first unit test * backward types working for fp16 on first unit test * . * . * some tests working * fix input data * hangle gpu<->cpu for setting values * gpu working * gpu working * CAccessAsCPU class * Fix varying type in AccessAsCPU * starting to add channel axis tests * TestChannelAxisSimple * TestChannelAxisSimple * run bidirectional * run bidirectional * run bidirectional * CLEANUP * CLEANUP * .. * noaxis * .. * lint * revert * revert * Fix lint complains. * Fix a minor problem in Makefile. * fix GPU pooling. * Disable modelzoo inference tests. * update accuracy checks for MKLDNN. 
* Fix MKLDNN pooling for global pooling. * Fix Jenkins. * Fix a bug in Jenkins. * Fix Jenkins
2018-02-15 14:44:34 -08:00
  const size_t numberOfIndexes = blob.shape_.Size();
  for (size_t n = 0; n < num; ++n) {
    if (dim > 1) {
      for (size_t ch = 0; ch < channels; ++ch) {
        if (dim > 2) {
          for (size_t d = 0; d < depth; ++d) {
            if (dim > 3) {
              for (size_t row = 0; row < height; ++row) {
                if (dim > 4) {
                  for (size_t col = 0; col < width; ++col) {
                    if (dim == 5) {
                      const size_t idx = test::offset(blob.shape_, {n, ch, d, row, col});
                      CHECK_LT(idx, numberOfIndexes);
                      MSHADOW_TYPE_SWITCH(blob.type_flag_, ThisDataType, {
                        ThisDataType &f = blob.dptr<ThisDataType>()[idx];
                        f = getNextData();
                      });
                    } else {
                      CHECK(dim <= 5) << "Unimplemented dimension: " << dim;
                    }
                  }
use enums in batch norm. Use CoreOpRunner for refactored Ops. Make FullyConnected stateless. Make upsampling stateless. Make pooling stateless. Make batchnorm stateless. Make SoftmaxActivation stateless. Fix a code style problem. pass amalgamation test for batch norm. pass amalgamation test for dropout. Get convolution ops from a function. Fix compilation errors for GPU. Fix thread local in diff platforms. Avoid using thread_local for non-CuDNN conv/deconv. Remove TODO in deconv. Fix a bug in batch norm. Fix a bug in fully connected. Don't set #inputs for backward convolution. Revert "Make pooling stateless." * revert modification in test_executor. * Fix a bug in FlattenStorageType. * Remove BN debug. * Remove remaining MXNET_USE_MKL2017 * Remove unused code in pooling. * Fixing bugs in gtests. * Fix lint errors. * a lot of minor updates to address comments. * Fix coding style in MKLDNN Pooling (#22) * revert the code change in the previous code refactor. * Fix a bug in pooling. * LRN coding style changes (#21) * LRN coding style change * Add const for local variables * Add req for LRN forward * rebase code * align API interface * revert modification in test_executor. * cast storage with MKLDNN properly. * Minor updates to address comments. * some minor updates. * Switch to the master branch of MKLDNN. * Minor updates to address comments. * Update activation.cc * Fix a bug in convert NDArray. * Add gluon model zoo tests. * Update GPU tests on model zoo. * Avoid using mobilenet for GPU tests with gluon models. mobilenet can't pass the test even without MKLDNN. * Update GPU tests on gluon. * change cmake to compile MKLDNN. * update cmake for MKLDNN. * Implement align myself. * Switch to intel/mkl-dnn. * Fix errors in align unittest. * Add unit test for LRN. * fix a compilation error. * use storage_type_assign to determine storage type. * avoid global pooling in mkldnn. There is a bug in global pooling in mkldnn. * compare all MKLDNN ops with native impls. 
add MXNET_MKLDNN_DEBUG to control the test. * Fix a bug in testing correctness. * print the name of buggy operator. * undo some modifications. * Fix a bug on reshaped array. * avoid testing outputs with NullOp. * turn on MKLDNN tests in Jenkins. * print each operator in MKLDNN tests. * rename test_gluon_model_zoo.py * Create hashcode for operator parameters properly. * Add USE_MKL2017 back. * Print warning messages. * move batchnorm tests to nnvm interface. * Delete batchnorm v1 tests. * Get inputs and outputs in batchnorm tests. * disable batchnorm tests for now. * Fix GPU tests on gluon model zoo. * Fix lint complains in tests. * Remove simd from openmp instructions in BatchNorm (#24) * Remove warnings. * Fix MKLDNN 1st compile failure issue (#23) * Fix compilation errors. * Remove ARCH_OPT in Jenkins. * Revert "avoid global pooling in mkldnn." This reverts commit f6efd342e64968cb848c9193d80e929968b8052c. * Move to the latest MKLDNN. This fixes the bug in global pooling. * WIP unit tests (#25) * WIP unit tests * some backward items initialized * Make more C++ unit tests work for batch norm (#28) * WIP unit tests * some backward items initialized * some backward items initialized * some backward items initialized * first unit test working * Working on types * backward types working for fp16 on first unit test * backward types working for fp16 on first unit test * backward types working for fp16 on first unit test * . * . * some tests working * fix input data * hangle gpu<->cpu for setting values * gpu working * gpu working * CAccessAsCPU class * Fix varying type in AccessAsCPU * starting to add channel axis tests * TestChannelAxisSimple * TestChannelAxisSimple * run bidirectional * run bidirectional * run bidirectional * CLEANUP * CLEANUP * .. * noaxis * .. * lint * revert * revert * Fix lint complains. * Fix a minor problem in Makefile. * fix GPU pooling. * Disable modelzoo inference tests. * update accuracy checks for MKLDNN. 
* Fix MKLDNN pooling for global pooling. * Fix Jenkins. * Fix a bug in Jenkins. * Fix Jenkins
2018-02-15 14:44:34 -08:00
} else {
  const size_t idx = test::offset(blob.shape_, {n, ch, d, row});
  CHECK_LT(idx, numberOfIndexes);
  MSHADOW_TYPE_SWITCH(blob.type_flag_, ThisDataType, {
    ThisDataType& f = blob.dptr<ThisDataType>()[idx];
    f = getNextData();
  });
}
}
Fix make lint complain. Disable mkldnn avg pooling for now. Fix a compile warning. Fix compile error when MKLDNN is disabled. OP primitive cache: use memory as signature for MKLDNN storage type Remove MKLDNN array in python. Disable Clang tests in Jenkins. Use mklml dockers to test mkldnn. Update MKLDNN repo to zhengda's mkldnn repo. Update MKLDNN repo to ashok's. Fix a bug in fallback. Change avg pooling algorithm to pooling_avg_include_padding Fix a code style in mkldnn pooling. Temp fix a bug in FC. Revert "Disable Clang tests in Jenkins." This reverts commit b4efa8f89592d30a27f9c30e2237e9420ac6749a. Rebase and Refactor deconv (#20) * rebase to Da,Zheng refactor branch Jan.14, add signature for mkldnn Deconv and modify classMKLDNNDeconvForward * fix make lint complains A simple way of caching BN inference. cache BN forward for both training and inference. Fix some minor problems in BN. Fix a bug in caching BN. force to build with avx2 in Jenkins. Remove the remaining MKLDNNStorageType Some minor updates in NDArray. a lot of updates to address comments. minor changes. * Use NNVM interface. Use NNVM interface for upsampling. Use NNVM interface for convolution. Use NNVM interface for deconvolution. Use NNVM interface for FullyConnected. Move NNVM interface to batch norm. Use NNVM interface for depthwise convolution. Use NNVM interface for softmax activation. Use NNVM interface for pooling. use NNVM interface for dropout. Use NNVM interface for activation. Use NNVM interface for CuDNN batch norm. Use NNVM interface for CuDNN pooling. Use NNVM interface for CuDNN softmax activation. Use NNVM interface for CuDNN activation. Use NNVM interface for CuDNN convolution. Use NNVM interface for CuDNN deconvolution. Move concat to nn/ Use NNVM interface for concat. Fix headers in concat. Move lrn to nn/. Use NNVM interface for LRN. Fix a compilation error in convolution. Fix a compilation error in activation. Fix coding style. Fix coding style for make lint. 
use enums in batch norm. Use CoreOpRunner for refactored Ops. Make FullyConnected stateless. Make upsampling stateless. Make pooling stateless. Make batchnorm stateless. Make SoftmaxActivation stateless. Fix a code style problem. pass amalgamation test for batch norm. pass amalgamation test for dropout. Get convolution ops from a function. Fix compilation errors for GPU. Fix thread local in diff platforms. Avoid using thread_local for non-CuDNN conv/deconv. Remove TODO in deconv. Fix a bug in batch norm. Fix a bug in fully connected. Don't set #inputs for backward convolution. Revert "Make pooling stateless." * revert modification in test_executor. * Fix a bug in FlattenStorageType. * Remove BN debug. * Remove remaining MXNET_USE_MKL2017 * Remove unused code in pooling. * Fixing bugs in gtests. * Fix lint errors. * a lot of minor updates to address comments. * Fix coding style in MKLDNN Pooling (#22) * revert the code change in the previous code refactor. * Fix a bug in pooling. * LRN coding style changes (#21) * LRN coding style change * Add const for local variables * Add req for LRN forward * rebase code * align API interface * revert modification in test_executor. * cast storage with MKLDNN properly. * Minor updates to address comments. * some minor updates. * Switch to the master branch of MKLDNN. * Minor updates to address comments. * Update activation.cc * Fix a bug in convert NDArray. * Add gluon model zoo tests. * Update GPU tests on model zoo. * Avoid using mobilenet for GPU tests with gluon models. mobilenet can't pass the test even without MKLDNN. * Update GPU tests on gluon. * change cmake to compile MKLDNN. * update cmake for MKLDNN. * Implement align myself. * Switch to intel/mkl-dnn. * Fix errors in align unittest. * Add unit test for LRN. * fix a compilation error. * use storage_type_assign to determine storage type. * avoid global pooling in mkldnn. There is a bug in global pooling in mkldnn. * compare all MKLDNN ops with native impls. 
add MXNET_MKLDNN_DEBUG to control the test. * Fix a bug in testing correctness. * print the name of buggy operator. * undo some modifications. * Fix a bug on reshaped array. * avoid testing outputs with NullOp. * turn on MKLDNN tests in Jenkins. * print each operator in MKLDNN tests. * rename test_gluon_model_zoo.py * Create hashcode for operator parameters properly. * Add USE_MKL2017 back. * Print warning messages. * move batchnorm tests to nnvm interface. * Delete batchnorm v1 tests. * Get inputs and outputs in batchnorm tests. * disable batchnorm tests for now. * Fix GPU tests on gluon model zoo. * Fix lint complains in tests. * Remove simd from openmp instructions in BatchNorm (#24) * Remove warnings. * Fix MKLDNN 1st compile failure issue (#23) * Fix compilation errors. * Remove ARCH_OPT in Jenkins. * Revert "avoid global pooling in mkldnn." This reverts commit f6efd342e64968cb848c9193d80e929968b8052c. * Move to the latest MKLDNN. This fixes the bug in global pooling. * WIP unit tests (#25) * WIP unit tests * some backward items initialized * Make more C++ unit tests work for batch norm (#28) * WIP unit tests * some backward items initialized * some backward items initialized * some backward items initialized * first unit test working * Working on types * backward types working for fp16 on first unit test * backward types working for fp16 on first unit test * backward types working for fp16 on first unit test * . * . * some tests working * fix input data * hangle gpu<->cpu for setting values * gpu working * gpu working * CAccessAsCPU class * Fix varying type in AccessAsCPU * starting to add channel axis tests * TestChannelAxisSimple * TestChannelAxisSimple * run bidirectional * run bidirectional * run bidirectional * CLEANUP * CLEANUP * .. * noaxis * .. * lint * revert * revert * Fix lint complains. * Fix a minor problem in Makefile. * fix GPU pooling. * Disable modelzoo inference tests. * update accuracy checks for MKLDNN. 
* Fix MKLDNN pooling for global pooling. * Fix Jenkins. * Fix a bug in Jenkins. * Fix Jenkins
2018-02-15 14:44:34 -08:00
} else {
  // Map the (batch, channel, depth) coordinates to a flat offset into the blob.
  const size_t idx = test::offset(blob.shape_, {n, ch, d});
  CHECK_LT(idx, numberOfIndexes);
  // Dispatch on the blob's runtime dtype and write the next test value in place.
  MSHADOW_TYPE_SWITCH(blob.type_flag_, ThisDataType, {
    ThisDataType& f = blob.dptr<ThisDataType>()[idx];
    f = getNextData();
  });
}
}
} else {
  // Write the next deterministic test value into the blob at position (n, ch),
  // dispatching on the blob's runtime dtype via MSHADOW_TYPE_SWITCH.
  const size_t idx = test::offset(blob.shape_, {n, ch});
  CHECK_LT(idx, numberOfIndexes);
  MSHADOW_TYPE_SWITCH(blob.type_flag_, ThisDataType, {
    ThisDataType& f = blob.dptr<ThisDataType>()[idx];
    f = getNextData();
});
}
}
} else {
  // Compute the flat (linear) offset of coordinate {n} within the blob's
  // shape and bounds-check it against the total element count.
  const size_t idx = test::offset(blob.shape_, {n});
  CHECK_LT(idx, numberOfIndexes);
  // Dispatch on the blob's runtime type flag to obtain a correctly-typed
  // data pointer, then write the next generated fill value at that offset.
  MSHADOW_TYPE_SWITCH(blob.type_flag_, ThisDataType, {
    ThisDataType& f = blob.dptr<ThisDataType>()[idx];
    f = getNextData();
});
}
}
Refactor operators and add MKLDNN (#9677) * Remove MKL code. * Integrate MKLDNN. Update MXNet for MKLDNN. Enable MKLDNN Relu. Fix a compilation error. Change Makefile for MKLDNN. Remove infer storage in convolution. Update MXNet for MKLDNN. Support MKLDNN storage type in python. Update activation. Add MKLDNN base classes. Implement MKLDNN fully connected. Add MKLDNN convolution. Update MKLDNN interface in NDArray. MKLDNN convolution handle CreateMKLDNNData failure. Add another GetMKLDNNData in NDArray. Have mkldnn to define the data format. Create output MKLDNN memory explicitly for FC. Fix a bug in NDArray. Fix a bug in GetWeightDesc. Convert data layout if necessary in FC. remove unnecessary print in MKLDNN convolution. Add MKLDNN deconvolution. Add MKLDNNStream to manage primitives and memories. Use MKLDNNStream to register memory in NDArray. Use MKLDNNStream to manage resources in operators. Handle kAddTo in MKLDNN operators. Fix a bug in deconvolution. Fix bugs in NDArray. Revert "Fix bugs in NDArray." This reverts commit f5624a4aa9f9b9f9fe31f5e6cfa7a9752838fc4e. Fix a bug in NDArray. Fix a bug in NDArray. Reorder MKLDNN memory to default format in SetTBlob. Disable MKLDNN correctly. Fix a bug in activation. Reshape of NDArray supports MKLDNN. Fix a memory ref bug in NDArray. Reshape NDArray in MKLDNN FullyConnected. Fix data format conversion. Create MKLDNN NDArray in python. Support Slice for MKLDNN NDArray. Reduce the overhead of summing the result to the output array. Avoid unnecessary memory copy in NDArray. Fix a bug in data reordering. Fix a bug in NDArray. Don't hard code MKLDNN type. Support dilation in MKLDNN convolution. Fix a bug in sum results. Rewrite GetMKLDNNData. Add prepare_mkldnn.sh Enable MKLDNN activation. Fix a bug on FullyConnected. Handle 3 dims for MKLDNN NDArray. Fix a bug in MKLDNN FC. Support MKLDNN storage in KV store. Fix a bug in executor for non-default NDArray. Fix a link error in cast_storage.cc. 
Remove unnecessary function def Fall back to def storage if the type isn't supported by MKLDNN. Use NDArray for MKLDNN in python. Reshape output of MKLDNN convolution. Fix a bug in NDArray. Support more operations in MKLDNN NDArray. Fix a bug in deconvolution. Fix bugs in MKLDNN deconvolution. We still need to compute bias correctly. Have elemwise binary ops to fall to default for MKLDNN. Limit the cases that MKLDNN operations are called. Force the layout of mkldnn::memory from NDArray. Add MKLDNN softmax. Fix output storage type of MKLDNN softmax. Add MKLDNN sum. Fix a bug in elemwise sum. Fix a bug in MKLDNN softmax. Fix a bug in imperative. Clean up dispatch modes. Remove redundant code. MKLDNN Pooling Op integration MKLDNN Pooling Op integration add missing file fix mkldnn pooling op workspace issue handle workspace in MKLDNN pooling correctly. Use a non-MKLDNN op for testing. Allow to share arguments and their gradients between executors. Avoid using MKLDNN pooling when it's not supported. Support MKLDNN properly. Choose MKLDNN softmax more carefully. Fix a bug in MKLDNN pooling. Fall back if MKLDNN pooling isn't supported. Fix a bug in Slice of NDArray. Use int32 for workspace memory. Exclude MKLDNN act with tanh. Have two Reshape functions in NDArray. Copy data for NDArray with diff shapes. Add MKLDNN copy. Add MKLDNN version of elemwise_add. Add MKLDNN version of Flatten. add mkldnn surport for concat simplify MKLDNN Flatten. Enalbe MKLDNN deconvolution with bias. Fix a bug in CuDNN deconvolution. avoid using MKLDNNStorage when it's not defined. Remove ./cudnn_lrn-inl.h Fix for make lint. add mkldnn surport for concat fix the coding style for pr of mkldnn concat Only add input data for MKLDNN concat backward Remove unnecessary TODO. remove unnecessary __repr__ in MKLNDArray. better condition check for readability. Use macro when including mkldnn.hpp. Revert "Use CoreOpRunner for refactored Ops." This reverts commit a28586fc25950cc006cb317e26e0d17541ef0586. 
Fix a bug in test core. Limit MKLDNN ops being used. Fix complains from "make pylint" Move ContainStorage to common/utils.h Limit MKLDNN concat being used. Add license. Fix amalgamation Fix compilation error in mkldnn_ops-inl.h Fix a bug in deconvolution. Fix a bug in pooling. MKLDNN ops allocates temp mem. Fix a bug in pooling. Allocate align memory from temp space. Have parameter gradients stored in the default storage. Handle all cases in CopyFrom. Ensure NDArray returns memory with right memory descriptors. use auto to define memory in the operator. Use raw pointer for mkldnn memory. Move more code to mkldnn_base.cc Fix a compilation error. Address review comments. fix a bug in activation backward. Miss a macro in mkldnn_base.cc Fix a bug in data iterator in examples. Avoid memory allocation in ReshapeMKLDNN. Avoid memory allocation in storage cast. Fix a bug in cast storage. Handle sliced MKLDNN NDArray. Use memcpy if NDArray uses default format. Revert "Limit MKLDNN ops being used." This reverts commit 75e2ae570d03483868ec4ed8ed46015c7fa6c6fb. Enable mkldnn act backward has the same input layout. Fix a bug in mkldnn activation. Use MKLDNN sum in more cases. Improve perf of reorder. Avoid memory reorder in conv and deconv. Avoid unnecessary storage cast in fallback path. Revert "Use MKLDNN sum in more cases." This reverts commit 7a21ebca8bbe17fde49c3b1ca3f31b835a33afb8. Handle sliced ndarray in more cases. Fix a complain from make lint. Update Jenkins to test MKLDNN. debug compiling mkldnn. Use MKLDNN sum in more cases. Add mkldnn as a submodule. Compile with mkldnn in 3rdparty. Fix some coding styles. write the path to mkldnn lib in libmxnet.so. use rpath with $ORIGIN. Pack all lib files in Jenkins. pack and unpack mxnet with MKLDNN. Update Jenkinsfile Update Jenkinsfile Add mkldnn batch normalization Fix bugs in BN. Avoid memory allocation in MKLDNNCopy. only use MKLDNN BatchNorm for special cases. MKLDNN BatchNorm doesn't work well on the default layout. 
Add MKL-DNN based LRN Code Style Changes Fix a bug in BN. Fix a bug in LRN. Handle non-default storage in memory plan. Fix coding style. Fix a compilation error without mkldnn. Fix some coding styles for batch norm Improve forward of convolution. Add openmp and simd support to BN operator Retrieve MKLDNN Conv primitive based on signature. Retrieve Act primitive based on its signature. Fix a bug in pooling. Diable some MKLDNN activation and pooling. Cast MKLDNN storage with diff data type. Check if it's a view of NDArray. Reshaped and sliced arrays share the same chunks. Implement caching MKLDNN Act correctly. Fix a bug in check_consistency. Fix a potential bug when destroying NDArray. Fix bugs when allocating mem in NDArray. Fix coding style. Add micro when using mkldnn in ndarray. Fix a compilation error. Fix a bug in concat. Remove MKLDNNStorage. handle diff layouts in CopyFromToDnsImpl. Fallback correctly. Force weight grad to use default layout. Reorder weight arrays in (de)conv for faster inference. Avoid caching TBlob from NDArray. This commit may add some overhead of managing NDArray for each fallback. Fix a bug in Flatten. handle ndarray with def layout in mkldnn BN correctly. Align to page when mkldnn is enabled. Use default mem alloc for mkldnn. Reuse NDArrays. Support WriteInplace for sum. fix complains from "make lint". Avoid reallocation in NDArray. Handle weight arrays with special MKLDNN layouts. Remove unnecessary GetWeights. Fix compilation error without MKLDNN. Fix a bug in (de)conv for weight arrays. Fix a minor bug in MKLDNN conv. Fix a bug in MKLDNNOpSignature. Reimplement fallback for MKLDNN ops. Fix a bug in FallbackExecutor. Add params in hashcode. Invalidate data in outputs to accelerate. Fix a minor bug. Update mkldnn_base-inl.h Add primitive caching for Pooling forward computation Add hashcode in pooling parameters. Support NDArray copy with types unsupported by MKLDNN. Avoid using MKLDNN concat for negative dimension. 
Fix make lint complain. Disable mkldnn avg pooling for now. Fix a compile warning. Fix compile error when MKLDNN is disabled. OP primitive cache: use memory as signature for MKLDNN storage type Remove MKLDNN array in python. Disable Clang tests in Jenkins. Use mklml dockers to test mkldnn. Update MKLDNN repo to zhengda's mkldnn repo. Update MKLDNN repo to ashok's. Fix a bug in fallback. Change avg pooling algorithm to pooling_avg_include_padding Fix a code style in mkldnn pooling. Temp fix a bug in FC. Revert "Disable Clang tests in Jenkins." This reverts commit b4efa8f89592d30a27f9c30e2237e9420ac6749a. Rebase and Refactor deconv (#20) * rebase to Da,Zheng refactor branch Jan.14, add signature for mkldnn Deconv and modify classMKLDNNDeconvForward * fix make lint complains A simple way of caching BN inference. cache BN forward for both training and inference. Fix some minor problems in BN. Fix a bug in caching BN. force to build with avx2 in Jenkins. Remove the remaining MKLDNNStorageType Some minor updates in NDArray. a lot of updates to address comments. minor changes. * Use NNVM interface. Use NNVM interface for upsampling. Use NNVM interface for convolution. Use NNVM interface for deconvolution. Use NNVM interface for FullyConnected. Move NNVM interface to batch norm. Use NNVM interface for depthwise convolution. Use NNVM interface for softmax activation. Use NNVM interface for pooling. use NNVM interface for dropout. Use NNVM interface for activation. Use NNVM interface for CuDNN batch norm. Use NNVM interface for CuDNN pooling. Use NNVM interface for CuDNN softmax activation. Use NNVM interface for CuDNN activation. Use NNVM interface for CuDNN convolution. Use NNVM interface for CuDNN deconvolution. Move concat to nn/ Use NNVM interface for concat. Fix headers in concat. Move lrn to nn/. Use NNVM interface for LRN. Fix a compilation error in convolution. Fix a compilation error in activation. Fix coding style. Fix coding style for make lint. 
use enums in batch norm. Use CoreOpRunner for refactored Ops. Make FullyConnected stateless. Make upsampling stateless. Make pooling stateless. Make batchnorm stateless. Make SoftmaxActivation stateless. Fix a code style problem. pass amalgamation test for batch norm. pass amalgamation test for dropout. Get convolution ops from a function. Fix compilation errors for GPU. Fix thread local in diff platforms. Avoid using thread_local for non-CuDNN conv/deconv. Remove TODO in deconv. Fix a bug in batch norm. Fix a bug in fully connected. Don't set #inputs for backward convolution. Revert "Make pooling stateless." * revert modification in test_executor. * Fix a bug in FlattenStorageType. * Remove BN debug. * Remove remaining MXNET_USE_MKL2017 * Remove unused code in pooling. * Fixing bugs in gtests. * Fix lint errors. * a lot of minor updates to address comments. * Fix coding style in MKLDNN Pooling (#22) * revert the code change in the previous code refactor. * Fix a bug in pooling. * LRN coding style changes (#21) * LRN coding style change * Add const for local variables * Add req for LRN forward * rebase code * align API interface * revert modification in test_executor. * cast storage with MKLDNN properly. * Minor updates to address comments. * some minor updates. * Switch to the master branch of MKLDNN. * Minor updates to address comments. * Update activation.cc * Fix a bug in convert NDArray. * Add gluon model zoo tests. * Update GPU tests on model zoo. * Avoid using mobilenet for GPU tests with gluon models. mobilenet can't pass the test even without MKLDNN. * Update GPU tests on gluon. * change cmake to compile MKLDNN. * update cmake for MKLDNN. * Implement align myself. * Switch to intel/mkl-dnn. * Fix errors in align unittest. * Add unit test for LRN. * fix a compilation error. * use storage_type_assign to determine storage type. * avoid global pooling in mkldnn. There is a bug in global pooling in mkldnn. * compare all MKLDNN ops with native impls. 
add MXNET_MKLDNN_DEBUG to control the test. * Fix a bug in testing correctness. * print the name of buggy operator. * undo some modifications. * Fix a bug on reshaped array. * avoid testing outputs with NullOp. * turn on MKLDNN tests in Jenkins. * print each operator in MKLDNN tests. * rename test_gluon_model_zoo.py * Create hashcode for operator parameters properly. * Add USE_MKL2017 back. * Print warning messages. * move batchnorm tests to nnvm interface. * Delete batchnorm v1 tests. * Get inputs and outputs in batchnorm tests. * disable batchnorm tests for now. * Fix GPU tests on gluon model zoo. * Fix lint complains in tests. * Remove simd from openmp instructions in BatchNorm (#24) * Remove warnings. * Fix MKLDNN 1st compile failure issue (#23) * Fix compilation errors. * Remove ARCH_OPT in Jenkins. * Revert "avoid global pooling in mkldnn." This reverts commit f6efd342e64968cb848c9193d80e929968b8052c. * Move to the latest MKLDNN. This fixes the bug in global pooling. * WIP unit tests (#25) * WIP unit tests * some backward items initialized * Make more C++ unit tests work for batch norm (#28) * WIP unit tests * some backward items initialized * some backward items initialized * some backward items initialized * first unit test working * Working on types * backward types working for fp16 on first unit test * backward types working for fp16 on first unit test * backward types working for fp16 on first unit test * . * . * some tests working * fix input data * hangle gpu<->cpu for setting values * gpu working * gpu working * CAccessAsCPU class * Fix varying type in AccessAsCPU * starting to add channel axis tests * TestChannelAxisSimple * TestChannelAxisSimple * run bidirectional * run bidirectional * run bidirectional * CLEANUP * CLEANUP * .. * noaxis * .. * lint * revert * revert * Fix lint complains. * Fix a minor problem in Makefile. * fix GPU pooling. * Disable modelzoo inference tests. * update accuracy checks for MKLDNN. 
* Fix MKLDNN pooling for global pooling. * Fix Jenkins. * Fix a bug in Jenkins. * Fix Jenkins
2018-02-15 14:44:34 -08:00
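The MKLDNN changelog above repeatedly mentions caching primitives by operator signature ("Retrieve MKLDNN Conv primitive based on signature", "Add params in hashcode", "Create hashcode for operator parameters properly"). A minimal sketch of such a signature-keyed cache is below; all names (`OpSignature`, `GetCachedPrimitive`) are illustrative, not MXNet's actual API, and an `int` id stands in for an expensive-to-build primitive.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// Hypothetical signature: shapes and parameters folded into one flat key.
struct OpSignature {
  std::vector<int64_t> vals;
  void AddShape(const std::vector<int64_t>& shape) {
    vals.push_back(static_cast<int64_t>(shape.size()));  // record rank so {2,3} != {2},{3}
    vals.insert(vals.end(), shape.begin(), shape.end());
  }
  void AddParam(int64_t p) { vals.push_back(p); }
  bool operator==(const OpSignature& o) const { return vals == o.vals; }
};

struct OpSignatureHash {
  size_t operator()(const OpSignature& s) const {
    size_t h = 0;
    for (int64_t v : s.vals) h = h * 131 + std::hash<int64_t>()(v);
    return h;
  }
};

// Return a cached "primitive" for a signature, building it on first use;
// later calls with an equal signature reuse the cached instance.
inline int GetCachedPrimitive(const OpSignature& sig) {
  static std::unordered_map<OpSignature, int, OpSignatureHash> cache;
  auto it = cache.find(sig);
  if (it != cache.end()) return it->second;
  int id = static_cast<int>(cache.size());  // stand-in for an expensive build
  cache.emplace(sig, id);
  return id;
}
```

Two calls with equal shapes and parameters hit the same cache entry; changing any parameter produces a different key and a fresh primitive.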
});
}
/*! \brief Return a random number within a given range (inclusive) */
template <class ScalarType>
inline ScalarType rangedRand(const ScalarType min, const ScalarType max) {
  uint64_t num_bins = static_cast<uint64_t>(max + 1),
           num_rand = static_cast<uint64_t>(RAND_MAX),
           bin_size = num_rand / num_bins,
           defect   = num_rand % num_bins;
ScalarType x;
do {
x = std::rand();
} while (num_rand - defect <= static_cast<uint64_t>(x));
return static_cast<ScalarType>(x / bin_size + min);
}
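The loop above draws again whenever `std::rand()` lands in the "defective" tail of the range, so every bin of `bin_size` values is equally likely (plain `rand() % n` would bias the low values). A minimal self-contained sketch of the same rejection-sampling scheme, with the helper name and the `int` specialization chosen here for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Sketch: uniform integer in [min, max] via rejection sampling.
// num_rand / bin_size / defect mirror the variables used by the helper above.
static int ranged_rand_sketch(int min, int max) {
  const uint64_t num_bins = static_cast<uint64_t>(max - min) + 1;
  const uint64_t num_rand = static_cast<uint64_t>(RAND_MAX) + 1;
  const uint64_t bin_size = num_rand / num_bins;
  const uint64_t defect   = num_rand % num_bins;  // leftover values that would bias the result
  uint64_t x;
  do {
    x = static_cast<uint64_t>(std::rand());
  } while (num_rand - defect <= x);  // reject draws from the defective tail
  return static_cast<int>(x / bin_size) + min;
}
```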
/*!
* \brief Deterministically compare mxnet::TShape objects as less-than,
 * for use as a key in STL sorted containers such as std::map and std::set
* \param s1 First shape
* \param s2 Second shape
* \return true if s1 is less than s2
*/
inline bool operator<(const mxnet::TShape& s1, const mxnet::TShape& s2) {
if (s1.Size() == s2.Size()) {
if (s1.ndim() == s2.ndim()) {
for (size_t i = 0, n = s1.ndim(); i < n; ++i) {
if (s1[i] == s2[i]) {
continue;
}
return s1[i] < s2[i];
}
return false;
}
return s1.ndim() < s2.ndim();
}
return s1.Size() < s2.Size();
}
/*!
* \brief Deterministically compare a vector of mxnet::TShape objects as less-than,
 * for use as a key in STL sorted containers such as std::map and std::set
* \param v1 First vector of shapes
* \param v2 Second vector of shapes
* \return true if v1 is less than v2
*/
inline bool operator<(const std::vector<mxnet::TShape>& v1, const std::vector<mxnet::TShape>& v2) {
if (v1.size() == v2.size()) {
for (size_t i = 0, n = v1.size(); i < n; ++i) {
if (v1[i] == v2[i]) {
continue;
}
return v1[i] < v2[i];
}
return false;
}
return v1.size() < v2.size();
}
/*!
 * \brief std::less-style comparison functor for comparing vectors of shapes in STL sorted containers
*/
struct less_shapevect {
bool operator()(const std::vector<mxnet::TShape>& v1,
const std::vector<mxnet::TShape>& v2) const {
if (v1.size() == v2.size()) {
for (size_t i = 0, n = v1.size(); i < n; ++i) {
if (v1[i] == v2[i]) {
continue;
}
return v1[i] < v2[i];
}
return false;
}
return v1.size() < v2.size();
}
};
inline std::string pretty_num(uint64_t val) {
if (!test::csv) {
std::string res, s = std::to_string(val);
size_t ctr = 0;
for (int i = static_cast<int>(s.size()) - 1; i >= 0; --i, ++ctr) {
if (ctr && (ctr % 3) == 0) {
res += ",";
}
res.push_back(s[i]);
}
std::reverse(res.begin(), res.end());
return res;
} else {
return std::to_string(val);
}
}
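The digit-grouping branch of `pretty_num` walks the decimal string from the right, emitting a comma every three digits, then reverses. A self-contained sketch of just that logic (the `test::csv` switch dropped, and the `pretty_num_sketch` name chosen here so the example compiles on its own):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

// Sketch of pretty_num's grouping logic: build the string back-to-front,
// inserting a comma before every group of three digits, then reverse.
inline std::string pretty_num_sketch(uint64_t val) {
  std::string res, s = std::to_string(val);
  size_t ctr = 0;
  for (int i = static_cast<int>(s.size()) - 1; i >= 0; --i, ++ctr) {
    if (ctr && (ctr % 3) == 0) {
      res += ",";
    }
    res.push_back(s[i]);
  }
  std::reverse(res.begin(), res.end());
  return res;
}
```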
/*! \brief Temporarily change a value; the original is restored when this object leaves scope */
template <typename T>
struct ScopeSet {
inline ScopeSet(T* var, const T tempValue) : var_(*var), saveValue_(*var) {
*var = tempValue;
}
inline ~ScopeSet() {
var_ = saveValue_;
}
T& var_;
T saveValue_;
2017-05-15 20:27:28 -07:00
};
static inline void AssertEqual(const std::vector<NDArray*>& in_arrs,
                               const std::vector<NDArray*>& out_arrs,
                               float rtol = 1e-5f,
                               float atol = 1e-8f,
                               bool test_first_only = false) {
for (size_t j = 0; j < in_arrs.size(); ++j) {
// When test_first_only is set, compare only the first array pair.
if (test_first_only && j == 1) {
return;
}
NDArray tmp1 = *in_arrs[j];
NDArray tmp2 = *out_arrs[j];
if (tmp1.ctx().dev_type == mxnet::Context::kGPU) {
tmp1 = tmp1.Copy(mxnet::Context::CPU(0));
tmp2 = tmp2.Copy(mxnet::Context::CPU(0));
tmp1.WaitToRead();
tmp2.WaitToRead();
}
#if MXNET_USE_ONEDNN == 1
tmp1 = tmp1.Reorder2Default();
tmp2 = tmp2.Reorder2Default();
#endif
EXPECT_EQ(tmp1.shape().Size(), tmp2.shape().Size());
TBlob blob1 = tmp1.data();
TBlob blob2 = tmp2.data();
mshadow::default_real_t* d1 = static_cast<mshadow::default_real_t*>(blob1.dptr_);
mshadow::default_real_t* d2 = static_cast<mshadow::default_real_t*>(blob2.dptr_);
for (size_t i = 0; i < tmp1.shape().Size(); ++i) {
float abs_err = fabs(d1[i] - d2[i]);
ASSERT_LE(abs_err, (atol + rtol * fabs(d2[i])))
<< "index: " << i << ", " << d1[i] << " vs " << d2[i];
}
}
}
} // namespace test
} // namespace mxnet
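The element-wise check above uses the usual `atol`/`rtol` closeness rule: an element passes when `|a - b| <= atol + rtol * |b|`. A minimal standalone sketch of the same rule, with a hypothetical helper name `all_close` and default tolerances that are assumptions rather than values taken from this header:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Sketch of the atol/rtol closeness test used in the assertion above:
// an element passes when |a - b| <= atol + rtol * |b|.
// Name and default tolerances are illustrative, not part of test_util.h.
inline bool all_close(const float* a, const float* b, std::size_t n,
                      float rtol = 1e-5f, float atol = 1e-8f) {
  for (std::size_t i = 0; i < n; ++i) {
    if (std::fabs(a[i] - b[i]) > atol + rtol * std::fabs(b[i])) {
      return false;  // first mismatching element fails the whole check
    }
  }
  return true;
}
```

Note that the tolerance is asymmetric (it scales with `|b|`, the reference value), which matches the `ASSERT_LE` expression above where `d2` plays the role of the expected data.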
#if defined(_MSC_VER)
inline void usleep(__int64 usec) {
HANDLE timer;
LARGE_INTEGER ft;
// Convert to 100 nanosecond interval, negative value indicates relative time
ft.QuadPart = -(10 * usec);
timer = CreateWaitableTimer(NULL, TRUE, NULL);
SetWaitableTimer(timer, &ft, 0, NULL, NULL, 0);
WaitForSingleObject(timer, INFINITE);
CloseHandle(timer);
}
#endif  // defined(_MSC_VER)
#endif // TEST_UTIL_H_
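The MSVC shim above emulates POSIX `usleep` with a Win32 waitable timer: `SetWaitableTimer` interprets a negative due time as a relative interval in 100-nanosecond units, hence `-(10 * usec)`. A self-contained portable wrapper along the same lines (the name `sleep_us` is hypothetical, not part of this header):

```cpp
#include <chrono>
#ifdef _MSC_VER
#include <windows.h>
#else
#include <unistd.h>  // POSIX usleep
#endif

// Portable microsecond sleep: waitable-timer shim on MSVC, usleep elsewhere.
// Illustrative helper only; the header itself defines usleep for MSVC instead.
inline void sleep_us(long long usec) {
#ifdef _MSC_VER
  HANDLE timer = CreateWaitableTimer(NULL, TRUE, NULL);
  LARGE_INTEGER ft;
  ft.QuadPart = -(10 * usec);  // negative => relative time, 100-ns units
  SetWaitableTimer(timer, &ft, 0, NULL, NULL, 0);
  WaitForSingleObject(timer, INFINITE);
  CloseHandle(timer);
#else
  usleep(static_cast<useconds_t>(usec));
#endif
}
```

Either branch may overshoot the requested interval (the OS scheduler gives no upper bound), but should not return meaningfully early.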