.. Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements. See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership. The ASF licenses this file
   to you under the Apache License, Version 2.0 (the
   "License"); you may not use this file except in compliance
   with the License. You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   KIND, either express or implied. See the License for the
   specific language governing permissions and limitations
   under the License.

Extend
======

The following tutorials will help you learn how to customize MXNet.

.. container:: cards

   .. card::
      :title: Custom Layers for Gluon
      :link: ../packages/gluon/blocks/custom-layer.html

      How to add new layer functionality to MXNet's imperative interface.

   .. card::
      :title: Custom Loss
      :link: ../packages/gluon/loss/custom-loss.html

      A guide to implementing custom losses.

   .. card::
      :title: Custom Operators Using NumPy
      :link: customop.html

      How to use NumPy to create custom MXNet operators.

   .. card::
      :title: New Operator Creation
      :link: /api/faq/new_op

      How to create new MXNet operators using CustomOp (Python) or NNVM (C++).

   .. card::
      :title: A Beginner's Guide to Implementing Operators in MXNet Backend
      :link: /api/faq/add_op_in_backend

      How to create new MXNet operators in MXNet's backend using C++,
      with a custom quadratic function op as the worked example.

   .. card::
      :title: Using runtime compilation (RTC) to write CUDA kernels in MXNet
      :link: /api/faq/using_rtc

      How to write CUDA kernels in MXNet using runtime compilation.
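
The backend tutorial's worked example is a quadratic function op. As a rough sketch of the arithmetic such an operator has to implement, assuming it computes the element-wise ``f(x) = a*x**2 + b*x + c`` with scalar parameters ``a``, ``b`` and ``c``, the NumPy snippet below defines a forward pass, a backward pass based on ``df/dx = 2*a*x + b``, and a central finite-difference check of that gradient. The names ``quadratic_forward``, ``quadratic_backward`` and ``numeric_grad`` are hypothetical; this is plain NumPy, not MXNet's C++ operator interface.

.. code-block:: python

   import numpy as np

   # Assumption: the op is the element-wise f(x) = a*x**2 + b*x + c.
   # All function names here are hypothetical, for illustration only.


   def quadratic_forward(x, a, b, c):
       """Forward pass: apply f element-wise to the input array."""
       return a * x ** 2 + b * x + c


   def quadratic_backward(out_grad, x, a, b):
       """Backward pass via the chain rule: dL/dx = dL/df * (2*a*x + b)."""
       return out_grad * (2 * a * x + b)


   def numeric_grad(x, a, b, c, eps=1e-6):
       """Central finite differences of sum(f(x)) with respect to each element."""
       grad = np.zeros_like(x)
       for idx in np.ndindex(x.shape):
           step = np.zeros_like(x)
           step[idx] = eps
           f_plus = quadratic_forward(x + step, a, b, c).sum()
           f_minus = quadratic_forward(x - step, a, b, c).sum()
           grad[idx] = (f_plus - f_minus) / (2 * eps)
       return grad


   if __name__ == "__main__":
       x = np.array([[1.0, 2.0], [3.0, 4.0]])
       a, b, c = 1.0, 2.0, 3.0
       print(quadratic_forward(x, a, b, c))
       # An all-ones out_grad makes the analytic gradient equal 2*a*x + b,
       # which should match the numerical estimate of d(sum(f))/dx.
       dx = quadratic_backward(np.ones_like(x), x, a, b)
       print(np.allclose(dx, numeric_grad(x, a, b, c), atol=1e-4))
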

.. toctree::
   :hidden:
   :glob:

   *
   New Operator Creation <https://mxnet.apache.org/api/faq/new_op>
   New Operator in MXNet Backend <https://mxnet.apache.org/api/faq/add_op_in_backend>
   Using RTC for CUDA kernels <https://mxnet.apache.org/api/faq/using_rtc>