# Optimization: initialize and update weights

## Overview

This document summarizes the APIs used to initialize and update the model weights during training

```eval_rst
.. autosummary::
    :nosignatures:

    mxnet.initializer
    mxnet.optimizer
    mxnet.lr_scheduler
```

and how to develop a new optimization algorithm in MXNet.

Assume there is a pre-defined ``Symbol`` and a ``Module`` is created for it:

```python
>>> data = mx.symbol.Variable('data')
>>> label = mx.symbol.Variable('softmax_label')
>>> fc = mx.symbol.FullyConnected(data, name='fc', num_hidden=10)
>>> loss = mx.symbol.SoftmaxOutput(fc, label, name='softmax')
>>> mod = mx.mod.Module(loss)
>>> mod.bind(data_shapes=[('data', (128, 20))],
...          label_shapes=[('softmax_label', (128,))])
```

Next we can initialize the weights with values sampled uniformly from ``[-1, 1]``:

```python
>>> mod.init_params(mx.initializer.Uniform(scale=1.0))
```

Then we train the model with standard SGD, decreasing the learning rate by a factor of 0.9 every 100 batches:

```python
>>> lr_sch = mx.lr_scheduler.FactorScheduler(step=100, factor=0.9)
>>> mod.init_optimizer(
...     optimizer='sgd',
...     optimizer_params=(('learning_rate', 0.1), ('lr_scheduler', lr_sch)))
```

Finally, run ``mod.fit(...)`` to start training.

## The ``mxnet.initializer`` package

```eval_rst
.. currentmodule:: mxnet.initializer
```

The base class ``Initializer`` defines the default behaviors for initializing the various parameters, such as setting the bias to zero, but leaves the weight to its subclasses. The other classes then define how to initialize the weight.

```eval_rst
.. autosummary::
    :nosignatures:

    Initializer
    Uniform
    Normal
    Load
    Mixed
    Zero
    One
    Constant
    Orthogonal
    Xavier
    MSRAPrelu
    Bilinear
    FusedRNN
```

## The ``mxnet.optimizer`` package

```eval_rst
.. currentmodule:: mxnet.optimizer
```

The base class ``Optimizer`` accepts commonly shared arguments such as ``learning_rate`` and defines the interface. Each of the other classes in this package implements one weight-update rule.

```eval_rst
.. autosummary::
    :nosignatures:

    Optimizer
    SGD
    NAG
    RMSProp
    Adam
    AdaGrad
    AdaDelta
    Adamax
    Nadam
    DCASGD
    SGLD
    Signum
    FTML
    LBSGD
    Ftrl
```

## The ``mxnet.lr_scheduler`` package

```eval_rst
.. currentmodule:: mxnet.lr_scheduler
```

The base class ``LRScheduler`` defines the interface, while the other classes implement various schemes to change the learning rate during training.

```eval_rst
.. autosummary::
    :nosignatures:

    LRScheduler
    FactorScheduler
    MultiFactorScheduler
```

## Implement a new algorithm

Most classes listed in this document are implemented in Python using ``NDArray``, so implementing a new weight-update or initialization function is straightforward.

For ``initializer``, create a subclass of ``Initializer`` and define the ``_init_weight`` method (see the first sketch below). We can also change the default behaviors for other parameters by overriding methods such as ``_init_bias``. See [`initializer.py`](https://github.com/dmlc/mxnet/blob/master/python/mxnet/initializer.py) for examples.

For ``optimizer``, create a subclass of ``Optimizer`` and implement the two methods ``create_state`` and ``update`` (see the second sketch below). Also add ``@mx.optimizer.Optimizer.register`` before the class. See [`optimizer.py`](https://github.com/dmlc/mxnet/blob/master/python/mxnet/optimizer.py) for examples.

For ``lr_scheduler``, create a subclass of ``LRScheduler`` and implement the ``__call__`` method (see the third sketch below). See [`lr_scheduler.py`](https://github.com/dmlc/mxnet/blob/master/python/mxnet/lr_scheduler.py) for examples.
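For illustration, here is a minimal sketch of a custom initializer. The class name ``ScaledNormal`` is hypothetical and simply fills each weight from a zero-mean normal distribution (the built-in ``Normal`` already does this; the point is the shape of the subclass):

```python
import mxnet as mx

class ScaledNormal(mx.initializer.Initializer):
    """Hypothetical initializer: fill weights from N(0, sigma^2)."""

    def __init__(self, sigma=0.01):
        # Forward keyword arguments to the base class so the
        # initializer configuration can be serialized.
        super(ScaledNormal, self).__init__(sigma=sigma)
        self.sigma = sigma

    def _init_weight(self, name, arr):
        # `arr` is the NDArray holding the parameter `name`;
        # write the initial values into it in place.
        mx.random.normal(0, self.sigma, out=arr)
```

An instance can then be passed to ``mod.init_params(...)`` exactly like the built-in ``mx.initializer.Uniform`` used in the overview.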
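Next, a minimal sketch of a custom optimizer. The class ``MySGD`` is hypothetical and just re-implements plain SGD, to show where ``create_state`` and ``update`` fit and how registration works:

```python
import mxnet as mx

@mx.optimizer.Optimizer.register
class MySGD(mx.optimizer.Optimizer):
    """Hypothetical re-implementation of plain SGD (no momentum)."""

    def create_state(self, index, weight):
        # Plain SGD keeps no auxiliary state (e.g. no momentum buffer).
        return None

    def update(self, index, weight, grad, state):
        self._update_count(index)
        lr = self._get_lr(index)  # honors any attached lr_scheduler
        wd = self._get_wd(index)
        grad = grad * self.rescale_grad
        if self.clip_gradient is not None:
            grad = mx.nd.clip(grad, -self.clip_gradient, self.clip_gradient)
        # Gradient step with weight decay, written back in place.
        weight[:] = weight - lr * (grad + wd * weight)
```

Because the class is registered, it can be selected by its lowercased name, e.g. ``mod.init_optimizer(optimizer='mysgd', optimizer_params=(('learning_rate', 0.1),))``.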
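Finally, a minimal sketch of a custom learning-rate scheduler; ``ExpDecayScheduler`` is a hypothetical name for an exponential-decay scheme:

```python
import math
import mxnet as mx

class ExpDecayScheduler(mx.lr_scheduler.LRScheduler):
    """Hypothetical scheduler: lr = base_lr * exp(-decay * num_update)."""

    def __init__(self, base_lr=0.01, decay=1e-4):
        super(ExpDecayScheduler, self).__init__(base_lr=base_lr)
        self.decay = decay

    def __call__(self, num_update):
        # The optimizer calls the scheduler with the current update count
        # and uses the returned value as the learning rate.
        return self.base_lr * math.exp(-self.decay * num_update)
```

An instance can be passed as the ``lr_scheduler`` entry of ``optimizer_params``, exactly as ``FactorScheduler`` was used in the overview.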
## API Reference

```eval_rst
.. automodule:: mxnet.optimizer
    :members:

.. automodule:: mxnet.lr_scheduler
    :members:

.. automodule:: mxnet.initializer
    :members:
```