Neural Network Libraries
Neural Network Libraries is a deep learning framework intended for research, development, and production. We aim to have it running everywhere: desktop PCs, HPC clusters, embedded devices, and production servers.
This document describes how to use the Python API and the C++ API, the contribution guide for developers, and the license terms of this software. The Python API is better suited to fast prototyping and experimentation with deep learning systems, while the C++ API is for deploying inference or training algorithms to embedded systems and servers (its documentation is not available yet; we will make it available soon). The framework is designed with modularity and extensibility in mind. Community contributors can add new neural network operator or optimizer modules, as well as specialized implementations of neural network modules for a specific target device, as extensions.
Python Package
The Python API, built on top of our C++11 core, maximizes flexibility in designing neural networks and encourages fast prototyping and experimentation. NNabla works on both Python>=2.7 and Python>=3.5.
Python Package Installation
There are three ways to install the NNabla Python package.
Install with pip command
The NNabla Python packages are hosted on PyPI for many platforms. If you are familiar with Python and its package manager pip (and optionally with CUDA, which is recommended), the following pip installation guide should be sufficient to install NNabla Python. For a more detailed OS-specific setup guide, go to the next section.
NNabla package installation using PIP
Note: please refer to the OS-specific workflows for setting up OS-specific dependencies.
Install NNabla package via pip:
pip install nnabla
Note: If you want to make sure the latest version is installed, first uninstall any previously installed version with:
pip uninstall -y nnabla
Then, check if it works by running:
python -c "import nnabla"
2018-06-26 15:20:16,759 [nnabla][INFO]: Initializing CPU extension...
NNabla CUDA extension package installation
Run an Example
Get the examples (and unzip them) or clone the NNabla Examples repository, and go to the MNIST folder.
cd nnabla-examples/mnist-collection/
Run MNIST classification.
python classification.py
Run MNIST classification with CUDA/cuDNN.
python classification.py -c cudnn
OS-specific workflows
Installation on Linux
These instructions describe how to install NNabla using pip on almost any 64-bit Linux system.
The supported Python versions for the provided binary packages are 2.7, 3.5, and 3.6. We recommend Miniconda as the Python distribution. The following is a simple procedure for installing Miniconda Python.
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p {installation path e.g. ~/miniconda}
# You have to set an environment variable PATH accordingly
# to enable the installed ``Python`` and the ``conda`` system.
echo 'export PATH=<installation path>/bin:$PATH' >> ~/.bashrc
# Restart your bash or source ~/.bashrc
# Switch the default Python version
conda install -y python={version number e.g. 3.6}
Use libgcc 5 and numpy 1.13.0 or later. Note that numba depends on an older numpy, so please uninstall numba first (the following is for Python 2).
conda create -n py2 python=2.7 anaconda # if necessary
source activate py2
conda install libgcc
conda install -c anaconda numpy=1.13.0
Then, you can follow the usual installation workflow.
We have also tested other Linux distributions and versions (Ubuntu 14.04; CentOS 6.9 and 7.3; Fedora 23, 25, and 26; RHEL 7.3) in various environments (bare-metal servers, AWS instances, and/or Docker machines), so you can install NNabla on them in almost the same way as described here. Detailed installation instructions for each are coming soon.
Installation on Windows
We tested on Windows 8.1 64-bit and Windows 10 64-bit.
The following software is required for installation:
- Required software:
  - Python 2.7 or Python>=3.5: PIP
  - Microsoft Visual C++ 2015 Redistributable
- Recommended:
  - CUDA Toolkit and cuDNN (if you have CUDA GPUs).
In these instructions, we use Miniconda.
Download and install the Windows binary from here.
Then install the required packages from the command prompt.
> conda install scipy scikit-image ipython
If your network uses a proxy and setup fails, configure the proxy server with environment variables and try installing again.
> SET HTTP_PROXY=http://(enter the address of the http proxy server here)
> SET HTTPS_PROXY=https://(enter the address of the https proxy server here)
If you are using an NVIDIA GPU, execution speed will be drastically improved by installing the following software.
To install cuDNN, copy its bin, include, and lib directories to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v{CUDA_VERSION}
See the list of cuDNN versions compatible with the CUDA extension packages.
Installation on macOS
NOTE: Our testing coverage on macOS, in terms of environments and machines, is very limited. Please submit an issue if you run into any problems.
We tested the installation on macOS Sierra.
The following software is required for installation:
- Python 2.7 or Python>=3.5 (we recommend setting up Python using Anaconda or Miniconda).
- pip (bundled in Conda Python)
- wheel (bundled in Conda Python)
- setuptools (bundled in Conda Python; you may need to upgrade setuptools with pip install -U --no-deps setuptools).
See NNabla package installation using PIP (note that binary packages for the CUDA extension are not available for macOS; please build it from source).
Install NNabla package compatible with Multi-GPU execution
To enable multi-GPU execution in NNabla, such as distributed training, you have to install a special edition of the NNabla package. See Installation with Multi-GPU supported for instructions.
Install from source
Documentation on building from source has been moved to the GitHub repository (build or build_distributed).
Python API Tutorial
The following tutorial documents are automatically generated from the Jupyter notebook files listed in NNabla Tutorial. If you want to run them step by step, follow the link and see the instructions found there.
NNabla by Examples
This tutorial demonstrates how to write a script that trains a neural network, using a simple handwritten digit classification task.
Note: This tutorial notebook requires scikit-learn and matplotlib to be installed in your Python environment.
First, let us prepare some dependencies.
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import nnabla.solvers as S
from nnabla.monitor import tile_images
import numpy as np
import matplotlib.pyplot as plt
import tiny_digits
%matplotlib inline
np.random.seed(0)
imshow_opt = dict(cmap='gray', interpolation='nearest')
2017-06-26 23:09:49,971 [nnabla][INFO]: Initializing CPU extension...
The tiny_digits module is located in this folder. It provides some utilities for loading a small handwritten digit classification dataset (8x8 images, similar to MNIST) that is available in scikit-learn.
Logistic Regression
We will first start by defining a computation graph for logistic regression. (For details on logistic regression, see Appendix A.)
The training will be done by gradient descent, where gradients are calculated using the error backpropagation algorithm (backprop).
Preparing a Toy Dataset
This section just prepares a dataset for demonstrating NNabla usage.
digits = tiny_digits.load_digits(n_class=10)
tiny_digits.plot_stats(digits)
Num images: 1797
Image shape: (8, 8)
Labels: [0 1 2 3 4 5 6 7 8 9]
The next block creates a dataset loader, which is a generator that provides images and labels as minibatches. Note that this dataset is only for example purposes and is not part of NNabla.
data = tiny_digits.data_iterator_tiny_digits(digits, batch_size=64, shuffle=True)
2017-06-26 23:09:50,545 [nnabla][INFO]: DataSource with shuffle(True)
2017-06-26 23:09:50,546 [nnabla][INFO]: Using DataSourceWithMemoryCache
2017-06-26 23:09:50,546 [nnabla][INFO]: DataSource with shuffle(True)
2017-06-26 23:09:50,547 [nnabla][INFO]: On-memory
2017-06-26 23:09:50,547 [nnabla][INFO]: Using DataIterator
A minibatch is shown below. img and label are numpy.ndarray objects.
img, label = data.next()
plt.imshow(tile_images(img), **imshow_opt)
print("labels: {}".format(label.reshape(8, 8)))
print("Label shape: {}".format(label.shape))
labels: [[ 2. 8. 2. 6. 6. 7. 1. 9.]
[ 8. 5. 2. 8. 6. 6. 6. 6.]
[ 1. 0. 5. 8. 8. 7. 8. 4.]
[ 7. 5. 4. 9. 2. 9. 4. 7.]
[ 6. 8. 9. 4. 3. 1. 0. 1.]
[ 8. 6. 7. 7. 1. 0. 7. 6.]
[ 2. 1. 9. 6. 7. 9. 0. 0.]
[ 5. 1. 6. 3. 0. 2. 3. 4.]]
Label shape: (64, 1)
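If you do not have the tiny_digits helper, the following minimal NumPy sketch shows roughly what such a loader provides. It reshuffles sample indices once per epoch and yields (images, labels) minibatches, with images given a channel axis and labels shaped (batch_size, 1) as in the output above. The function minibatch_iterator is hypothetical and is not part of NNabla or tiny_digits. For simplicity, it drops the incomplete last batch of each epoch.

```python
import numpy as np


def minibatch_iterator(images, labels, batch_size, shuffle=True, seed=0):
    """Yield (img, label) minibatches forever, reshuffling every epoch.

    Hypothetical helper: images is (N, H, W), labels is (N,).
    Images gain a channel axis -> (B, 1, H, W); labels become (B, 1).
    The incomplete tail batch of each epoch is dropped for simplicity.
    """
    rng = np.random.RandomState(seed)
    n = len(images)
    while True:
        order = rng.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n - batch_size + 1, batch_size):
            idx = order[start:start + batch_size]
            img = images[idx][:, None].astype(np.float32)
            label = labels[idx].reshape(-1, 1).astype(np.int32)
            yield img, label


# Usage with random stand-in data shaped like the 8x8 digits set.
fake_images = np.random.rand(1797, 8, 8)
fake_labels = np.random.randint(0, 10, size=1797)
it = minibatch_iterator(fake_images, fake_labels, batch_size=64)
img, label = next(it)
print(img.shape, label.shape)  # (64, 1, 8, 8) (64, 1)
```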
Preparing the Computation Graph
NNabla provides two different ways to perform backprop-based gradient descent optimization: one uses a static graph, and the other uses a dynamic graph. We show the static version first.
# Forward pass
x = nn.Variable(img.shape) # Define an image variable
with nn.parameter_scope("affine1"):
    y = PF.affine(x, 10) # 10 outputs, one per class
This code block shows one of the most important features of graph building in NNabla: the parameter scope. The first line defines an input variable x. The second line creates a parameter scope. The third line then applies PF.affine (an affine transform) to x and creates a variable y holding the result. Here, the PF (parametric_functions) module provides functions that contain learnable parameters, such as affine transforms (which contain weights), convolution (which contains kernels), and batch normalization (which contains transformation factors and coefficients). We call these functions parametric functions. The parameters are created and initialized randomly when the function is called, and are registered under the name “affine1” by the parameter_scope context.
# Building a loss graph
t = nn.Variable(label.shape) # Define a target variable
loss = F.mean(F.softmax_cross_entropy(y, t)) # Softmax cross-entropy fits multi-class classification problems
The remaining lines shown above define a target variable and attach loss functions at the end of the graph. Note that building a static graph doesn't execute any computation, but the shapes of output variables are inferred. Therefore, we can inspect the shape of each variable at this point:
print("Printing shapes of variables")
print(x.shape)
print(y.shape)
print(t.shape)
print(loss.shape) # empty tuple means scalar
Printing shapes of variables
(64, 1, 8, 8)
(64, 10)
(64, 1)
()
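These inferred shapes follow from simple rules. As a rough sketch, assume the affine layer flattens every axis after the batch axis (the (64, 10) weight shape in the get_parameters() output later in this tutorial is consistent with that). The shape bookkeeping can then be reproduced without running any computation. The base_axis name mirrors a PF.affine argument, but the two helpers below are hypothetical.

```python
import operator
from functools import reduce


def affine_output_shape(in_shape, n_out, base_axis=1):
    """Output shape: keep axes before base_axis, replace the rest with n_out."""
    return tuple(in_shape[:base_axis]) + (n_out,)


def affine_weight_shape(in_shape, n_out, base_axis=1):
    """Weight shape: (product of the flattened input axes, n_out)."""
    fan_in = reduce(operator.mul, in_shape[base_axis:], 1)
    return (fan_in, n_out)


x_shape = (64, 1, 8, 8)
print(affine_output_shape(x_shape, 10))  # (64, 10)
print(affine_weight_shape(x_shape, 10))  # (64, 10)
```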
Executing a static graph
You can execute the computation of the graph by calling the forward() method on a sink variable. Inputs can be set via the .d accessor, which borrows the CPU array references as numpy.ndarray.
# Set data
x.d = img
t.d = label
# Execute a forward pass
loss.forward()
# Showing results
print("Prediction score of 0th image: {}".format(y.d[0]))
print("Loss: {}".format(loss.d))
Prediction score of 0th image: [ 9.75851917 6.49118519 16.47323608 1.36296904 0.78583491
4.08872032 7.84134388 2.42956853 3.31485462 3.61868763]
Loss: 10.6016616821
The output doesn’t make sense since the network is just randomly initialized.
Backward propagation through the graph
The parameters registered by the parameter_scope management function can be queried with get_parameters() as a dict.
print(nn.get_parameters())
OrderedDict([('affine1/affine/W', <Variable((64, 10), need_grad=True) at 0x7fa0ba361d50>), ('affine1/affine/b', <Variable((10,), need_grad=True) at 0x7fa0ba361ce8>)])
Before executing backpropagation, we should initialize the gradient buffers of all parameters to zero.
for param in nn.get_parameters().values():
    param.grad.zero()
Then, you can execute backprop by calling the backward() method on the sink variable.
# Compute backward
loss.backward()
# Showing gradients.
for name, param in nn.get_parameters().items():
    print(name, param.shape, param.g.flat[:20]) # Showing first 20.
affine1/affine/W (64, 10) [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
 0.00000000e+00 0.00000000e+00 4.98418584e-02 8.72317329e-03
 4.06671129e-02 4.68742661e-02 2.52632981e-09 7.86017510e-04
 9.06870365e-02 1.56249944e-02 1.56217301e-02 3.12499963e-02]
affine1/affine/b (10,) [ 0.42710391 0.01852455 0.07369987 0.04687012 0.07798236 0.03664626
0.01651323 0.1249291 0.11862005 0.09374455]
The gradient is stored in the grad field of a Variable. The .g accessor can be used to access grad data in numpy.ndarray format.
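For intuition about what ends up in .g, the following NumPy sketch computes the same kind of quantity by hand for logistic regression. It takes the gradient of the mean softmax cross-entropy with respect to the affine weights and bias, and checks one entry against a finite-difference estimate. It is an independent illustration with made-up data, not how NNabla computes gradients internally. W is laid out as (inputs, classes), matching the (64, 10) weight shape printed above.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def loss_and_grads(x, t, W, b):
    """Mean softmax cross-entropy of y = x W + b, and its gradients w.r.t. W and b.

    x: (N, D) inputs, t: (N,) integer labels, W: (D, K), b: (K,).
    """
    n = x.shape[0]
    p = softmax(x @ W + b)
    loss = -np.log(p[np.arange(n), t]).mean()
    dy = p.copy()
    dy[np.arange(n), t] -= 1.0  # d(loss)/d(logits) = (softmax - one_hot) / N
    dy /= n
    return loss, x.T @ dy, dy.sum(axis=0)


rng = np.random.RandomState(0)
x = rng.randn(5, 4)
t = rng.randint(0, 3, size=5)
W = rng.randn(4, 3)
b = rng.randn(3)
loss, gW, gb = loss_and_grads(x, t, W, b)

# Central-difference check on one weight entry.
eps = 1e-6
Wp, Wm = W.copy(), W.copy()
Wp[0, 0] += eps
Wm[0, 0] -= eps
numeric = (loss_and_grads(x, t, Wp, b)[0] - loss_and_grads(x, t, Wm, b)[0]) / (2 * eps)
print(abs(numeric - gW[0, 0]) < 1e-6)  # True
```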
Optimizing parameters (=Training)
To optimize parameters, we provide the solver module (aliased as S here). The solver module contains a number of optimizer implementations, such as SGD, SGD with momentum, and Adam. The block below creates an SGD solver and registers the parameters of the logistic regression with it.
# Create a solver (gradient-based optimizer)
learning_rate = 1e-3
solver = S.Sgd(learning_rate)
solver.set_parameters(nn.get_parameters()) # Set parameter variables to be updated.
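In the next block, we demonstrate a single step of the optimization loop. The solver.zero_grad() line is equivalent to calling .grad.zero() for all parameters, as shown above. After the backward computation, we apply weight decay and then apply gradient descent, implemented in the Sgd solver class, as follows:
\[\mathbf \Theta \leftarrow \mathbf \Theta - \eta \nabla_{\mathbf \Theta} L(\mathbf \Theta, \mathbf X)\]
where \(\mathbf \Theta\) and \(L(\mathbf \Theta, \mathbf X)\) are the parameters and the loss defined in Appendix A, and \(\eta\) denotes the learning rate.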
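Here is a plain-NumPy sketch of that step. It assumes that weight_decay(λ) adds λ times each parameter to its gradient, after which update() applies the gradient descent rule. This is a common formulation of L2 weight decay; check the solver API reference for NNabla's exact behavior. The function sgd_step is hypothetical.

```python
import numpy as np


def sgd_step(params, grads, learning_rate, decay_rate=0.0):
    """One in-place SGD update with L2 weight decay folded into the gradient.

    params, grads: dicts of name -> np.ndarray with matching shapes.
    """
    for name, theta in params.items():
        g = grads[name] + decay_rate * theta  # weight decay: g <- g + lambda * theta
        theta -= learning_rate * g            # gradient descent: theta <- theta - eta * g


params = {"W": np.array([1.0, -2.0]), "b": np.array([0.5])}
grads = {"W": np.array([0.1, 0.2]), "b": np.array([-1.0])}
sgd_step(params, grads, learning_rate=0.1, decay_rate=0.01)
print(params["W"], params["b"])
```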
# One step of training
x.d, t.d = data.next()
loss.forward()
solver.zero_grad() # Initialize gradients of all parameters to zero.
loss.backward()
solver.weight_decay(1e-5) # Apply weight decay as a regularization
solver.update()
print(loss.d)
12.9438686371
The next block iterates the optimization steps and shows that the loss decreases.
for i in range(1000):
    x.d, t.d = data.next()
    loss.forward()
    solver.zero_grad() # Initialize gradients of all parameters to zero.
    loss.backward()
    solver.weight_decay(1e-5) # Apply weight decay as a regularization
    solver.update()
    if i % 100 == 0: # Print every 100 iterations
        print(i, loss.d)
0 12.6905069351
100 3.17041015625
200 1.60036706924
300 0.673069953918
400 0.951370298862
500 0.724424362183
600 0.361597299576
700 0.588107347488
800 0.28792989254
900 0.415006935596
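The same cycle (forward, backward, weight decay, update) can be written out by hand. The sketch below does so in NumPy for a 3-class logistic regression on synthetic 2-D Gaussian blobs. This is made-up data, not the digits set. Gradients are recomputed from scratch at every step, so no explicit zeroing is needed. The sketch only illustrates the structure of the loop; its hyperparameters are arbitrary and are not tuned to match the run above.

```python
import numpy as np

rng = np.random.RandomState(0)

# Synthetic 3-class data: Gaussian blobs around distinct centers (stand-in for digits).
centers = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 2.0]])
labels = rng.randint(0, 3, size=600)
inputs = centers[labels] + 0.5 * rng.randn(600, 2)

W = 0.01 * rng.randn(2, 3)
b = np.zeros(3)
learning_rate, decay_rate, batch_size = 0.5, 1e-5, 64


def forward_backward(x, t):
    """Mean softmax cross-entropy of x W + b and its gradients (uses the global W, b)."""
    z = x @ W + b
    z = z - z.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    n = len(t)
    loss = -np.log(p[np.arange(n), t]).mean()
    dz = p
    dz[np.arange(n), t] -= 1.0
    dz /= n
    return loss, x.T @ dz, dz.sum(axis=0)


losses = []
for i in range(300):
    idx = rng.randint(0, len(inputs), size=batch_size)  # random minibatch (with replacement)
    loss, gW, gb = forward_backward(inputs[idx], labels[idx])
    W -= learning_rate * (gW + decay_rate * W)  # weight decay, then SGD step
    b -= learning_rate * (gb + decay_rate * b)
    losses.append(loss)
    if i % 100 == 0:  # print every 100 iterations
        print(i, loss)

accuracy = ((inputs @ W + b).argmax(axis=1) == labels).mean()
print("train accuracy:", accuracy)
```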
Show prediction
The following code displays training results.
x.d, t.d = data.next() # Here we predict on images from the training set, which is only for illustration.
y.forward() # You can execute a sub graph.
plt.imshow(tile_images(x.d), **imshow_opt)
print("prediction:")
print(y.d.argmax(axis=1).reshape(8, 8)) # Taking a class index based on prediction score.
prediction:
[[5 0 1 9 0 1 3 3]
[2 4 1 7 4 5 6 5]
[7 7 9 7 9 0 7 3]
[5 3 7 6 6 8 0 9]
[0 1 3 5 5 5 4 9]
[1 0 0 8 5 1 8 8]
[7 5 0 7 6 9 0 0]
[0 6 2 6 4 4 2 6]]
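The argmax above turns prediction scores into class indices. Here is a minimal sketch of turning such predictions into an accuracy figure. It assumes scores shaped (batch, classes) and labels shaped (batch, 1), as in this tutorial, and uses small made-up arrays. With the tutorial's variables, the call would presumably be accuracy(y.d, t.d); the accuracy function itself is hypothetical.

```python
import numpy as np


def accuracy(scores, labels):
    """Fraction of rows whose highest-scoring class matches the label.

    scores: (N, K) prediction scores; labels: (N, 1) or (N,) class ids.
    """
    predicted = scores.argmax(axis=1)
    return float((predicted == labels.ravel()).mean())


scores = np.array([[0.1, 2.0, -1.0],
                   [3.0, 0.0, 0.5],
                   [0.2, 0.1, 0.9],
                   [1.0, 1.5, 0.0]])
labels = np.array([[1], [0], [2], [0]])
print(accuracy(scores, labels))  # 0.75
```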
Dynamic graph construction support
This is another way of running a computation graph in NNabla. This example doesn't show how useful a dynamic graph is, but it gives a bit of its flavor.
The next block defines the graph-building steps as functions for later use.
def logreg_forward(x):
    with nn.parameter_scope("affine1"):
        y = PF.affine(x, 10)
    return y
def logreg_loss(y, t):
    loss = F.mean(F.softmax_cross_entropy(y, t)) # Softmax cross-entropy fits multi-class classification problems
    return loss
To run a computation graph dynamically while it is being created, use the nnabla.auto_forward() context, as in the block below. With it, computation fires immediately when functions are called. (You can also use nnabla.set_auto_forward(auto) to set the auto-forward state globally.)
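The difference between the two modes can be pictured with a toy expression graph in plain Python. This is a hypothetical model for illustration only; the Node class and its auto_forward flag are not NNabla APIs. In deferred mode, building the graph only records operations, and forward() evaluates them later. In eager mode, each node is evaluated as soon as it is created, which corresponds to what the auto-forward context provides in NNabla.

```python
class Node:
    """Toy graph node: stores an op and its inputs; value is filled by forward()."""

    auto_forward = False  # class-wide switch, loosely analogous to an auto-forward mode

    def __init__(self, op=None, inputs=(), value=None):
        self.op, self.inputs, self.value = op, inputs, value
        if Node.auto_forward and op is not None:
            self.forward()  # eager: compute as soon as the node is created

    def forward(self):
        """Recursively compute the inputs, then this node (deferred mode)."""
        if self.op is not None:
            args = [n.forward() for n in self.inputs]
            self.value = self.op(*args)
        return self.value


def add(a, b):
    return Node(lambda u, v: u + v, (a, b))


def mul(a, b):
    return Node(lambda u, v: u * v, (a, b))


x, w = Node(value=3.0), Node(value=2.0)

# Deferred: building the graph computes nothing until forward() is called.
y = add(mul(x, w), x)
print(y.value)      # None
print(y.forward())  # 9.0

# Eager: each node is evaluated when it is created.
Node.auto_forward = True
z = add(mul(x, w), x)
print(z.value)      # 9.0
Node.auto_forward = False
```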
x = nn.Variable(img.shape)
t = nn.Variable(label.shape)
x.d, t.d = data.next()
with nn.auto_forward(): # The graph is executed as it is built
    y = logreg_forward(x)
    loss = logreg_loss(y, t)
print("Loss: {}".format(loss.d))
plt.imshow(tile_images(x.d), **imshow_opt)
print("prediction:")
print(y.d.argmax(axis=1).reshape(8, 8))
Loss: 0.43071603775
prediction:
[[9 3 5 0 1 9 9 2]
[5 6 6 2 7 5 1 1]
[3 7 7 6 0 8 3 8]
[0 6 4 6 0 6 9 9]
[6 1 2 5 8 3 2 4]
[1 4 4 0 5 7 1 7]
[7 8 9 5 8 3 7 8]
[5 7 5 3 3 0 0 7]]
Backward computation can be done on a dynamically constructed graph.
solver.zero_grad()
loss.backward()
Multi-Layer Perceptron (MLP)
In this section, you will see an example of building and training an MLP graph.
Before starting, we clear all parameters registered in the logistic regression example.
nn.clear_parameters() # Clear all parameters
Here is a function that builds an MLP of arbitrary depth and width for 10-class classification.
def mlp(x, hidden=[16, 32, 16]):
    hs = []
    with nn.parameter_scope("mlp"): # Parameter scope can be nested
        h = x
        for hid, hsize in enumerate(hidden):
            with nn.parameter_scope("affine{}".format(hid + 1)):
                h = F.tanh(PF.affine(h, hsize))
                hs.append(h)
        with nn.parameter_scope("classifier"):
            y = PF.affine(h, 10)
    return y, hs
# Construct an MLP graph
y, hs = mlp(x)
print("Printing shapes")
print("x: {}".format(x.shape))
for i, h in enumerate(hs):
    print("h{}:".format(i + 1), h.shape)
print("y: {}".format(y.shape))
Printing shapes
x: (64, 1, 8, 8)
h1: (64, 16)
h2: (64, 32)
h3: (64, 16)
y: (64, 10)
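Each affine layer maps the previous width to the next, so the shapes printed above follow directly from the hidden sizes. A NumPy sketch of the same forward pass reproduces them. It uses random weights, no training, and no parameter scopes; mlp_forward is a hypothetical name.

```python
import numpy as np


def mlp_forward(x, hidden=(16, 32, 16), n_classes=10, seed=0):
    """Forward pass of a tanh MLP with randomly initialized weights.

    Returns the class scores and the list of hidden activations.
    """
    rng = np.random.RandomState(seed)
    h = x.reshape(x.shape[0], -1)  # flatten all axes after the batch axis
    hs = []
    for size in hidden:
        W = rng.randn(h.shape[1], size) * 0.1
        b = np.zeros(size)
        h = np.tanh(h @ W + b)
        hs.append(h)
    W = rng.randn(h.shape[1], n_classes) * 0.1
    b = np.zeros(n_classes)
    y = h @ W + b  # classifier scores; softmax is applied inside the loss
    return y, hs


x = np.random.rand(64, 1, 8, 8)
y, hs = mlp_forward(x)
print([h.shape for h in hs], y.shape)
# [(64, 16), (64, 32), (64, 16)] (64, 10)
```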
# Training
loss = logreg_loss(y, t) # Reuse logreg loss function.
# Copied from the above logreg example.
def training(steps, learning_rate):
    solver = S.Sgd(learning_rate)
    solver.set_parameters(nn.get_parameters()) # Set parameter variables to be updated.
    for i in range(steps):
        x.d, t.d = data.next()
        loss.forward()
        solver.zero_grad() # Initialize gradients of all parameters to zero.
        loss.backward()
        solver.weight_decay(1e-5) # Apply weight decay as a regularization
        solver.update()
        if i % 100 == 0: # Print every 100 iterations
            print(i, loss.d)
# Training
training(1000, 1e-2)
0 2.42193937302
100 1.83251476288
200 1.49943637848
300 1.30751883984
400 1.00974023342
500 0.904026031494
600 0.873289525509
700 0.725554704666
800 0.614291608334
900 0.555113613605
# Showing responses for each layer
num_plot = len(hs) + 2
gid = 1
def scale01(h):
    return (h - h.min()) / (h.max() - h.min())
def imshow(img, title):
    global gid
    plt.subplot(num_plot, 1, gid)
    gid += 1
    plt.title(title)
    plt.imshow(img, **imshow_opt)
    plt.axis('off')
plt.figure(figsize=(2, 5))
imshow(x.d[0, 0], 'x')
for hid, h in enumerate(hs):
    imshow(scale01(h.d[0]).reshape(-1, 8), 'h{}'.format(hid + 1))
imshow(scale01(y.d[0]).reshape(2, 5), 'y')
Convolutional Neural Network with CUDA acceleration
Here we demonstrate a CNN with CUDA GPU acceleration.
nn.clear_parameters()
def cnn(x):
    with nn.parameter_scope("cnn"): # Parameter scope can be nested
        with nn.parameter_scope("conv1"):
            c1 = F.tanh(PF.batch_normalization(
                PF.convolution(x, 4, (3, 3), pad=(1, 1), stride=(2, 2))))
        with nn.parameter_scope("conv2"):
            c2 = F.tanh(PF.batch_normalization(
                PF.convolution(c1, 8, (3, 3), pad=(1, 1))))
            c2 = F.average_pooling(c2, (2, 2))
        with nn.parameter_scope("fc3"):
            fc3 = F.tanh(PF.affine(c2, 32))
        with nn.parameter_scope("classifier"):
            y = PF.affine(fc3, 10)
    return y, [c1, c2, fc3]
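To see how small the input to fc3 is, it helps to trace the spatial sizes. The sketch below applies the standard per-axis output-size formula floor((n + 2p - k) / s) + 1. It assumes the pooling stride defaults to the kernel size, and the helper names are hypothetical. For an 8x8 input, it gives 4x4 after conv1 (stride 2), 4x4 after conv2, and 2x2 after pooling, so c2 flattens to 8 * 2 * 2 = 32 features.

```python
def conv_output_size(size, kernel, pad=0, stride=1):
    """Spatial output length of a convolution along one axis."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_output_size(size, kernel, stride=None):
    """Output length of pooling along one axis (stride defaults to the kernel size)."""
    stride = kernel if stride is None else stride
    return (size - kernel) // stride + 1


s = 8                                        # input images are 8x8
s = conv_output_size(s, 3, pad=1, stride=2)  # conv1 -> 4x4, 4 maps
s = conv_output_size(s, 3, pad=1, stride=1)  # conv2 -> 4x4, 8 maps
s = pool_output_size(s, 2)                   # 2x2 average pooling -> 2x2
print(s, 8 * s * s)  # 2 32
```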
To enable the CUDA extension in NNabla, you have to install the nnabla-ext-cuda package first. See the install guide.
After installing the CUDA extension, you can easily switch to running on CUDA by specifying a context before building a graph. We strongly recommend using a cuDNN context, which is fast. Although the context class can be instantiated with nn.Context(), specifying a context descriptor can be a bit complicated for users. Therefore, we recommend creating a context with the helper function get_extension_context() found in the nnabla.ext_utils module. NNabla officially supports cpu and cudnn as the context specifier passed as the first argument (extension name). NOTE: When the cudnn context is set as the global default context, Functions and solvers created afterwards are instantiated in cuDNN (preferred) mode. You can also specify a context using with nn.context_scope(). See the API reference for details.
# Run on CUDA
from nnabla.ext_utils import get_extension_context
cuda_device_id = 0
ctx = get_extension_context('cudnn', device_id=cuda_device_id)
print("Context: {}".format(ctx))
nn.set_default_context(ctx) # Set CUDA as a default context.
y, hs = cnn(x)
loss = logreg_loss(y, t)
2017-06-26 23:09:54,555 [nnabla][INFO]: Initializing CUDA extension...
2017-06-26 23:09:54,731 [nnabla][INFO]: Initializing cuDNN extension...
Context: Context(backend='cpu|cuda', array_class='CudaCachedArray', device_id='0', compute_backend='default|cudnn')
training(1000, 1e-1)
0 2.34862923622
100 1.00527024269
200 0.416576713324
300 0.240603536367
400 0.254562884569
500 0.206138283014
600 0.220851421356
700 0.161689639091
800 0.230873346329
900 0.121101222932
# Showing responses for each layer
num_plot = len(hs) + 2
gid = 1
plt.figure(figsize=(2, 8))
imshow(x.d[0, 0], 'x')
imshow(tile_images(hs[0].d[0][:, None]), 'conv1')
imshow(tile_images(hs[1].d[0][:, None]), 'conv2')
imshow(hs[2].d[0].reshape(-1, 8), 'fc3')
imshow(scale01(y.d[0]).reshape(2, 5), 'y')
nn.save_parameters writes the parameters registered in the parameter_scope system in HDF5 format. We use it in a later example.
path_cnn_params = "tmp.params.cnn.h5"
nn.save_parameters(path_cnn_params)
2017-06-26 23:09:56,132 [nnabla][INFO]: Parameter save (hdf5): tmp.params.cnn.h5
Recurrent Neural Network (Elman RNN)
This is an example of recurrent neural network training.
nn.clear_parameters()
def rnn(xs, h0, hidden=32):
    hs = []
    with nn.parameter_scope("rnn"):
        h = h0
        # Time step loop
        for x in xs:
            # Note: Parameter scopes are reused over time,
            # which means parameters are shared over time.
            with nn.parameter_scope("x2h"):
                x2h = PF.affine(x, hidden, with_bias=False)
            with nn.parameter_scope("h2h"):
                h2h = PF.affine(h, hidden)
            h = F.tanh(x2h + h2h)
            hs.append(h)
        with nn.parameter_scope("classifier"):
            y = PF.affine(h, 10)
    return y, hs
This is not meaningful in itself and is only for demonstration: we split an image into a 2-by-2 grid of patches and feed them sequentially into the RNN.
def split_grid4(x):
    x0 = x[..., :4, :4]
    x1 = x[..., :4, 4:]
    x2 = x[..., 4:, :4]
    x3 = x[..., 4:, 4:]
    return x0, x1, x2, x3
hidden = 32
seq_img = split_grid4(img)
seq_x = [nn.Variable(subimg.shape) for subimg in seq_img]
h0 = nn.Variable((img.shape[0], hidden)) # Initial hidden state.
y, hs = rnn(seq_x, h0, hidden)
loss = logreg_loss(y, t)
# Copied from the above logreg example.
def training_rnn(steps, learning_rate):
    solver = S.Sgd(learning_rate)
    solver.set_parameters(nn.get_parameters()) # Set parameter variables to be updated.
    for i in range(steps):
        minibatch = data.next()
        img, t.d = minibatch
        seq_img = split_grid4(img)
        h0.d = 0 # Initialize as 0
        for x, subimg in zip(seq_x, seq_img):
            x.d = subimg
        loss.forward()
        solver.zero_grad() # Initialize gradients of all parameters to zero.
        loss.backward()
        solver.weight_decay(1e-5) # Apply weight decay as a regularization
        solver.update()
        if i % 100 == 0: # Print every 100 iterations
            print(i, loss.d)
training_rnn(1000, 1e-1)
0 2.62527275085
100 0.780260562897
200 0.486522495747
300 0.289345681667
400 0.249717146158
500 0.538961410522
600 0.276877015829
700 0.159639537334
800 0.249660402536
900 0.0925596579909
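The recurrence built by rnn() is the Elman update h_t = tanh(x_t W_xh + h_{t-1} W_hh + b), with one set of weights reused at every time step. The NumPy sketch below spells this out for the four image quadrants produced by a split_grid4-style function (top-left, top-right, bottom-left, bottom-right). It is a forward pass only, with random weights and hypothetical helper names, and is not NNabla's implementation.

```python
import numpy as np


def split_grid4(x):
    """Split (..., 8, 8) images into four (..., 4, 4) quadrants, row by row."""
    return x[..., :4, :4], x[..., :4, 4:], x[..., 4:, :4], x[..., 4:, 4:]


def elman_rnn(xs, h0, W_xh, W_hh, b_h):
    """h_t = tanh(x_t W_xh + h_{t-1} W_hh + b_h); the same weights are used at every step."""
    h, hs = h0, []
    for x in xs:
        h = np.tanh(x.reshape(x.shape[0], -1) @ W_xh + h @ W_hh + b_h)
        hs.append(h)
    return hs


rng = np.random.RandomState(0)
batch, hidden = 64, 32
img = rng.rand(batch, 1, 8, 8)
xs = split_grid4(img)
W_xh = 0.1 * rng.randn(16, hidden)      # shared "x2h" weights (no bias, as in the tutorial)
W_hh = 0.1 * rng.randn(hidden, hidden)  # shared "h2h" weights
b_h = np.zeros(hidden)                  # "h2h" bias
hs = elman_rnn(xs, np.zeros((batch, hidden)), W_xh, W_hh, b_h)
print(len(hs), hs[-1].shape)  # 4 (64, 32)
```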
# Showing responses for each layer
num_plot = len(hs) + 2
gid = 1
plt.figure(figsize=(2, 8))
imshow(x.d[0, 0], 'x')
for hid, h in enumerate(hs):
    imshow(scale01(h.d[0]).reshape(-1, 8), 'h{}'.format(hid + 1))
imshow(scale01(y.d[0]).reshape(2, 5), 'y')
Siamese Network
This example shows how to embed images from a categorical dataset into a 2D space using deep learning. It also demonstrates how to reuse a pretrained network.
First, we load parameters learned in the CNN example.
nn.clear_parameters()
# Loading CNN pretrained parameters.
_ = nn.load_parameters(path_cnn_params)
2017-06-26 23:09:57,838 [nnabla][INFO]: Parameter load (<built-in function format>): tmp.params.cnn.h5
We define an embedding function. Note that the network structure and parameter hierarchy are identical to those of the previous CNN example. This lets you reuse the saved parameters and fine-tune from them.
def cnn_embed(x, test=False):
    # Note: Identical configuration with the CNN example above.
    # Parameters pretrained in the above CNN example are used.
    with nn.parameter_scope("cnn"):
        with nn.parameter_scope("conv1"):
            c1 = F.tanh(PF.batch_normalization(PF.convolution(x, 4, (3, 3), pad=(1, 1), stride=(2, 2)), batch_stat=not test))
        with nn.parameter_scope("conv2"):
            c2 = F.tanh(PF.batch_normalization(PF.convolution(c1, 8, (3, 3), pad=(1, 1)), batch_stat=not test))
            c2 = F.average_pooling(c2, (2, 2))
        with nn.parameter_scope("fc3"):
            fc3 = PF.affine(c2, 32)
    # Additional affine for mapping into 2D.
    with nn.parameter_scope("embed2d"):
        embed = PF.affine(c2, 2)
    return embed, [c1, c2, fc3]
def siamese_loss(e0, e1, t, margin=1.0, eps=1e-4):
    dist = F.sum(F.squared_error(e0, e1), axis=1) # Squared distance
    # Contrastive loss
    sim_cost = t * dist
    dissim_cost = (1 - t) * \
        (F.maximum_scalar(margin - (dist + eps) ** (0.5), 0) ** 2)
    return F.mean(sim_cost + dissim_cost)
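For reference, the same contrastive loss can be written in NumPy as follows. It mirrors siamese_loss term by term: the squared distance d, the cost t * d for same-class pairs, and (1 - t) * max(margin - sqrt(d + eps), 0)^2 for different-class pairs. It is only a sketch for checking intuition on small arrays; contrastive_loss is a hypothetical name.

```python
import numpy as np


def contrastive_loss(e0, e1, t, margin=1.0, eps=1e-4):
    """Contrastive loss on embedding pairs.

    e0, e1: (N, D) embeddings; t: (N,) with 1 for same-class pairs, 0 otherwise.
    Same-class pairs pay their squared distance; different-class pairs pay
    (margin - distance)^2 only while they are closer than the margin.
    """
    dist = ((e0 - e1) ** 2).sum(axis=1)  # squared Euclidean distance
    sim_cost = t * dist
    dissim_cost = (1 - t) * np.maximum(margin - np.sqrt(dist + eps), 0) ** 2
    return (sim_cost + dissim_cost).mean()


a = np.array([[0.0, 0.0]])
far = np.array([[3.0, 4.0]])  # distance 5 from a, beyond the margin
print(contrastive_loss(a, a, np.array([1])))    # 0.0: identical same-class pair
print(contrastive_loss(a, far, np.array([0])))  # 0.0: different-class pair already far apart
print(contrastive_loss(a, far, np.array([1])))  # 25.0: same-class pair pays squared distance
```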
We build two-stream CNNs and compare their outputs with the contrastive loss function defined above. Note that both CNNs have the same parameter hierarchy, which means their parameters are shared.
x0 = nn.Variable(img.shape)
x1 = nn.Variable(img.shape)
t = nn.Variable((img.shape[0],)) # Same class or not
e0, hs0 = cnn_embed(x0)
e1, hs1 = cnn_embed(x1) # NOTE: parameters are shared
loss = siamese_loss(e0, e1, t)
def training_siamese(steps):
    for i in range(steps):
        minibatches = []
        for _ in range(2):
            minibatch = data.next()
            minibatches.append((minibatch[0].copy(), minibatch[1].copy()))
        x0.d, label0 = minibatches[0]
        x1.d, label1 = minibatches[1]
        t.d = (label0 == label1).astype(int).flat
        loss.forward()
        solver.zero_grad() # Initialize gradients of all parameters to zero.
        loss.backward()
        solver.weight_decay(1e-5) # Apply weight decay as a regularization
        solver.update()
        if i % 100 == 0: # Print every 100 iterations
            print(i, loss.d)
learning_rate = 1e-2
solver = S.Sgd(learning_rate)
with nn.parameter_scope("embed2d"):
    # Only the 2D embedding affine will be updated.
    solver.set_parameters(nn.get_parameters())
training_siamese(2000)
# Decay learning rate
solver.set_learning_rate(solver.learning_rate() * 0.1)
training_siamese(2000)
0 0.150528043509
100 0.186870157719
200 0.149316266179
300 0.207163512707
400 0.171384960413
500 0.190256178379
600 0.138507723808
700 0.0918073058128
800 0.159692272544
900 0.0833697617054
1000 0.0839115008712
1100 0.104669973254
1200 0.0776312947273
1300 0.114788673818
1400 0.120309025049
1500 0.107732802629
1600 0.070114441216
1700 0.101728007197
1800 0.114350572228
1900 0.118794307113
0 0.0669310241938
100 0.0553173273802
200 0.0829797014594
300 0.0951051414013
400 0.128303915262
500 0.102963000536
600 0.0910559669137
700 0.0898950695992
800 0.119949311018
900 0.0603067912161
1000 0.105748720467
1100 0.108760476112
1200 0.0820947736502
1300 0.0971114039421
1400 0.0836166366935
1500 0.0899554267526
1600 0.109069615602
1700 0.0921652168036
1800 0.0759357959032
1900 0.100669950247
We visualize the embedded training images as follows. You can see that images from the same class are embedded near each other.
all_image = digits.images[:512, None]
all_label = digits.target[:512]
x_all = nn.Variable(all_image.shape)
x_all.d = all_image
with nn.auto_forward():
    embed, _ = cnn_embed(x_all, test=True)
plt.figure(figsize=(16, 9))
for i in range(10):
    c = plt.cm.Set1(i / 10.) # May not work in older versions of Matplotlib, where the color map input lies in [0, 256)
    plt.plot(embed.d[all_label == i, 0].flatten(), embed.d[
        all_label == i, 1].flatten(), '.', c=c)
plt.legend(list(map(str, range(10))))
plt.grid()
Appendix
A. Logistic Regression
Here we demonstrate how to train the simplest neural network, logistic regression (a single-layer perceptron). Logistic regression is a linear classifier \(f : {\cal R}^{D\times 1} \rightarrow {\cal R}^{K\times 1}\) defined as
\[f(\mathbf x, \mathbf \Theta) = \mathbf W \mathbf x + \mathbf b\]
where \(\mathbf x \in {\cal R}^{D \times 1}\) is an input image flattened to a vector, \(t \in \{0, 1, \cdots, K - 1\}\) is a target label, \(\mathbf W \in {\cal R}^{K \times D}\) is a weight matrix, \(\mathbf b \in {\cal R}^{K \times 1}\) is a bias vector, and \(\mathbf \Theta \equiv \left\{\mathbf W, \mathbf b\right\}\). The loss function is defined as
\[L(\mathbf \Theta, \mathbf X) = \frac{1}{N} \sum_{\mathbf x, t \subset \mathbf X} -\log \left(\left[\sigma\left(f(\mathbf x, \mathbf \Theta)\right)\right]_{t}\right)\]
where \(\mathbf X \equiv \left\{\mathbf x_1, t_1, \cdots, \mathbf x_N, t_N\right\}\) denotes the dataset the network is trained on, \(\sigma(\mathbf z)\) is the softmax operation defined as \(\frac{\exp(\mathbf z)}{\sum_{z \subset \mathbf z} \exp(z)}\), and \(\left[\mathbf z\right]_i\) denotes the i-th element of \(\mathbf z\).
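The logistic regression model and its loss can be evaluated in a few lines of NumPy. The sketch below computes the mean negative log softmax probability of the target class over a batch. It uses the log-sum-exp shift (a standard trick, not something stated in this appendix) to keep the softmax numerically stable. With all-zero parameters, every class receives probability 1/K, so the loss equals log K, which serves as a quick check. logistic_regression_loss is a hypothetical name.

```python
import numpy as np


def logistic_regression_loss(X, t, W, b):
    """Mean over samples of -log [softmax(W x + b)]_t.

    X: (N, D) flattened inputs (one row per sample), t: (N,) labels in 0..K-1,
    W: (K, D), b: (K,), following the appendix notation.
    """
    z = X @ W.T + b                       # f(x, Theta) = W x + b, for every row at once
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability (softmax is shift-invariant)
    log_softmax = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_softmax[np.arange(len(t)), t].mean()


# With all-zero parameters every class gets probability 1/K, so the loss is log K.
K, D = 10, 64
X = np.random.rand(5, D)
t = np.array([0, 3, 9, 1, 1])
loss = logistic_regression_loss(X, t, np.zeros((K, D)), np.zeros(K))
print(np.isclose(loss, np.log(K)))  # True
```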
NNabla Python API Demonstration Tutorial
Let us import nnabla first, and some additional useful tools.
# python2/3 compatibility
from __future__ import print_function
from __future__ import absolute_import
from __future__ import division
import nnabla as nn # Abbreviate as nn for convenience.
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
2017-09-27 14:00:30,785 [nnabla][INFO]: Initializing CPU extension...
NdArray
NdArray is a data container for a multi-dimensional array. NdArray is device agnostic (e.g. CPU, CUDA) and type agnostic (e.g. uint8, float32): both type and device are implicitly cast or transferred when it is used. Below, you create an NdArray with a shape of (2, 3, 4).
a = nn.NdArray((2, 3, 4))
You can see the values held inside a as follows. The values are not initialized, and are created as float32 by default.
print(a.data)
[[[ 9.42546995e+24 4.56809286e-41 8.47690058e-38 0.00000000e+00]
 [ 7.38056336e+34 7.50334969e+28 1.17078231e-32 7.58387310e+31]
 [ 7.87001454e-12 9.84394250e-12 6.85712044e+22 1.81785692e+31]]
 [[ 1.84681296e+25 1.84933247e+20 4.85656319e+33 2.06176836e-19]
 [ 6.80020530e+22 1.69307638e+22 2.11235872e-19 1.94316151e-19]
 [ 1.81805047e+31 3.01289097e+29 2.07004908e-19 1.84648795e+25]]]
The .data accessor returns a reference to the values of the NdArray as a numpy.ndarray. You can modify these values using the NumPy API as follows.
print('[Substituting random values]')
a.data = np.random.randn(*a.shape)
print(a.data)
print('[Slicing]')
a.data[0, :, ::2] = 0
print(a.data)
[Substituting random values]
[[[ 0.36133638 0.22121875 1.5912329 0.33490974]
[ 1.35962474 0.2165522 0.54483992 0.61813235]
[0.13718799 0.44104072 0.51307833 0.73900551]]
[[0.59464753 2.17738533 0.28626776 0.45654735]
[ 0.73566747 0.87292582 0.41605178 0.04792296]
[0.63856047 0.31966645 0.63974309 0.61385244]]]
[Slicing]
[[[ 0. 0.22121875 0. 0.33490974]
[ 0. 0.2165522 0. 0.61813235]
[ 0. 0.44104072 0. 0.73900551]]
[[0.59464753 2.17738533 0.28626776 0.45654735]
[ 0.73566747 0.87292582 0.41605178 0.04792296]
[0.63856047 0.31966645 0.63974309 0.61385244]]]
Note that the above operations are all done on the host device (CPU). If you want to fill all values with a constant, NdArray provides more efficient functions: .zero and .fill. They are lazily evaluated when the data is requested (when neural network computation requests the data, or when a NumPy array is requested by Python). The filling operation is executed within a specific device (e.g. a CUDA GPU) and is more efficient if you specify the device setting, which we explain later.
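The deferred behavior of .fill and .zero can be modeled with a small toy class. This is a hypothetical NumPy illustration; LazyFillArray is not an NNabla class. fill() only records the requested constant, and the write is performed the first time .data is read. In NNabla, according to the text above, the deferred fill is executed on the device that requests the data.

```python
import numpy as np


class LazyFillArray:
    """Toy array whose fill() is recorded and only materialized when .data is read."""

    def __init__(self, shape):
        self.shape = tuple(shape)
        self._array = None         # no memory allocated yet
        self._pending_fill = None  # constant waiting to be written

    def fill(self, value):
        self._pending_fill = value  # O(1): just remember the request

    def zero(self):
        self.fill(0)

    @property
    def data(self):
        if self._array is None:
            self._array = np.empty(self.shape, dtype=np.float32)
        if self._pending_fill is not None:
            self._array[...] = self._pending_fill  # the fill happens here, on demand
            self._pending_fill = None
        return self._array


a = LazyFillArray((2, 3, 4))
a.fill(1)
print(a._pending_fill, a._array is None)  # 1 True
print(a.data.sum())  # 24.0
```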
a.fill(1) # Filling all values with one.
print(a.data)
[[[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]]
[[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]]]
You can create an NdArray instance directly from a Numpy array object.
b = nn.NdArray.from_numpy_array(np.ones(a.shape))
print(b.data)
[[[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]]
[[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]]]
NdArray is used in Variable class, as well as NNabla’s imperative computation of neural networks. We describe them in the later sections.
Variable
The Variable class is used when you construct a neural network. A neural network can be described as a graph in which an edge represents a function (a.k.a. operator or layer) that defines a minimal unit of computation, and a node represents a variable that holds the input/output values of a function (the Function class is explained later). The graph is called a “Computation Graph”.
In NNabla, a Variable, a node of a computation graph, holds two NdArrays: one for storing the input or output values of a function during forward propagation (executing the computation graph in the forward order), and another for storing the backward error signal (gradient) during backward propagation (executing the computation graph in backward order to propagate error signals down to the parameters (weights) of the neural network). In NNabla, the first one is called data and the second one grad.
The following line creates a Variable instance with a shape of (2, 3, 4). It has data and grad as NdArrays. Setting the flag need_grad to False omits unnecessary gradient computation during backprop.
x = nn.Variable([2, 3, 4], need_grad=True)
print('x.data:', x.data)
print('x.grad:', x.grad)
x.data: <NdArray((2, 3, 4)) at 0x7f575caf4ea0>
x.grad: <NdArray((2, 3, 4)) at 0x7f575caf4ea0>
You can get the shape by:
x.shape
(2, 3, 4)
Since both data and grad are NdArrays, you can get a reference to their values as NumPy arrays through the .data accessor of each NdArray (x.data.data and x.grad.data). More concisely, you can use the .d and .g properties of the Variable for data and grad, respectively.
print('x.data')
print(x.d)
x.d = 1.2345 # To avoid NaN
assert np.all(x.d == x.data.data), 'd: {} != {}'.format(x.d, x.data.data)
print('x.grad')
print(x.g)
x.g = 1.2345 # To avoid NaN
assert np.all(x.g == x.grad.data), 'g: {} != {}'.format(x.g, x.grad.data)
# Zeroing grad values
x.grad.zero()
print('x.grad (after `.zero()`)')
print(x.g)
x.data
[[[ 9.42553452e+24 4.56809286e-41 8.32543479e-38 0.00000000e+00]
[ nan nan 0.00000000e+00 0.00000000e+00]
[ 3.70977305e+25 4.56809286e-41 3.78350585e-44 0.00000000e+00]]
[[ 5.68736600e-38 0.00000000e+00 1.86176378e-13 4.56809286e-41]
[ 4.74367616e+25 4.56809286e-41 5.43829710e+19 4.56809286e-41]
[ 0.00000000e+00 0.00000000e+00 2.93623372e-38 0.00000000e+00]]]
x.grad
[[[ 9.42576510e+24 4.56809286e-41 9.42576510e+24 4.56809286e-41]
[ 9.27127763e-38 0.00000000e+00 9.27127763e-38 0.00000000e+00]
[ 1.69275966e+22 4.80112800e+30 1.21230330e+25 7.22962302e+31]]
[[ 1.10471027e-32 4.63080422e+27 2.44632805e+20 2.87606258e+20]
[ 4.46263300e+30 4.62311881e+30 7.65000750e+28 3.01339003e+29]
[ 2.08627352e-10 1.03961868e+21 7.99576678e+20 1.74441223e+22]]]
x.grad (after .zero())
[[[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]]
[[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]]]
Like an NdArray, a Variable can also be created from NumPy array(s).
x2 = nn.Variable.from_numpy_array(np.ones((3,)), need_grad=True)
print(x2)
print(x2.d)
x3 = nn.Variable.from_numpy_array(np.ones((3,)), np.zeros((3,)), need_grad=True)
print(x3)
print(x3.d)
print(x3.g)
<Variable((3,), need_grad=True) at 0x7f572a5242c8>
[ 1. 1. 1.]
<Variable((3,), need_grad=True) at 0x7f572a5244a8>
[ 1. 1. 1.]
[ 0. 0. 0.]
Besides storing the values of a computation graph, a Variable has another important role: it points to its parent edge (function) so that the computation graph can be traced. Here x doesn’t have any connection, so its .parent property returns None.
print(x.parent)
None
Function¶
A function defines an operation block of a computation graph, as described above. The module nnabla.functions offers various functions (e.g. Convolution, Affine and ReLU). You can see the list of available functions in the API reference guide.
import nnabla.functions as F
As an example, here we define a computation graph that computes the elementwise Sigmoid function outputs for the input variable and sums up all values into a scalar. (This example is simple enough to show how the graph behaves, but it is meaningless in the context of neural network training. We will show you a neural network example later.)
sigmoid_output = F.sigmoid(x)
sum_output = F.reduce_sum(sigmoid_output)
The function API in nnabla.functions takes one (or several) Variable(s) and arguments (if any), and returns one (or several) output Variable(s). The .parent of an output points to the function instance that created it. Note that no computation occurs at this time because we only define the graph. (This is the default behavior of the NNabla computation graph API. You can also fire the actual computation during graph definition, which we call “Dynamic mode” (explained later).)
print("sigmoid_output.parent.name:", sigmoid_output.parent.name)
print("x:", x)
print("sigmoid_output.parent.inputs refers to x:", sigmoid_output.parent.inputs)
sigmoid_output.parent.name: Sigmoid
x: <Variable((2, 3, 4), need_grad=True) at 0x7f572a51a778>
sigmoid_output.parent.inputs refers to x: [<Variable((2, 3, 4), need_grad=True) at 0x7f572a273a48>]
print("sum_output.parent.name:", sum_output.parent.name)
print("sigmoid_output:", sigmoid_output)
print("sum_output.parent.inputs refers to sigmoid_output:", sum_output.parent.inputs)
sum_output.parent.name: ReduceSum
sigmoid_output: <Variable((2, 3, 4), need_grad=True) at 0x7f572a524638>
sum_output.parent.inputs refers to sigmoid_output: [<Variable((2, 3, 4), need_grad=True) at 0x7f572a273a48>]
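The parent/inputs relationship printed above can be mimicked in a few lines of plain Python. The Variable and Function classes below are hypothetical stand-ins, not the nnabla classes. Defining the graph only records links, trace() walks .parent and .inputs back to the input, and forward() computes values on demand.

```python
import math


class Variable:
    """Hypothetical graph node: a value plus a pointer to its creator."""

    def __init__(self, value=None, parent=None):
        self.value = value
        self.parent = parent  # None for a graph input, like x.parent above

    def forward(self):
        # Compute the inputs first, then apply the creator function.
        if self.parent is not None:
            for inp in self.parent.inputs:
                inp.forward()
            self.value = self.parent.fn(*(inp.value for inp in self.parent.inputs))


class Function:
    """Hypothetical graph edge: a named operation and its input Variables."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn
        self.inputs = []

    def __call__(self, *inputs):
        # Only record the connection; nothing is computed here.
        self.inputs = list(inputs)
        return Variable(parent=self)


def trace(var):
    """Collect function names from an output back to the graph input."""
    names = []
    while var.parent is not None:
        names.append(var.parent.name)
        var = var.parent.inputs[0]  # every function here has one input
    return names


def elementwise_sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


x = Variable([1.2345] * 24)  # 24 values, like the (2, 3, 4) x above
sigmoid_output = Function("Sigmoid", elementwise_sigmoid)(x)
sum_output = Function("ReduceSum", sum)(sigmoid_output)

print(trace(sum_output))   # ['ReduceSum', 'Sigmoid']
print(sum_output.value)    # None: defining the graph computed nothing
sum_output.forward()
print(sum_output.value)    # about 18.59, in line with the CG output below
```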
Calling .forward() on the terminal Variable (here sum_output) executes the forward pass computation in the computation graph.
sum_output.forward()
print("CG output:", sum_output.d)
print("Reference:", np.sum(1.0 / (1.0 + np.exp(-x.d))))
CG output: 18.59052085876465
Reference: 18.5905
Calling .backward() performs backward propagation through the graph. Here we initialize the grad values to zero before backprop, because the NNabla backprop algorithm always accumulates the gradient in the root variables.
x.grad.zero()
sum_output.backward()
print("d sum_o / d sigmoid_o:")
print(sigmoid_output.g)
print("d sum_o / d x:")
print(x.g)
d sum_o / d sigmoid_o:
[[[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]]
[[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]]]
d sum_o / d x:
[[[ 0.17459197 0.17459197 0.17459197 0.17459197]
[ 0.17459197 0.17459197 0.17459197 0.17459197]
[ 0.17459197 0.17459197 0.17459197 0.17459197]]
[[ 0.17459197 0.17459197 0.17459197 0.17459197]
[ 0.17459197 0.17459197 0.17459197 0.17459197]
[ 0.17459197 0.17459197 0.17459197 0.17459197]]]
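The gradient values above, and the reason zeroing matters, can be checked by hand. The sketch below uses plain Python and a hypothetical helper name. It applies d sigmoid / dx = s(1 - s) and adds with +=, mimicking the accumulation behavior described above, so calling it twice without resetting doubles the gradient.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def backward_sum_of_sigmoid(xs, grad):
    """Accumulate d(sum_i sigmoid(x_i)) / d x_i into grad, in place.

    d sum / d sigmoid_i is 1 and d sigmoid / d x is s * (1 - s),
    so each element receives s * (1 - s). The += mimics accumulation:
    existing values in grad are added to, never overwritten.
    """
    for i, v in enumerate(xs):
        s = sigmoid(v)
        grad[i] += 1.0 * s * (1.0 - s)


xs = [1.2345] * 24
grad = [0.0] * len(xs)             # plays the role of x.grad.zero()
backward_sum_of_sigmoid(xs, grad)
print(grad[0])                     # about 0.17459, as in d sum_o / d x above
backward_sum_of_sigmoid(xs, grad)  # second backprop without zeroing
print(grad[0])                     # twice as large: the gradients accumulated
```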
NNabla is developed mainly with a focus on neural network training and inference. Neural networks have learnable parameters associated with computation blocks such as Convolution and Affine (a.k.a. fully connected, dense, etc.). In NNabla, the learnable parameters are also represented as Variable objects. Just like input variables, those parameter variables are used by passing them into Functions. For example, the Affine function takes input, weights and biases as inputs.
x = nn.Variable([5, 2]) # Input
w = nn.Variable([2, 3], need_grad=True) # Weights
b = nn.Variable([3], need_grad=True) # Biases
affine_out = F.affine(x, w, b) # Create a graph including only affine
The above example takes an input with B=5 (batchsize) and D=2 (dimensions) and maps it to D’=3 outputs, i.e. (B, D’) output.
You may also notice that here you set need_grad=True only for the parameter variables (w and b). x is a non-parameter variable and the root of the computation graph, so it doesn’t require gradient computation. In this configuration, the gradient with respect to x is not computed by the affine function, which skips unnecessary backpropagation.
The next block sets data and initializes grad, then applies forward and backward computation.
# Set random input and parameters
x.d = np.random.randn(*x.shape)
w.d = np.random.randn(*w.shape)
b.d = np.random.randn(*b.shape)
# Initialize grad
x.grad.zero() # Just for showing gradients are not computed when need_grad=False (default).
w.grad.zero()
b.grad.zero()
# Forward and backward
affine_out.forward()
affine_out.backward()
# Note: Calling backward on a non-scalar Variable propagates 1 as the error signal from every element of the outputs.
You can see that affine_out holds an output of Affine.
print('F.affine')
print(affine_out.d)
print('Reference')
print(np.dot(x.d, w.d) + b.d)
F.affine
[[0.17701732 2.86095762 0.82298267]
[0.75544345 1.16702223 2.44841242]
[0.36278027 3.4771595 0.75681627]
[ 0.32743117 0.24258983 1.30944324]
[0.87201929 1.94556415 3.23357344]]
Reference
[[0.1770173 2.86095762 0.82298267]
[0.75544345 1.16702223 2.44841242]
[0.3627803 3.4771595 0.75681627]
[ 0.32743117 0.24258983 1.309443 ]
[0.87201929 1.94556415 3.23357344]]
The resulting gradients of weights and biases are as follows.
print("dw")
print(w.g)
print("db")
print(b.g)
dw
[[ 3.10820675 3.10820675 3.10820675]
[ 0.37446201 0.37446201 0.37446201]]
db
[ 5. 5. 5.]
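The two results above have a simple explanation. Calling backward() on the non-scalar affine_out seeds every output gradient with 1. As a result, dW = x^T · 1, which makes each row of dw a repeated column sum of x, and db is the sum of those ones over the batch, which is the batch size 5. The plain-Python sketch below reproduces that arithmetic with a hypothetical helper, affine_backward, and random inputs of the same shapes.

```python
import random


def affine_backward(x, dy):
    """Gradients of y = x @ W + b with respect to W and b, given dy = dL/dy.

    dW[i][j] = sum_b x[b][i] * dy[b][j]
    db[j]    = sum_b dy[b][j]
    """
    batch, d_in, d_out = len(x), len(x[0]), len(dy[0])
    dw = [[sum(x[b][i] * dy[b][j] for b in range(batch)) for j in range(d_out)]
          for i in range(d_in)]
    db = [sum(dy[b][j] for b in range(batch)) for j in range(d_out)]
    return dw, db


random.seed(0)
B, D, D_OUT = 5, 2, 3   # same shapes as x, w and b above
x = [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(B)]
# backward() on the non-scalar output seeds every output gradient with 1.
dy = [[1.0] * D_OUT for _ in range(B)]

dw, db = affine_backward(x, dy)
print(db)   # [5.0, 5.0, 5.0]: the batch size
print(dw)   # each row repeats one value, the column sum of x
```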
The gradient of x is not changed because need_grad is set to False.
print(x.g)
[[ 0. 0.]
[ 0. 0.]
[ 0. 0.]
[ 0. 0.]
[ 0. 0.]]
Parametric Function¶
Treating parameters as inputs of Functions enhances the expressiveness and flexibility of computation graphs. However, defining every parameter for each learnable function by hand is tedious when building a neural network. In NNabla, trainable models are usually created by composing functions that have optimizable parameters. These functions are called “Parametric Functions”. The Parametric Function API provides various parametric functions and an interface for composing trainable models.
To use parametric functions, import:
import nnabla.parametric_functions as PF
A function with optimizable parameters can be created as below.
with nn.parameter_scope("affine1"):
    c1 = PF.affine(x, 3)
The first line creates a parameter scope. The second line then applies PF.affine (an affine transform) to x, and creates a variable c1 holding the result. The parameters are created and initialized randomly when the function is called, and are registered under the name “affine1” using the parameter_scope context. The function nnabla.get_parameters() returns the registered parameters.
nn.get_parameters()
OrderedDict([('affine1/affine/W',
<Variable((2, 3), need_grad=True) at 0x7f572822f0e8>),
('affine1/affine/b',
<Variable((3,), need_grad=True) at 0x7f572822f138>)])
The name= argument of any PF function creates a parameter space equivalent to the above definition of the PF.affine transformation, as shown below. This can make your Python code shorter. nnabla.parameter_scope is more useful when you group multiple parametric functions, such as the Convolution-BatchNormalization pair found in a typical unit of CNNs.
c1 = PF.affine(x, 3, name='affine1')
nn.get_parameters()
OrderedDict([('affine1/affine/W',
<Variable((2, 3), need_grad=True) at 0x7f572822f0e8>),
('affine1/affine/b',
<Variable((3,), need_grad=True) at 0x7f572822f138>)])
It is worth noting that the shapes of both the outputs and the parameter variables (as you can see above) are automatically determined by providing only the output size of the affine transformation (in the example above, the output size is 3). This makes it easy to create a graph.
c1.shape
(5, 3)
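The automatic shape inference can be expressed as a small rule. The helper below is hypothetical. It assumes that PF.affine flattens all axes from base_axis (1 by default) onward into the input dimension, and it reproduces the (2, 3), (3,) and (5, 3) shapes shown above.

```python
from functools import reduce
import operator


def affine_param_shapes(input_shape, n_out, base_axis=1):
    """Hypothetical helper: W, b and output shapes for an affine layer.

    Assumes every axis from base_axis onward is flattened into a single
    input dimension, as the shapes printed above suggest.
    """
    d_in = reduce(operator.mul, input_shape[base_axis:], 1)
    w_shape = (d_in, n_out)
    b_shape = (n_out,)
    out_shape = tuple(input_shape[:base_axis]) + (n_out,)
    return w_shape, b_shape, out_shape


print(affine_param_shapes((5, 2), 3))          # ((2, 3), (3,), (5, 3))
print(affine_param_shapes((16, 1, 8, 8), 10))  # ((64, 10), (10,), (16, 10))
```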
Parameter scopes can be nested as follows (although this example is meaningless).
with nn.parameter_scope('foo'):
    h = PF.affine(x, 3)
    with nn.parameter_scope('bar'):
        h = PF.affine(h, 4)
This creates the following.
nn.get_parameters()
OrderedDict([('affine1/affine/W',
<Variable((2, 3), need_grad=True) at 0x7f572822f0e8>),
('affine1/affine/b',
<Variable((3,), need_grad=True) at 0x7f572822f138>),
('foo/affine/W',
<Variable((2, 3), need_grad=True) at 0x7f572822fa98>),
('foo/affine/b',
<Variable((3,), need_grad=True) at 0x7f572822fae8>),
('foo/bar/affine/W',
<Variable((3, 4), need_grad=True) at 0x7f572822f728>),
('foo/bar/affine/b',
<Variable((4,), need_grad=True) at 0x7f572822fdb8>)])
Also, get_parameters() can be used inside a parameter_scope. For example:
with nn.parameter_scope("foo"):
    print(nn.get_parameters())
OrderedDict([('affine/W', <Variable((2, 3), need_grad=True) at 0x7f572822fa98>), ('affine/b', <Variable((3,), need_grad=True) at 0x7f572822fae8>), ('bar/affine/W', <Variable((3, 4), need_grad=True) at 0x7f572822f728>), ('bar/affine/b', <Variable((4,), need_grad=True) at 0x7f572822fdb8>)])
nnabla.clear_parameters() can be used to delete the registered parameters under the scope.
with nn.parameter_scope("foo"):
    nn.clear_parameters()
print(nn.get_parameters())
OrderedDict([('affine1/affine/W', <Variable((2, 3), need_grad=True) at 0x7f572822f0e8>), ('affine1/affine/b', <Variable((3,), need_grad=True) at 0x7f572822f138>)])
MLP Example For Explanation¶
The following block creates a computation graph that predicts a one-dimensional output from two-dimensional inputs with a 2-layer fully connected neural network (multilayer perceptron).
nn.clear_parameters()
batchsize = 16
x = nn.Variable([batchsize, 2])
with nn.parameter_scope("fc1"):
    h = F.tanh(PF.affine(x, 512))
with nn.parameter_scope("fc2"):
    y = PF.affine(h, 1)
print("Shapes:", h.shape, y.shape)
Shapes: (16, 512) (16, 1)
This will create the following parameter variables.
nn.get_parameters()
OrderedDict([('fc1/affine/W',
<Variable((2, 512), need_grad=True) at 0x7f572822fef8>),
('fc1/affine/b',
<Variable((512,), need_grad=True) at 0x7f572822f9a8>),
('fc2/affine/W',
<Variable((512, 1), need_grad=True) at 0x7f572822f778>),
('fc2/affine/b',
<Variable((1,), need_grad=True) at 0x7f572822ff98>)])
As described above, you can execute the forward pass by calling forward method at the terminal variable.
x.d = np.random.randn(*x.shape) # Set random input
y.forward()
print(y.d)
[[0.05708594]
[ 0.01661986]
[0.34168088]
[ 0.05822293]
[0.16566885]
[0.04867431]
[ 0.2633169 ]
[ 0.10496549]
[0.01291842]
[0.09726256]
[0.05720493]
[0.09691752]
[0.07822668]
[0.17180404]
[ 0.11970415]
[0.08222144]]
Training a neural network requires a loss value to be minimized by gradient descent with backprop. In NNabla, a loss function is also just a function, and it is packaged in the functions module.
# Variable for label
label = nn.Variable([batchsize, 1])
# Set loss
loss = F.reduce_mean(F.squared_error(y, label))
# Execute forward pass.
label.d = np.random.randn(*label.shape) # Randomly generate labels
loss.forward()
print(loss.d)
1.9382084608078003
As you’ve seen above, NNabla backward accumulates the gradients at the root variables. You have to initialize the grad of the parameter variables before backprop (we will show you the easiest way to do this with the Solver API).
# Collect all parameter variables and init grad.
for name, param in nn.get_parameters().items():
    param.grad.zero()
# Gradients are accumulated to grad of params.
loss.backward()
Imperative Mode¶
After performing backprop, gradients are held in parameter variable grads. The next block will update the parameters with vanilla gradient descent.
for name, param in nn.get_parameters().items():
    param.data -= param.grad * 0.001  # 0.001 as learning rate
The above computation is an example of NNabla’s “Imperative Mode” for executing neural networks. Normally, NNabla functions (instances of nnabla.functions) take Variables as their input. When at least one NdArray is provided as an input to an NNabla function (instead of Variables), the function computation is fired immediately and returns an NdArray as the output, instead of a Variable. In the above example, the NNabla functions F.mul_scalar and F.sub2 are called by the overridden operators * and -=, respectively.
In other words, NNabla’s “Imperative mode” doesn’t create a computation graph, and can be used like NumPy. If device acceleration such as CUDA is enabled, it can be used like NumPy empowered with device acceleration. Parametric functions can also be used with NdArray input(s). The following block demonstrates a simple imperative execution example.
# A simple example of imperative mode.
xi = nn.NdArray.from_numpy_array(np.arange(4).reshape(2, 2))
yi = F.relu(xi - 1)
print(xi.data)
print(yi.data)
[[0 1]
[2 3]]
[[ 0. 0.]
[ 1. 2.]]
Note that in-place substitution from the rhs to the lhs cannot be done by the = operator. For example, when x is an NdArray, writing x = x + 1 will not increment all values of x in place. Instead, the expression on the rhs creates a new NdArray object that is different from the one originally bound by x, and = binds this new NdArray object to the Python variable x on the lhs.
For in-place editing of NdArrays, the in-place assignment operators +=, -=, *=, and /= can be used. The copy_from method can also be used to copy the values of an existing NdArray to another. For example, incrementing x, an NdArray, by 1 can be done by x.copy_from(x+1). The copy is performed with device acceleration if a device context is specified by using nnabla.set_default_context or nnabla.context_scope.
# The following doesn't perform substitution but assigns a new NdArray object to `xi`.
# xi = xi + 1
# The following copies the result of `xi + 1` to `xi`.
xi.copy_from(xi + 1)
assert np.all(xi.data == (np.arange(4).reshape(2, 2) + 1))
# Inplace operations like `+=`, `*=` can also be used (more efficient).
xi += 1
assert np.all(xi.data == (np.arange(4).reshape(2, 2) + 2))
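The difference between rebinding and in-place update comes from ordinary Python semantics, not from NNabla itself. The hypothetical Buffer class below overrides + and += the way an array type typically does, so you can see which names observe the change.

```python
class Buffer:
    """Hypothetical array-like object, used only to show Python semantics."""

    def __init__(self, values):
        self.values = list(values)

    def __add__(self, other):
        # Binary '+' returns a brand-new object.
        return Buffer(v + other for v in self.values)

    def __iadd__(self, other):
        # '+=' mutates this object and returns it, keeping its identity.
        self.values = [v + other for v in self.values]
        return self

    def copy_from(self, src):
        # Copy values into the existing object.
        self.values[:] = src.values


a = Buffer([0, 1, 2, 3])
alias = a            # a second name for the same object

a = a + 1            # rebinds `a`; `alias` still sees the old values
print(alias.values, a.values)  # [0, 1, 2, 3] [1, 2, 3, 4]

a = alias
a += 1               # in place: both names see the change
print(alias.values)  # [1, 2, 3, 4]

a.copy_from(a + 1)   # in place as well
print(alias.values)  # [2, 3, 4, 5]
```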
Solver¶
NNabla provides stochastic gradient descent algorithms for optimizing parameters in the nnabla.solvers module. The parameter updates demonstrated above can be replaced with this Solver API, which is easier and usually faster.
from nnabla import solvers as S
solver = S.Sgd(lr=0.00001)
solver.set_parameters(nn.get_parameters())
# Set random data
x.d = np.random.randn(*x.shape)
label.d = np.random.randn(*label.shape)
# Forward
loss.forward()
Just call the following solver method to fill the grad regions with zeros, then run backprop.
solver.zero_grad()
loss.backward()
The following block updates parameters with the Vanilla Sgd rule (equivalent to the imperative example above).
solver.update()
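The zero_grad / backward / update cycle above can be sketched without NNabla on a one-dimensional linear regression. The helpers are hypothetical and the gradients are written out by hand. The update rule is the vanilla SGD step w <- w - lr * dL/dw.

```python
def loss_and_grads(params, data):
    """Mean squared error of y = w * x + b, with hand-derived gradients."""
    w, b = params
    n = len(data)
    residuals = [(w * x + b - t, x) for x, t in data]
    loss = sum(r * r for r, _ in residuals) / n
    dw = sum(2 * r * x for r, x in residuals) / n
    db = sum(2 * r for r, _ in residuals) / n
    return loss, [dw, db]


def sgd_update(params, grads, lr):
    """Vanilla SGD step, applied in place: w <- w - lr * dL/dw."""
    for i, g in enumerate(grads):
        params[i] -= lr * g


# Toy data generated from t = 3x + 1 on x in [-1, 1].
data = [(k / 10, 3 * k / 10 + 1) for k in range(-10, 11)]
params = [0.0, 0.0]   # [w, b]
for step in range(500):
    grads = [0.0, 0.0]                   # like solver.zero_grad()
    loss, g = loss_and_grads(params, data)
    for i, gi in enumerate(g):           # like loss.backward(): accumulate
        grads[i] += gi
    sgd_update(params, grads, lr=0.1)    # like solver.update()
print([round(p, 3) for p in params])     # [3.0, 1.0]
```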
Toy Problem To Demonstrate Training¶
The following function defines a regression problem which computes the norm of a vector.
def vector2length(x):
    # x : [B, 2] where B is number of samples.
    return np.sqrt(np.sum(x ** 2, axis=1, keepdims=True))
We visualize this mapping with the contour plot by matplotlib as follows.
# Data for plotting contour on a grid data.
xs = np.linspace(-1, 1, 100)
ys = np.linspace(-1, 1, 100)
grid = np.meshgrid(xs, ys)
X = grid[0].flatten()
Y = grid[1].flatten()
def plot_true():
    """Plotting contour of true mapping from a grid data created above."""
    plt.contourf(xs, ys, vector2length(np.hstack([X[:, None], Y[:, None]])).reshape(100, 100))
    plt.axis('equal')
    plt.colorbar()
plot_true()
We define a deep prediction neural network.
def length_mlp(x):
    h = x
    for i, hnum in enumerate([4, 8, 4, 2]):
        h = F.tanh(PF.affine(h, hnum, name="fc{}".format(i)))
    y = PF.affine(h, 1, name='fc')
    return y
nn.clear_parameters()
batchsize = 100
x = nn.Variable([batchsize, 2])
y = length_mlp(x)
label = nn.Variable([batchsize, 1])
loss = F.reduce_mean(F.squared_error(y, label))
We created a 5-layer deep MLP using a for-loop. Note that only 3 lines of code are needed to create a network of arbitrary depth. The next block adds helper functions to visualize the learned function.
def predict(inp):
    ret = []
    for i in range(0, inp.shape[0], x.shape[0]):
        xx = inp[i:i + x.shape[0]]
        # Imperative execution
        xi = nn.NdArray.from_numpy_array(xx)
        yi = length_mlp(xi)
        ret.append(yi.data.copy())
    return np.vstack(ret)
def plot_prediction():
    plt.contourf(xs, ys, predict(np.hstack([X[:, None], Y[:, None]])).reshape(100, 100))
    plt.colorbar()
    plt.axis('equal')
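As a cross-check of length_mlp, the parameter shapes it creates follow directly from the hidden sizes [4, 8, 4, 2] and the final output size 1. The hypothetical helper below lists the (W, b) shapes of such a chain of affine layers. They match the parameter listing printed when the parameters are reloaded later in this tutorial.

```python
def mlp_param_shapes(d_in, hidden, d_out):
    """Hypothetical helper: (W, b) shapes of a chain of affine layers."""
    sizes = [d_in] + list(hidden) + [d_out]
    return [((n_in, n_out), (n_out,)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]


shapes = mlp_param_shapes(2, [4, 8, 4, 2], 1)   # same sizes as length_mlp
for w_shape, b_shape in shapes:
    print(w_shape, b_shape)
n_params = sum(w[0] * w[1] + b[0] for w, b in shapes)
print(len(shapes), "affine layers,", n_params, "parameters")  # 5 affine layers, 101 parameters
```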
Next, we instantiate a solver object as follows. We use the Adam optimizer, one of the most popular SGD algorithms in the literature.
from nnabla import solvers as S
solver = S.Adam(alpha=0.01)
solver.set_parameters(nn.get_parameters())
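For reference, the sketch below implements the standard Adam update rule on scalars in plain Python. It is a textbook version written for illustration only. The class name, the defaults (beta1=0.9, beta2=0.999, eps=1e-8) and the bias-correction details are assumptions, and S.Adam may differ in details.

```python
import math


class AdamSketch:
    """Textbook Adam update for a list of scalar parameters (illustrative)."""

    def __init__(self, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.alpha, self.beta1, self.beta2, self.eps = alpha, beta1, beta2, eps
        self.m = None
        self.v = None
        self.t = 0

    def update(self, params, grads):
        if self.m is None:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        for i, g in enumerate(grads):
            # Exponential moving averages of the gradient and its square.
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            # Bias correction for the zero initialization of m and v.
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            params[i] -= self.alpha * m_hat / (math.sqrt(v_hat) + self.eps)


# Minimize (w - 3)^2 with alpha=0.01, the value used in this tutorial.
params = [0.0]
solver = AdamSketch(alpha=0.01)
for _ in range(2000):
    grads = [2 * (params[0] - 3.0)]
    solver.update(params, grads)
print(round(params[0], 3))   # close to 3.0
```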
The following function generates data from the true system. It can be called any number of times to draw fresh samples.
def random_data_provider(n):
    x = np.random.uniform(-1, 1, size=(n, 2))
    y = vector2length(x)
    return x, y
In the next block, we run 2000 training steps (SGD updates).
num_iter = 2000
for i in range(num_iter):
    # Sample data and set them to input variables of training.
    xx, ll = random_data_provider(batchsize)
    x.d = xx
    label.d = ll
    # Forward propagation given inputs.
    loss.forward(clear_no_need_grad=True)
    # Parameter gradients initialization and gradients computation by backprop.
    solver.zero_grad()
    loss.backward(clear_buffer=True)
    # Apply weight decay and update by Adam rule.
    solver.weight_decay(1e-6)
    solver.update()
    # Just print progress.
    if i % 100 == 0 or i == num_iter - 1:
        print("Loss@{:4d}: {}".format(i, loss.d))
Loss@ 0: 0.6976373195648193
Loss@ 100: 0.08075223118066788
Loss@ 200: 0.005213144235312939
Loss@ 300: 0.001955194864422083
Loss@ 400: 0.0011660841992124915
Loss@ 500: 0.0006421314901672304
Loss@ 600: 0.0009330055327154696
Loss@ 700: 0.0008817618945613503
Loss@ 800: 0.0006205961108207703
Loss@ 900: 0.0009072928223758936
Loss@1000: 0.0008160348515957594
Loss@1100: 0.0011569359339773655
Loss@1200: 0.000837412488181144
Loss@1300: 0.0011542742140591145
Loss@1400: 0.0005833200993947685
Loss@1500: 0.0009848927147686481
Loss@1600: 0.0005141657311469316
Loss@1700: 0.0009339841199107468
Loss@1800: 0.000950580753851682
Loss@1900: 0.0005430278833955526
Loss@1999: 0.0007046313839964569
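The loop above calls solver.weight_decay(1e-6) before solver.update(). The sketch below uses a common formulation of L2 weight decay, which adds decay_rate * w to each gradient. This formulation is an assumption here, not something taken from the NNabla source.

```python
def apply_weight_decay(params, grads, decay_rate):
    """Fold L2 weight decay into the gradients, in place.

    Assumed formulation: g <- g + decay_rate * w, which is the gradient of
    the penalty (decay_rate / 2) * w ** 2 added to the loss.
    """
    for i, w in enumerate(params):
        grads[i] += decay_rate * w


params = [0.5, -2.0, 10.0]
grads = [0.1, 0.1, 0.1]
apply_weight_decay(params, grads, 1e-6)   # same rate as in the loop above
print(grads)   # larger weights receive a larger pull toward zero
```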
Memory usage optimization: You may notice that, in the above updates, .forward() is called with the clear_no_need_grad= option, and .backward() is called with the clear_buffer= option.
Training a neural network in more realistic scenarios usually consumes a huge amount of memory due to the nature of the backpropagation algorithm, in which all of the forward variable buffer data must be kept in order to compute the gradient of a function. In a naive implementation, we keep all the variable data and grad alive until the NdArray objects are no longer referenced (i.e. the graph is deleted). The clear_* options in .forward() and .backward() reduce memory consumption by clearing (erasing) the memory of data and grad when it is not referenced by any subsequent computation. (More precisely, the memory is not actually freed: we use our memory pool engine by default to avoid memory alloc/free overhead.) The unreferenced buffers can be reused in subsequent computation. See the document of Variable for more details. Note that the following loss.forward(clear_buffer=True) clears the data of all intermediate variables. If you are interested in intermediate variables for some purpose (e.g. debugging, logging), you can use the .persistent flag to prevent clearing the buffer of a specific Variable, as shown below.
loss.forward(clear_buffer=True)
print("The prediction `y` is cleared because it's an intermediate variable.")
print(y.d.flatten()[:4]) # to save space show only 4 values
y.persistent = True
loss.forward(clear_buffer=True)
print("The prediction `y` is kept by the persistent flag.")
print(y.d.flatten()[:4]) # to save space show only 4 values
The prediction `y` is cleared because it's an intermediate variable.
[ 2.27279830e-04 6.02164946e-05 5.33679675e-04 2.35557582e-05]
The prediction `y` is kept by the persistent flag.
[ 1.0851264 0.87657517 0.79603785 0.40098712]
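The effect of clear_buffer and persistent can be modeled on a chain of scalar functions. The helper below is a hypothetical simplification and does not model NNabla's memory pool or the clear_no_need_grad logic. An intermediate result is dropped once the next function has consumed it, unless it is marked persistent.

```python
def forward_chain(x, functions, persistent=(), clear_buffer=False):
    """Run single-input functions in sequence and record every output.

    With clear_buffer=True, an intermediate output is dropped (set to None)
    as soon as the next function has consumed it, unless its index is in
    `persistent`. The final output is always kept.
    """
    buffers = {}
    value = x
    for i, fn in enumerate(functions):
        value = fn(value)
        buffers[i] = value
        # Output i - 1 has just been consumed by fn and is no longer needed.
        if clear_buffer and i > 0 and (i - 1) not in persistent:
            buffers[i - 1] = None
    return buffers


def double(v):
    return 2 * v


def inc(v):
    return v + 1


def square(v):
    return v * v


chain = [double, inc, square]
print(forward_chain(3, chain))                     # {0: 6, 1: 7, 2: 49}
print(forward_chain(3, chain, clear_buffer=True))  # {0: None, 1: None, 2: 49}
print(forward_chain(3, chain, persistent={1}, clear_buffer=True))
# {0: None, 1: 7, 2: 49}
```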
We can confirm the prediction performs fairly well by looking at the following visualization of the ground truth and prediction function.
plt.subplot(121)
plt.title("Ground truth")
plot_true()
plt.subplot(122)
plt.title("Prediction")
plot_prediction()
You can save learned parameters with nnabla.save_parameters and load them with nnabla.load_parameters.
path_param = "param-vector2length.h5"
nn.save_parameters(path_param)
# Remove all parameters once
nn.clear_parameters()
nn.get_parameters()
2017-09-27 14:00:40,544 [nnabla][INFO]: Parameter save (.h5): param-vector2length.h5
OrderedDict()
# Load again
nn.load_parameters(path_param)
print('\n'.join(map(str, nn.get_parameters().items())))
2017-09-27 14:00:40,564 [nnabla][INFO]: Parameter load (<built-in function format>): param-vector2length.h5
('fc0/affine/W', <Variable((2, 4), need_grad=True) at 0x7f576328df48>)
('fc0/affine/b', <Variable((4,), need_grad=True) at 0x7f57245f2868>)
('fc1/affine/W', <Variable((4, 8), need_grad=True) at 0x7f576328def8>)
('fc1/affine/b', <Variable((8,), need_grad=True) at 0x7f5727ee5c78>)
('fc2/affine/W', <Variable((8, 4), need_grad=True) at 0x7f5763297318>)
('fc2/affine/b', <Variable((4,), need_grad=True) at 0x7f5727d29908>)
('fc3/affine/W', <Variable((4, 2), need_grad=True) at 0x7f57632973b8>)
('fc3/affine/b', <Variable((2,), need_grad=True) at 0x7f57632974a8>)
('fc/affine/W', <Variable((2, 1), need_grad=True) at 0x7f57632974f8>)
('fc/affine/b', <Variable((1,), need_grad=True) at 0x7f5763297598>)
Both save and load functions can also be used in a parameter scope.
with nn.parameter_scope('foo'):
    nn.load_parameters(path_param)
print('\n'.join(map(str, nn.get_parameters().items())))
2017-09-27 14:00:40,714 [nnabla][INFO]: Parameter load (<built-in function format>): param-vector2length.h5
('fc0/affine/W', <Variable((2, 4), need_grad=True) at 0x7f576328df48>)
('fc0/affine/b', <Variable((4,), need_grad=True) at 0x7f57245f2868>)
('fc1/affine/W', <Variable((4, 8), need_grad=True) at 0x7f576328def8>)
('fc1/affine/b', <Variable((8,), need_grad=True) at 0x7f5727ee5c78>)
('fc2/affine/W', <Variable((8, 4), need_grad=True) at 0x7f5763297318>)
('fc2/affine/b', <Variable((4,), need_grad=True) at 0x7f5727d29908>)
('fc3/affine/W', <Variable((4, 2), need_grad=True) at 0x7f57632973b8>)
('fc3/affine/b', <Variable((2,), need_grad=True) at 0x7f57632974a8>)
('fc/affine/W', <Variable((2, 1), need_grad=True) at 0x7f57632974f8>)
('fc/affine/b', <Variable((1,), need_grad=True) at 0x7f5763297598>)
('foo/fc0/affine/W', <Variable((2, 4), need_grad=True) at 0x7f5763297958>)
('foo/fc0/affine/b', <Variable((4,), need_grad=True) at 0x7f57632978b8>)
('foo/fc1/affine/W', <Variable((4, 8), need_grad=True) at 0x7f572a51ac78>)
('foo/fc1/affine/b', <Variable((8,), need_grad=True) at 0x7f5763297c78>)
('foo/fc2/affine/W', <Variable((8, 4), need_grad=True) at 0x7f5763297a98>)
('foo/fc2/affine/b', <Variable((4,), need_grad=True) at 0x7f5763297d68>)
('foo/fc3/affine/W', <Variable((4, 2), need_grad=True) at 0x7f5763297e08>)
('foo/fc3/affine/b', <Variable((2,), need_grad=True) at 0x7f5763297ea8>)
('foo/fc/affine/W', <Variable((2, 1), need_grad=True) at 0x7f5763297f48>)
('foo/fc/affine/b', <Variable((1,), need_grad=True) at 0x7f5763297cc8>)
!rm {path_param} # Clean up
Static vs Dynamic Neural Networks in NNabla¶
NNabla allows you to define static and dynamic neural networks. Static neural networks have a fixed layer architecture, i.e., a static computation graph. In contrast, dynamic neural networks use a dynamic computation graph, e.g., randomly dropping layers for each mini-batch.
This tutorial compares both computation graphs.
%matplotlib inline
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import nnabla.solvers as S
import numpy as np
np.random.seed(0)
GPU = 0 # ID of GPU that we will use
2017-06-26 23:10:05,832 [nnabla][INFO]: Initializing CPU extension...
Dataset loading¶
We will first set up the digits dataset from scikit-learn:
from tiny_digits import *
digits = load_digits()
data = data_iterator_tiny_digits(digits, batch_size=16, shuffle=True)
2017-06-26 23:10:06,042 [nnabla][INFO]: DataSource with shuffle(True)
2017-06-26 23:10:06,043 [nnabla][INFO]: Using DataSourceWithMemoryCache
2017-06-26 23:10:06,044 [nnabla][INFO]: DataSource with shuffle(True)
2017-06-26 23:10:06,044 [nnabla][INFO]: On-memory
2017-06-26 23:10:06,045 [nnabla][INFO]: Using DataIterator
Each sample in this dataset is a grayscale image of size 8x8 and belongs to one of the ten classes 0, 1, …, 9.
img, label = data.next()
print(img.shape, label.shape)
(16, 1, 8, 8) (16, 1)
Network definition¶
As an example, we define an (unnecessarily) deep CNN:
def cnn(x):
    """Unnecessarily Deep CNN.
    Args:
        x : Variable, shape (B, 1, 8, 8)
    Returns:
        y : Variable, shape (B, 10)
    """
    with nn.parameter_scope("cnn"):  # Parameter scope can be nested
        with nn.parameter_scope("conv1"):
            h = F.tanh(PF.batch_normalization(
                PF.convolution(x, 64, (3, 3), pad=(1, 1))))
        for i in range(10):  # unnecessarily deep
            with nn.parameter_scope("conv{}".format(i + 2)):
                h = F.tanh(PF.batch_normalization(
                    PF.convolution(h, 128, (3, 3), pad=(1, 1))))
        with nn.parameter_scope("conv_last"):
            h = F.tanh(PF.batch_normalization(
                PF.convolution(h, 512, (3, 3), pad=(1, 1))))
            h = F.average_pooling(h, (2, 2))
        with nn.parameter_scope("fc"):
            h = F.tanh(PF.affine(h, 1024))
        with nn.parameter_scope("classifier"):
            y = PF.affine(h, 10)
    return y
Static computation graph¶
First, we will look at the case of a static computation graph where the neural network does not change during training.
from nnabla.ext_utils import get_extension_context
# setup cuda extension
ctx_cuda = get_extension_context('cudnn', device_id=GPU) # replace 'cudnn' by 'cpu' if you want to run the example on the CPU
nn.set_default_context(ctx_cuda)
# create variables for network input and label
x = nn.Variable(img.shape)
t = nn.Variable(label.shape)
# create network
static_y = cnn(x)
static_y.persistent = True
# define loss function for training
static_l = F.mean(F.softmax_cross_entropy(static_y, t))
2017-06-26 23:10:06,350 [nnabla][INFO]: Initializing CUDA extension...
2017-06-26 23:10:06,571 [nnabla][INFO]: Initializing cuDNN extension...
Set up the solver for training
solver = S.Adam(alpha=1e-3)
solver.set_parameters(nn.get_parameters())
Create data iterator
loss = []
def epoch_end_callback(epoch):
    global loss
    print("[{} {} {}]".format(epoch, np.mean(loss), itr))
    loss = []
data = data_iterator_tiny_digits(digits, batch_size=16, shuffle=True)
data.register_epoch_end_callback(epoch_end_callback)
2017-06-26 23:10:07,221 [nnabla][INFO]: DataSource with shuffle(True)
2017-06-26 23:10:07,224 [nnabla][INFO]: Using DataSourceWithMemoryCache
2017-06-26 23:10:07,226 [nnabla][INFO]: DataSource with shuffle(True)
2017-06-26 23:10:07,228 [nnabla][INFO]: On-memory
2017-06-26 23:10:07,230 [nnabla][INFO]: Using DataIterator
Perform training iterations and output training loss:
%%time
for epoch in range(30):
    itr = 0
    while data.epoch == epoch:
        x.d, t.d = data.next()
        static_l.forward(clear_no_need_grad=True)
        solver.zero_grad()
        static_l.backward(clear_buffer=True)
        solver.update()
        loss.append(static_l.d.copy())
        itr += 1
print()
[0 0.909297 112]
[1 0.183863 111]
[2 0.0723054 111]
[3 0.0653021 112]
[4 0.0628503 111]
[5 0.0731626 111]
[6 0.0319093 112]
[7 0.0610926 111]
[8 0.0817437 111]
[9 0.0717577 112]
[10 0.0241882 111]
[11 0.0119452 111]
[12 0.00664761 112]
[13 0.00377711 111]
[14 0.000605656 111]
[15 0.000236613 111]
[16 0.000174549 112]
[17 0.000142428 111]
[18 0.000126015 111]
[19 0.000111144 112]
[20 0.000100751 111]
[21 9.03808e-05 111]
[22 8.35904e-05 112]
[23 7.73492e-05 111]
[24 6.91389e-05 111]
[25 6.74929e-05 112]
[26 6.08386e-05 111]
[27 5.62182e-05 111]
[28 5.33428e-05 112]
[29 4.94594e-05 111]
CPU times: user 14.3 s, sys: 6.78 s, total: 21.1 s
Wall time: 21.1 s
Dynamic computation graph¶
Now, we will use a dynamic computation graph, where the neural network is set up each time we want to do a forward/backward pass through it. This allows us to, e.g., randomly drop out layers or to have network architectures that depend on the input data. In this example, for simplicity, we use the same neural network structure and only create it dynamically. For example, adding an if np.random.rand() > dropout_probability: condition into cnn() allows layers to be dropped out.
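The per-iteration randomness that a dynamic graph enables can be sketched independently of NNabla. The helper below is hypothetical. It only decides which layers to build in a given iteration, the way an `if np.random.rand() > dropout_probability:` guard inside cnn() would, so the resulting graph can differ from one mini-batch to the next.

```python
import random


def sample_active_layers(num_layers, drop_probability, rng):
    """Hypothetical helper: indices of the layers to build this iteration.

    Each layer is kept when a uniform draw exceeds drop_probability,
    mirroring a per-layer random guard placed inside the network definition.
    """
    return [i for i in range(num_layers) if rng.random() > drop_probability]


rng = random.Random(0)
for iteration in range(3):
    # A different subset of the 10 layers may be built for each mini-batch.
    print(iteration, sample_active_layers(10, 0.2, rng))
```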
First, we set up the solver and the data iterator for the training:
nn.clear_parameters()
solver = S.Adam(alpha=1e-3)
solver.set_parameters(nn.get_parameters())
loss = []
def epoch_end_callback(epoch):
    global loss
    print("[{} {} {}]".format(epoch, np.mean(loss), itr))
    loss = []
data = data_iterator_tiny_digits(digits, batch_size=16, shuffle=True)
data.register_epoch_end_callback(epoch_end_callback)
2017-06-26 23:10:28,449 [nnabla][INFO]: DataSource with shuffle(True)
2017-06-26 23:10:28,450 [nnabla][INFO]: Using DataSourceWithMemoryCache
2017-06-26 23:10:28,450 [nnabla][INFO]: DataSource with shuffle(True)
2017-06-26 23:10:28,451 [nnabla][INFO]: On-memory
2017-06-26 23:10:28,451 [nnabla][INFO]: Using DataIterator
%%time
for epoch in range(30):
    itr = 0
    while data.epoch == epoch:
        x.d, t.d = data.next()
        with nn.auto_forward():
            dynamic_y = cnn(x)
            dynamic_l = F.mean(F.softmax_cross_entropy(dynamic_y, t))
        solver.set_parameters(nn.get_parameters(), reset=False, retain_state=True)  # this can be done dynamically
        solver.zero_grad()
        dynamic_l.backward(clear_buffer=True)
        solver.update()
        loss.append(dynamic_l.d.copy())
        itr += 1
print()
[0 1.04669 112]
[1 0.151949 111]
[2 0.093581 111]
[3 0.129242 112]
[4 0.0452591 111]
[5 0.0343987 111]
[6 0.0315372 112]
[7 0.0336886 111]
[8 0.0194571 111]
[9 0.00923094 112]
[10 0.00536065 111]
[11 0.000669383 111]
[12 0.000294232 112]
[13 0.000245866 111]
[14 0.000201116 111]
[15 0.000164177 111]
[16 0.00014832 112]
[17 0.000131479 111]
[18 0.000115171 111]
[19 0.000101432 112]
[20 9.06228e-05 111]
[21 8.7103e-05 111]
[22 7.79601e-05 112]
[23 7.59678e-05 111]
[24 6.64341e-05 111]
[25 6.22717e-05 112]
[26 5.8643e-05 111]
[27 5.35373e-05 111]
[28 4.96717e-05 112]
[29 4.65124e-05 111]
CPU times: user 23.4 s, sys: 5.35 s, total: 28.7 s
Wall time: 28.7 s
Comparing the two processing times, we can observe that both schemes (“static” and “dynamic”) take about the same execution time, i.e., although we created the computation graph dynamically, we did not lose performance.
Mixed Precision Training¶
Introduction¶
Traditionally, neural networks have been trained using FP32 for weights and activations. However, the computational cost of training has increased rapidly over the years with the success of deep learning and the growing size of neural networks, so training a huge network takes much more time, while we would like to run many trials before a product launch. To address this problem, companies (e.g., NVIDIA) have introduced accelerators that speed up computation; for example, NVIDIA Volta has Tensor Cores. However, Tensor Cores use FP16 weights, activations, and gradients, and the range of FP16 is very limited compared to that of FP32. As a result, gradient values sometimes (or often) overflow and/or underflow, which degrades the performance of the network or makes training collapse.
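The limited range can be observed without a GPU. Python's struct module supports the IEEE 754 half-precision format ("e"), so the sketch below rounds ordinary floats to FP16. to_fp16 is a hypothetical helper name, and returning inf on overflow is an assumption about what an FP16 kernel would produce, since struct itself raises OverflowError instead.

```python
import math
import struct

FP16_MAX = 65504.0               # largest finite FP16 value
FP16_MIN_SUBNORMAL = 2.0 ** -24  # smallest positive FP16 value (about 5.96e-8)


def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    try:
        return struct.unpack("<e", struct.pack("<e", value))[0]
    except OverflowError:
        # struct refuses finite values beyond the FP16 range; model the
        # result an FP16 computation would give instead: +/-inf.
        return math.copysign(math.inf, value)


print(to_fp16(1e-8))     # too small for FP16: underflows to zero
print(to_fp16(70000.0))  # too large for FP16: overflows to inf
print(to_fp16(0.1))      # representable only approximately
```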
Mixed precision training is one of the algorithms that circumvent this problem while maintaining the same results that we could obtain with FP32 networks. It is well described in The Training with Mixed Precision User Guide and Mixed Precision Training.
This tutorial explains how to do mixed precision training in NNabla step by step.
Step-by-Step Instruction¶
Basically, mixed precision training is composed of three parts:
 Use the accelerator for computation (here we assume Tensor Cores)
 Use loss scaling to prevent underflow
 Use dynamic loss scaling to prevent overflow/underflow
In NNabla, each part corresponds to the following code.
1. Use Tensor Cores¶
from nnabla.ext_utils import get_extension_context
ctx = get_extension_context("cudnn", type_config="half")
2. Use loss scaling to prevent underflow¶
loss_scale = 8
loss.backward(loss_scale)
solver.scale_grad(1. / loss_scale) # do some gradient clipping, etc. after this
solver.update()
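Why scaling helps can be checked with a toy model of a single gradient value. The sketch assumes that the scaled gradient is stored in FP16 during the backward pass and unscaled afterwards in higher precision, which is the role of solver.scale_grad(1. / loss_scale) before the update. The helpers to_fp16 and backward_fp16 are hypothetical names, and the scale of 1024 is chosen only to make the effect visible; it is not a recommended value.

```python
import math
import struct


def to_fp16(value):
    """Round a Python float to IEEE 754 half precision (inf on overflow)."""
    try:
        return struct.unpack("<e", struct.pack("<e", value))[0]
    except OverflowError:
        return math.copysign(math.inf, value)


def backward_fp16(true_grad, loss_scale):
    """Toy model of one gradient value under loss scaling.

    The backward pass on the scaled loss stores the gradient in FP16, i.e.
    to_fp16(true_grad * loss_scale); unscaling then happens in higher
    precision, corresponding to solver.scale_grad(1. / loss_scale).
    """
    stored = to_fp16(true_grad * loss_scale)
    return stored / loss_scale


tiny_grad = 1e-8
print(backward_fp16(tiny_grad, 1))     # lost: underflows to zero in FP16
print(backward_fp16(tiny_grad, 1024))  # kept, up to FP16 rounding error
```

A scale that is too large has the opposite problem: a large gradient times the scale exceeds the FP16 range and becomes inf, which is what the dynamic scheme in the next step guards against.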
3. Use dynamic loss scaling to prevent overflow/underflow¶
loss_scale = 8
scaling_factor = 2
counter = 0
interval = 2000
...
loss.backward(loss_scale, ...)
...
if solver.check_inf_or_nan_grad():
    loss_scale /= scaling_factor
    counter = 0
else:
    solver.scale_grad(1. / loss_scale)  # do some gradient clipping, etc. after this
    solver.update()
    if counter > interval:
        loss_scale *= scaling_factor
        counter = 0
    counter += 1
Note that the procedures of the 2nd part (Use loss scaling to prevent underflow) and the 3rd part (Use dynamic loss scaling to prevent overflow/underflow) are currently experimental, and we are working on speeding up mixed precision training, so the API may change in the future, especially for the 3rd part.
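The control flow of the 3rd part can be replayed in plain Python to see how the scale evolves. In the sketch below, each boolean flag stands in for the result of solver.check_inf_or_nan_grad(), and simulate_dynamic_loss_scale is a hypothetical name; no NNabla calls are made.

```python
def simulate_dynamic_loss_scale(overflow_flags, loss_scale=8.0,
                                scaling_factor=2.0, interval=2000):
    """Replay the dynamic loss scaling logic on a sequence of overflow flags.

    Returns the loss scale after every iteration and the number of
    iterations in which the solver would actually update the weights.
    """
    counter = 0
    history = []
    n_updates = 0
    for overflow in overflow_flags:
        if overflow:
            # Inf/NaN found: skip this update and shrink the scale.
            loss_scale /= scaling_factor
            counter = 0
        else:
            n_updates += 1  # solver.scale_grad(...) and solver.update() run here
            if counter > interval:
                # Enough clean iterations in a row: try a larger scale.
                loss_scale *= scaling_factor
                counter = 0
            counter += 1
        history.append(loss_scale)
    return history, n_updates


history, n_updates = simulate_dynamic_loss_scale(
    [False, False, True, False, False, False, False], interval=2)
print(history, n_updates)
```

With this exact counter logic, the check counter > interval runs before the increment, so the scale grows on the (interval + 2)-th consecutive clean iteration rather than the interval-th.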
All-in-one Instruction¶
In the previous step-by-step example, the 3rd step makes the training loop lengthy, so we can write a wrapper class like the following.
class DynamicLossScalingUpdater(object):
    '''Dynamic Loss Scaling Updater for the mixed precision training.
    Args:
        solver (:obj:`nnabla.solvers.Solver`): Solver object. E.g., Momentum or Adam.
        loss (:obj:`nnabla.Variable`): Loss variable from which the forward and the backward is called.
        data_feeder (callable :obj:`object`, function, or lambda): Data feeder, called with no arguments before each forward pass.
        scale (:obj:`float`): Loss scale constant. This is dynamically changing during training.
        scaling_factor (:obj:`float`): Scaling factor for the dynamic loss scaling.
        N (:obj:`int`): Interval, the number of iterations in training for increasing `loss scale` by `scaling_factor`.
        clear_buffer (:obj:`bool`): Clears the no longer referenced variables during backpropagation to save memory.
        accum_grad (:obj:`int`): Number of gradient accumulations. Update method of the `solver` is called after the forward and backward are called `accum_grad` times.
        weight_decay (:obj:`float`): Decay constant. Default is `None`, not applying the weight decay.
        comm (:obj:`nnabla.communicators.Communicator`): Communicator for distributed training. Default is :obj:`None`.
        grads (:obj:`list` of :obj:`nnabla._nd_array.NdArray`): The list of gradients to be exchanged in distributed training. Default is the empty :obj:`list`.
    Attributes:
        solver (:obj:`nnabla.solvers.Solver`): Solver object. E.g., Momentum or Adam.
        loss (:obj:`nnabla.Variable`): Loss variable from which the forward and the backward is called.
        data_feeder (callable :obj:`object`, function, lambda): Data feeder
        scale (:obj:`float`): Loss scale constant. This is dynamically changing during training.
        scaling_factor (:obj:`float`): Scaling factor for the dynamic loss scaling.
        N (:obj:`int`): Interval, the number of iterations in training for increasing `loss scale` by `scaling_factor`.
        clear_buffer (:obj:`bool`): Clears the no longer referenced variables during backpropagation to save memory.
        accum_grad (:obj:`int`): Number of gradient accumulations. Update method of the `solver` is called after the forward and backward are called `accum_grad` times.
        weight_decay (:obj:`float`): Decay constant. Default is `None`, not applying the weight decay.
        comm (:obj:`nnabla.communicators.Communicator`): Communicator for distributed training.
        grads (:obj:`list` of :obj:`nnabla._nd_array.NdArray`): The list of gradients to be exchanged in distributed training.
    Example:
        .. code-block:: python
            solver = <Solver>
            loss = <Loss Variable of Network>
            data_feeder = <DataFeeder>
            updater = DynamicLossScalingUpdater(solver, loss, data_feeder)
            # Training iteration
            for itr in range(max_iter):
                # Call solver.zero_grad, data_feeder, loss.forward, loss.backward
                # and solver.update with the dynamic loss scaling.
                updater.update()
    Reference:
        https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html#scalefactor
    '''
    def __init__(self, solver, loss, data_feeder=lambda: None,
                 scale=8.0, scaling_factor=2.0, N=2000, clear_buffer=True,
                 accum_grad=1, weight_decay=None,
                 comm=None,
                 grads=None):
        self.solver = solver
        self.loss = loss
        self.data_feeder = data_feeder
        self.scale = scale
        self.scaling_factor = scaling_factor
        self.N = N
        self.clear_buffer = clear_buffer
        self.accum_grad = accum_grad
        self.weight_decay = weight_decay
        self.comm = comm
        self.grads = grads if grads is not None else []
        self._counter = 0
        self._recursive_count = 0
        self._max_recursive_count = 100
    def update(self):
        """Monolithic update method.
        This method calls the following methods with the dynamic loss scaling.
        1. solver.zero_grad
        2. feed data
        3. loss.forward
        4. loss.backward
        5. comm.all_reduce (if it is specified)
        6. solver.update
        """
        # Initialize gradients.
        self.solver.zero_grad()
        # Forward and backward
        for _ in range(self.accum_grad):
            # feed data
            self.data_feeder()
            # forward
            self.loss.forward(clear_no_need_grad=self.clear_buffer)
            # backward with scale
            self.loss.backward(self.scale, clear_buffer=self.clear_buffer)
        # AllReduce
        if self.comm and len(self.grads) != 0:
            self.comm.all_reduce(self.grads, division=False, inplace=False)
        # Check Inf/NaN in grads
        if self.solver.check_inf_or_nan_grad():
            self.scale /= self.scaling_factor
            self._counter = 0
            # Recursively call the update function until there is no inf or nan.
            self._recursive_count += 1
            if self._recursive_count > self._max_recursive_count:
                self._recursive_count = 0
                return  # skip
            return self.update()
        self._recursive_count = 0
        # Rescale grads
        self.solver.scale_grad(1. / self.scale)
        # Do some gradient clipping, etc.
        if self.weight_decay is not None:
            self.solver.weight_decay(self.weight_decay)
        # Update
        self.solver.update()
        if self._counter > self.N:
            self.scale *= self.scaling_factor
            self._counter = 0
        self._counter += 1
Then, call the update method in a training loop:
from nnabla.experimental.mixed_precision_training import DynamicLossScalingUpdater
solver = <Solver>
loss = <Loss Variable of Network>
data_feeder = <DataFeeder>
updater = DynamicLossScalingUpdater(solver, loss, data_feeder)
# Training iteration
for itr in range(max_iter):
    # Call solver.zero_grad, data_feeder, loss.forward, loss.backward
    # and solver.update with the dynamic loss scaling.
    updater.update()
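The update() method calls self.data_feeder() with no arguments before every forward pass, so the feeder is usually a closure that copies the next mini-batch into the network's input variables. The sketch below shows that pattern. FakeVariable is a hypothetical stand-in that exposes only the .d attribute of nn.Variable so the snippet runs without NNabla, and make_data_feeder is likewise a hypothetical name; in real code the body of the feeder would typically be x.d, t.d = data.next(), as in the training loop earlier in this document.

```python
import itertools


class FakeVariable(object):
    """Hypothetical stand-in for nn.Variable: only holds data in `.d`."""

    def __init__(self):
        self.d = None


def make_data_feeder(x, t, batches):
    """Return a zero-argument callable that loads the next batch into x and t."""
    it = iter(batches)

    def data_feeder():
        x.d, t.d = next(it)

    return data_feeder


x, t = FakeVariable(), FakeVariable()
batches = itertools.cycle([([0.1, 0.2], [0]), ([0.3, 0.4], [1])])
data_feeder = make_data_feeder(x, t, batches)
data_feeder()
print(x.d, t.d)
```

The resulting callable is what would be passed as the third argument, e.g. DynamicLossScalingUpdater(solver, loss, data_feeder).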
Notice¶
In mixed precision training, the following are premises:
 The solver contains FP16 weights and an FP32 copy of the weights. Solvers in NNabla hold FP32 weights and weight gradients, and cast them to FP16 weights in the forward pass and to FP16 weight gradients in the backward pass if one sets type_config="half".
 Reductions should be left in FP32, for example, the statistics (mean and variance) computed by batch normalization, Mean, Sum, SoftMax, SoftMaxCrossEntropy, etc. (see The Training with Mixed Precision User Guide). In NNabla, these functions automatically fall back to FP32.
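The reason for keeping an FP32 copy of the weights can be illustrated numerically. The sketch below emulates FP16 and FP32 rounding with Python's struct module (an approximation of hardware arithmetic, not how NNabla solvers are implemented) and applies 100 hypothetical updates of size 1e-4 to a weight of 1.0. Updates that are small relative to the FP16 spacing around 1.0 are rounded away when the weight itself is stored in FP16, while an FP32 master copy accumulates them.

```python
import math
import struct


def to_fp16(value):
    """Round a float to IEEE 754 half precision (inf on overflow)."""
    try:
        return struct.unpack("<e", struct.pack("<e", value))[0]
    except OverflowError:
        return math.copysign(math.inf, value)


def to_fp32(value):
    """Round a float to IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


step = 1e-4    # hypothetical size of each weight update (lr * grad)
n_steps = 100

# FP16 only: the weight itself is rounded to FP16 after every update.
w_fp16 = 1.0
for _ in range(n_steps):
    w_fp16 = to_fp16(w_fp16 - step)

# FP32 master weight: updates accumulate in FP32, and an FP16 copy is
# derived from it for the forward and backward passes.
w_master = 1.0
for _ in range(n_steps):
    w_master = to_fp32(w_master - step)
w_fp16_copy = to_fp16(w_master)

print(w_fp16, w_master, w_fp16_copy)
```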
Data Parallel Distributed Training¶
DataParallelCommunicator enables you to train your neural network using multiple devices. It is normally used for gradient exchange in data parallel distributed training. Basically, there are two types of distributed training in the neural network literature: Data Parallel and Model Parallel. Here we only focus on the former, Data Parallel Training. Data Parallel Distributed Training is based on a very simple equation used for the optimization of a neural network called (Mini-Batch) Stochastic Gradient Descent.
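In the optimization process, the objective one tries to minimize is
\[ \mathcal{L}(\mathbf{w}; X) = \frac{1}{B \times N} \sum_{i=1}^{B \times N} \ell(f(\mathbf{x}_i; \mathbf{w})), \]
where \(f\) is a neural network, \(B \times N\) is the batch size, \(\ell\) is a loss function for each data point \(\mathbf{x} \in X\), and \(\mathbf{w}\) is the trainable parameter of the neural network.
When taking the derivative of this objective, one gets,
\[ \nabla_{\mathbf{w}} \mathcal{L}(\mathbf{w}; X) = \frac{1}{B \times N} \sum_{i=1}^{B \times N} \nabla_{\mathbf{w}} \ell(f(\mathbf{x}_i; \mathbf{w})). \]
Since differentiation is linear, one can rewrite the gradient as the average of \(N\) terms, each of which is the sum of derivatives over \(B\) data points divided by \(B\):
\[ \nabla_{\mathbf{w}} \mathcal{L}(\mathbf{w}; X) = \frac{1}{N} \sum_{n=1}^{N} \left( \frac{1}{B} \sum_{i=(n-1)B+1}^{nB} \nabla_{\mathbf{w}} \ell(f(\mathbf{x}_i; \mathbf{w})) \right). \]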
In data parallel distributed training, the following steps are performed according to the above equation,
 each term, the summation of derivatives (gradients) divided by the batch size \(B\), is computed on a separate device (typically a GPU),
 take the sum over devices,
 divide the result by the number of devices, \(N\).
That is the underlying foundation of Data Parallel Distributed Training.
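The equivalence behind these steps can be checked numerically. In the sketch below, plain floats stand in for the per-sample gradients, and the helper names are hypothetical. Averaging on each of N simulated devices and then averaging the N results gives the same value as averaging over the whole batch of B x N samples, provided every device gets the same number B of samples.

```python
def global_mean_gradient(per_sample_grads):
    """Gradient of the mini-batch objective: mean over all B x N samples."""
    return sum(per_sample_grads) / len(per_sample_grads)


def data_parallel_gradient(per_sample_grads, n_devices):
    """Average locally on each device, sum over devices, divide by N."""
    batch_per_device = len(per_sample_grads) // n_devices
    assert batch_per_device * n_devices == len(per_sample_grads), \
        "the batch must split evenly into B samples per device"
    device_means = [
        sum(per_sample_grads[d * batch_per_device:(d + 1) * batch_per_device])
        / batch_per_device
        for d in range(n_devices)
    ]
    # Sum over devices (the all-reduce), then divide by the number of devices.
    return sum(device_means) / n_devices, device_means


grads = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # hypothetical per-sample gradients
print(global_mean_gradient(grads))
print(data_parallel_gradient(grads, n_devices=2))
```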
This tutorial shows the usage of the Multi Process Data Parallel Communicator for data parallel distributed training with a very simple example.
NOTE¶
This tutorial depends on IPython Cluster, so when you want to run the following excerpts of the scripts on Jupyter Notebook, follow this to enable the mpiexec/mpirun mode, then launch a corresponding IPython Cluster on the IPython Clusters tab.
Launch client¶
This code is only needed for this tutorial via Jupyter Notebook.
import ipyparallel as ipp
rc = ipp.Client(profile='mpi')
Prepare the dependencies¶
%%px
import os
import time
import nnabla as nn
import nnabla.communicators as C
from nnabla.ext_utils import get_extension_context
import nnabla.functions as F
from nnabla.initializer import (
calc_uniform_lim_glorot,
UniformInitializer)
import nnabla.parametric_functions as PF
import nnabla.solvers as S
import numpy as np
Define the communicator for gradients exchange.¶
%%px
extension_module = "cudnn"
ctx = get_extension_context(extension_module)
comm = C.MultiProcessCommunicator(ctx)
comm.init()
n_devices = comm.size
mpi_rank = comm.rank
device_id = mpi_rank
ctx = get_extension_context(extension_module, device_id=device_id)
Check that different ranks are assigned to different devices
%%px
print("n_devices={}".format(n_devices))
print("mpi_rank={}".format(mpi_rank))
[stdout:0]
n_devices=2
mpi_rank=1
[stdout:1]
n_devices=2
mpi_rank=0
Create data points and a very simple neural network¶
%%px
# Data points setting
n_class = 2
b, c, h, w = 4, 1, 32, 32
# Data points
x_data = np.random.rand(b, c, h, w)
y_data = np.random.choice(n_class, b).reshape((b, 1))
x = nn.Variable(x_data.shape)
y = nn.Variable(y_data.shape)
x.d = x_data
y.d = y_data
# Network setting
n_maps = 1  # number of output maps; not named C, to avoid shadowing nnabla.communicators
kernel = (3, 3)
pad = (1, 1)
stride = (1, 1)
%%px
rng = np.random.RandomState(0)
w_init = UniformInitializer(
    calc_uniform_lim_glorot(n_maps, n_maps / 2, kernel=(1, 1)),
    rng=rng)
%%px
# Network
with nn.context_scope(ctx):
    h = PF.convolution(x, n_maps, kernel, pad, stride, w_init=w_init)
    pred = PF.affine(h, n_class, w_init=w_init)
    loss = F.mean(F.softmax_cross_entropy(pred, y))
An important point here is that w_init is passed to the parametric functions so that the network on each GPU starts from the same values of the trainable parameters in the optimization process.
Create a solver.¶
%%px
# Solver and add parameters
solver = S.Adam()
solver.set_parameters(nn.get_parameters())
Training¶
Recall the basic usage of the nnabla API for training a neural network:
 loss.forward()
 solver.zero_grad()
 loss.backward()
 solver.update()
When using C.MultiProcessCommunicator, these steps are performed on different GPUs, and the only difference from the steps above is comm.all_reduce(). Thus, with C.MultiProcessCommunicator, the training steps are as follows:
 loss.forward()
 solver.zero_grad()
 loss.backward()
 comm.all_reduce([x.grad for x in nn.get_parameters().values()])
 solver.update()
First, run forward, zero_grad, and backward:
%%px
# Training steps
loss.forward()
solver.zero_grad()
loss.backward()
Check the gradients of the weights once:
%%px
for n, v in nn.get_parameters().items():
    print(n, v.g)
[stdout:0]
('conv/W', array([[[[ 5.0180483, 0.457942 , 2.8701296],
[ 2.0715926, 3.0698593, 1.6650047],
[2.5591214, 6.4248834, 9.881935 ]]]], dtype=float32))
('conv/b', array([8.658947], dtype=float32))
('affine/W', array([[0.93160367, 0.9316036 ],
[1.376812 , 1.376812 ],
[1.8957546 , 1.8957543 ],
...,
[0.33000934, 0.33000934],
[0.7211893 , 0.72118926],
[0.25237036, 0.25237036]], dtype=float32))
('affine/b', array([0.48865744, 0.48865741], dtype=float32))
[stdout:1]
('conv/W', array([[[[ 1.2505884 , 0.87151337, 8.685524 ],
[ 10.738419 , 14.676786 , 7.483423 ],
[ 5.612471 , 12.880402 , 19.141157 ]]]], dtype=float32))
('conv/b', array([13.196114], dtype=float32))
('affine/W', array([[1.6865108 , 1.6865108 ],
[0.938529 , 0.938529 ],
[1.028422 , 1.028422 ],
...,
[0.98217344, 0.98217344],
[0.97528917, 0.97528917],
[0.413546 , 0.413546 ]], dtype=float32))
('affine/b', array([0.7447065, 0.7447065], dtype=float32))
You can see different values on each device. Then call all_reduce:
%%px
comm.all_reduce([x.grad for x in nn.get_parameters().values()], division=True)
Commonly, all_reduce only means the sum; however, comm.all_reduce addresses both cases: summation, and summation followed by division by the number of devices (division=True).
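To make the two modes concrete, here is a pure-Python model of the result of an all-reduce over per-device gradient lists. simulate_all_reduce is a hypothetical helper that reproduces only the outcome (every device ends up holding the same reduced values), not how comm.all_reduce actually communicates. The last lines also show why identical initial weights matter: replicas that start equal and apply equal gradients stay equal.

```python
def simulate_all_reduce(device_grads, division=False):
    """Element-wise sum (or mean) over devices, copied back to every device.

    device_grads[d][k] is the k-th gradient element held by device d.
    """
    n_devices = len(device_grads)
    reduced = [sum(values) for values in zip(*device_grads)]
    if division:
        reduced = [v / n_devices for v in reduced]
    return [list(reduced) for _ in range(n_devices)]


grads = [[1.0, 2.0], [3.0, 6.0]]  # hypothetical gradients on 2 devices
print(simulate_all_reduce(grads))                 # summation
print(simulate_all_reduce(grads, division=True))  # summation + division

# Replicas that start from the same weights (cf. w_init above) and apply the
# same all-reduced gradients remain identical after an SGD step.
lr = 0.1
weights = [[0.5, -0.5], [0.5, -0.5]]
synced = simulate_all_reduce(grads, division=True)
weights = [[w - lr * g for w, g in zip(ws, gs)] for ws, gs in zip(weights, synced)]
print(weights)
```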
Again, check the gradients of the weights:
%%px
for n, v in nn.get_parameters().items():
    print(n, v.g)
[stdout:0]
('conv/W', array([[[[ 1.8837299 , 0.20678568, 5.777827 ],
[ 6.4050055 , 8.8733225 , 2.9092093 ],
[ 1.5266749 , 3.2277591 , 14.511546 ]]]], dtype=float32))
('conv/b', array([21.85506], dtype=float32))
('affine/W', array([[2.6181145, 2.6181145],
[2.315341 , 2.315341 ],
[2.9241767, 2.9241762],
...,
[1.3121828, 1.3121828],
[1.6964785, 1.6964784],
[0.6659163, 0.6659163]], dtype=float32))
('affine/b', array([1.233364 , 1.2333639], dtype=float32))
[stdout:1]
('conv/W', array([[[[ 1.8837299 , 0.20678568, 5.777827 ],
[ 6.4050055 , 8.8733225 , 2.9092093 ],
[ 1.5266749 , 3.2277591 , 14.511546 ]]]], dtype=float32))
('conv/b', array([21.85506], dtype=float32))
('affine/W', array([[2.6181145, 2.6181145],
[2.315341 , 2.315341 ],
[2.9241767, 2.9241762],
...,
[1.3121828, 1.3121828],
[1.6964785, 1.6964784],
[0.6659163, 0.6659163]], dtype=float32))
('affine/b', array([1.233364 , 1.2333639], dtype=float32))
You can see the same values on all devices because of all_reduce.
Update the weights:
%%px
solver.update()
This concludes the usage of C.MultiProcessCommunicator for Data Parallel Distributed Training.
Now that you have an understanding of how to use C.MultiProcessCommunicator, see the cifar10 example for more details:
 multi_device_multi_process_classification.sh
 multi_device_multi_process_classification.py
Debugging¶
Deep neural networks are going deeper and deeper every year, requiring more components in the networks. Such complexity often leads us to misconfigure networks, which can turn out to be critical. Even if we correctly configure a neural network as desired, we may still want to find its performance bottleneck, e.g., which layer(s) the computational bottleneck comes from.
In this debugging tutorial, we introduce three techniques to deal with such cases:
 visit method of a variable
 simple graph viewer
 profiling utils
We will go over each technique, but first prepare the following reference model.
import numpy as np
import nnabla as nn
import nnabla.logger as logger
import nnabla.functions as F
import nnabla.parametric_functions as PF
import nnabla.solvers as S
def block(x, maps, test=False, name="block"):
    h = x
    s = x  # identity skip connection, used when the number of maps is unchanged
    with nn.parameter_scope(name):
        with nn.parameter_scope("inblock1"):
            h = PF.convolution(h, maps, kernel=(3, 3), pad=(1, 1), with_bias=False)
            h = PF.batch_normalization(h, batch_stat=not test)
            h = F.relu(h)
        with nn.parameter_scope("inblock2"):
            h = PF.convolution(h, maps // 2, kernel=(3, 3), pad=(1, 1), with_bias=False)
            h = PF.batch_normalization(h, batch_stat=not test)
            h = F.relu(h)
        with nn.parameter_scope("inblock3"):
            h = PF.convolution(h, maps, kernel=(3, 3), pad=(1, 1), with_bias=False)
            h = PF.batch_normalization(h, batch_stat=not test)
        if h.shape[1] != x.shape[1]:
            # Projection skip connection when the number of maps changes.
            with nn.parameter_scope("skip"):
                s = PF.convolution(x, maps, kernel=(3, 3), pad=(1, 1), with_bias=False)
                s = PF.batch_normalization(s, batch_stat=not test)
    return F.relu(h + s)
def network(x, maps=16, test=False):
    h = x
    h = PF.convolution(h, maps, kernel=(3, 3), pad=(1, 1), name="firstconv", with_bias=False)
    h = PF.batch_normalization(h, batch_stat=not test, name="firstbn")
    h = F.relu(h)
    for l in range(4):
        h = block(h, maps * 2 ** (l + 1), test=test, name="block{}".format(l))
        h = F.max_pooling(h, (2, 2))
    h = F.average_pooling(h, h.shape[2:])
    pred = PF.affine(h, 100, name="pred")
    return pred
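When reading the graph viewer output or the visit printout, it helps to know the expected tensor shapes. The sketch below traces them by hand for network() with its defaults and the 4 x 3 x 128 x 128 input used in this tutorial. trace_shapes is a hypothetical helper that only does shape arithmetic (3 x 3 convolutions with pad 1 keep the spatial size, each 2 x 2 max pooling halves it) and does not call NNabla.

```python
def trace_shapes(batch=4, channels=3, size=128, maps=16, n_blocks=4, n_classes=100):
    """Hand-trace the output shapes of network() (shape arithmetic only)."""
    shapes = [("input", (batch, channels, size, size))]
    c, s = maps, size
    # firstconv + firstbn + relu: a 3x3 conv with pad 1 keeps H and W.
    shapes.append(("firstconv", (batch, c, s, s)))
    for l in range(n_blocks):
        c = maps * 2 ** (l + 1)  # same rule as in network()
        shapes.append(("block{}".format(l), (batch, c, s, s)))
        s //= 2                  # 2x2 max pooling halves H and W
        shapes.append(("max_pooling", (batch, c, s, s)))
    # average_pooling with kernel h.shape[2:] pools over all of H x W.
    shapes.append(("average_pooling", (batch, c, 1, 1)))
    shapes.append(("pred", (batch, n_classes)))
    return shapes


for name, shape in trace_shapes():
    print(name, shape)
```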
Visit Method¶
The visit method of a variable takes a lambda, function, or callable object as an argument and calls it on every NNabla function that the variable can traverse, in forward order. It is easier to see the usage than to explain it.
First of all, define the callable class.
class PrintFunc(object):
    def __call__(self, nnabla_func):
        print("==========")
        print(nnabla_func.info.type_name)
        print(nnabla_func.inputs)
        print(nnabla_func.outputs)
        print(nnabla_func.info.args)
This callable object takes an NNabla function, e.g., convolution, relu, etc., so a user can get information about that function.
nn.clear_parameters() # this call is just in case to do the following code again
x = nn.Variable([4, 3, 128, 128])
pred = network(x)
pred.visit(PrintFunc())
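A visitor does not have to print; it can also collect statistics. The hypothetical CountFunc below tallies function types using the same nnabla_func.info.type_name attribute that PrintFunc reads, e.g. to count how many convolutions a graph contains. So that the snippet runs without NNabla, it is exercised here on stand-in objects built with SimpleNamespace; with NNabla you would call pred.visit(counter) instead.

```python
from collections import Counter
from types import SimpleNamespace


class CountFunc(object):
    """Tally the types of NNabla functions visited (hypothetical helper)."""

    def __init__(self):
        self.counts = Counter()

    def __call__(self, nnabla_func):
        # Same attribute that PrintFunc prints.
        self.counts[nnabla_func.info.type_name] += 1


# With NNabla: counter = CountFunc(); pred.visit(counter); print(counter.counts)
# Stand-in objects that only mimic the `.info.type_name` attribute:
fake_funcs = [SimpleNamespace(info=SimpleNamespace(type_name=name))
              for name in ["Convolution", "BatchNormalization", "ReLU", "Convolution"]]
counter = CountFunc()
for func in fake_funcs:
    counter(func)
print(counter.counts.most_common())
```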
Simple Graph Viewer¶
The visit method is very useful for getting information about each function used in a graph, but it is hard to see the details of the whole network structure, e.g., which variable is connected to which variable. So we have a graph viewer that visually shows the whole structure of the network, enabling us to debug more efficiently. Using this graph viewer is straightforward, as shown in the following code:
# Create graph again just in case
nn.clear_parameters()  # call this in case you want to run the following code again
x = nn.Variable([4, 3, 128, 128])
pred = network(x)
import nnabla.experimental.viewers as V
graph = V.SimpleGraph(verbose=False)
graph.view(pred)
If you would like to see more detailed information, as in the visit method case, set the verbose option to True.
graph = V.SimpleGraph(verbose=True)
graph.view(pred)
Now one can see detailed information!
Note that this viewer is mainly for NNabla users who write code in Python; if you would like to see a more polished visualization of the network and interact with it, please use Neural Network Console and visit https://dl.sony.com/.
Profiling utils¶
Basically, this feature is for developers who want to know the overall speed statistics and which functions could be bottlenecks. NNabla provides a simple profiling tool. Once a network is prepared, you should also prepare the other components needed to train it, such as a loss function and a solver.
First, to create the profile and see the results, run the following code.
# Create graph again just in case
nn.clear_parameters()  # call this in case you want to run the following code again
# Context
from nnabla.ext_utils import get_extension_context
device = "cudnn"
ctx = get_extension_context(device)
nn.set_default_context(ctx)
# Network
x = nn.Variable([4, 3, 128, 128])
t = nn.Variable([4, 1])
pred = network(x)
loss = F.mean(F.softmax_cross_entropy(pred, t))
# Solver
solver = S.Momentum()
solver.set_parameters(nn.get_parameters())
# Profiler
from nnabla.utils.profiler import GraphProfiler
B = GraphProfiler(loss, solver=solver, device_id=0, ext_name=device, n_run=100)
B.run()
print("Profile finished.")
# Report
from nnabla.utils.profiler import GraphProfilerCsvWriter
with open("./profile.csv", "w") as f:
    writer = GraphProfilerCsvWriter(B, file=f)
    writer.write()
print("Report is prepared.")
Graph Converter for Inference¶
In this tutorial, we demonstrate several graph converters mainly used for inference. Graph converters are basically applied to a trained graph (neural network), so you can use them once you have trained a neural network.
We show how to use the following graph converters step by step according to use cases.
 BatchNormalizationLinearConverter
 BatchNormalizationFoldedConverter
 FixedPointWeightConverter
 FixedPointActivationConverter
Before starting the following instructions, import the Python modules needed.
# Import
import numpy as np
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import nnabla.experimental.viewers as V
import nnabla.experimental.graph_converters as GC
Also, define LeNet as the example model.
# LeNet
def LeNet(image, test=False):
    h = PF.convolution(image, 16, (5, 5), (1, 1), with_bias=False, name='conv1')
    h = PF.batch_normalization(h, batch_stat=not test, name='conv1bn')
    h = F.max_pooling(h, (2, 2))
    h = F.relu(h)
    h = PF.convolution(h, 16, (5, 5), (1, 1), with_bias=True, name='conv2')
    h = PF.batch_normalization(h, batch_stat=not test, name='conv2bn')
    h = F.max_pooling(h, (2, 2))
    h = F.relu(h)
    h = PF.affine(h, 10, with_bias=False, name='fc1')
    h = PF.batch_normalization(h, batch_stat=not test, name='fc1bn')
    h = F.relu(h)
    pred = PF.affine(h, 10, with_bias=True, name='fc2')
    return pred
BatchNormalizationLinearConverter¶
Typical networks contain batch normalization layers. A batch normalization layer serves as normalization in a network and, in training, uses the batch stats (the batch mean and variance) to normalize inputs as
\[ \hat{x} = \frac{\gamma (x - \mu)}{\sqrt{\sigma^2 + \epsilon}} + \beta, \]
where \(\mu\) and \(\sigma^2\) are the batch mean and variance, \(\gamma\) and \(\beta\) are the scale and bias parameters to be learnt, and \(\epsilon\) is a small constant for numerical stability.
At the same time, it computes the running stats (the exponential moving averages of the mean, \(\mu_r\), and of the variance, \(\sigma_r^2\), of the inputs to the batch normalization layer), which are used later for inference.
If nothing changes, at inference time, the batch normalization is performed as in the above equation using the running stats:
\[ \hat{x} = \frac{\gamma (x - \mu_r)}{\sqrt{\sigma_r^2 + \epsilon}} + \beta. \]
This is the explicit normalization, so as you can see, there are many redundant computations (subtraction, division, pow2, sqrt, multiplication, addition) at inference, which should be avoided in an inference graph. We could remove them by hand, but that is tedious.
BatchNormalizationLinearConverter automatically converts this batch normalization equation to the simple linear form
\[ \hat{x} = c_0 x + c_1, \quad c_0 = \frac{\gamma}{\sqrt{\sigma_r^2 + \epsilon}}, \quad c_1 = \beta - \frac{\gamma \mu_r}{\sqrt{\sigma_r^2 + \epsilon}}. \]
After the conversion, we just have one multiplication and one addition, since \(c_0\) and \(c_1\) can be precomputed for inference.
Specifically, suppose that \(x\) is the output of a 2D convolution, so \(x\) is a 3D tensor (e.g., \(N \times H \times W\)). In the batch normalization, there are \(N\) values each of \(c_0\) and \(c_1\), one per map. Thus, the multiplication (\(c_0 \times x\)) takes \(N \times H \times W\) operations, and so does the addition (\(+ c_1\)). This is a large reduction compared to the naive implementation.
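The linear form can be checked numerically for a single channel. The sketch below uses the running-stats form of batch normalization and the coefficients c0 = gamma / sqrt(var_r + eps) and c1 = beta - c0 * mean_r; the parameter values are hypothetical, and eps = 1e-5 is an assumed value rather than one taken from this document.

```python
import math


def bn_inference(x, gamma, beta, mean_r, var_r, eps=1e-5):
    """Explicit batch normalization with running stats (inference)."""
    return gamma * (x - mean_r) / math.sqrt(var_r + eps) + beta


def bn_linear_coefficients(gamma, beta, mean_r, var_r, eps=1e-5):
    """Precompute c0 and c1 so that bn_inference(x) == c0 * x + c1."""
    c0 = gamma / math.sqrt(var_r + eps)
    c1 = beta - c0 * mean_r
    return c0, c1


# Hypothetical learnt parameters and running stats of one channel.
gamma, beta, mean_r, var_r = 1.5, -0.3, 0.12, 0.8
c0, c1 = bn_linear_coefficients(gamma, beta, mean_r, var_r)
for x in [-2.0, 0.0, 0.7, 3.5]:
    # Both columns agree up to floating-point rounding.
    print(x, bn_inference(x, gamma, beta, mean_r, var_r), c0 * x + c1)
```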
Example¶
First, create LeNet.
x = nn.Variable.from_numpy_array(np.random.rand(4, 3, 28, 28))
y = LeNet(x, test=True)
Now look at LeNet visually.
viewer = V.SimpleGraph()
viewer.view(y)
Convert it to the one with the batch normalization linearly folded.
converter = GC.BatchNormalizationLinearConverter(name="bnlinearlenet")
y = converter.convert(y, [x])
Also, show the converted graph.
viewer = V.SimpleGraph()
viewer.view(y)
BatchNormalizationFoldedConverter¶
As you can see in the previous converter, BatchNormalizationLinearConverter linearly folds the batch normalization layer for inference. However, if the layer preceding the batch normalization is a convolution, affine, or another layer performing an inner product, the linear folding can be further folded into the weights of the preceding layer.
Suppose a sequence of a convolution and a batch normalization at inference; it can be written as
\[ \hat{x} = c_0 (w \ast x + b) + c_1, \]
where \(\ast\) is the convolutional operator, \(x\) is the input to the convolution, \(w\) is the convolutional weights, and \(b\) is the bias of the convolution layer. Since \(\ast\) is linear, we can further fold \(c_0\) into the weights \(w\) and bias \(b\), giving the simpler form
\[ \hat{x} = (c_0 w) \ast x + (c_0 b + c_1). \]
BatchNormalizationFoldedConverter automatically finds a sequence of the convolution and the batch normalization in a given graph, then folds all parameters related to the batch normalization into the preceding convolution layer. Now, we do not need the multiplication and addition seen in the previous case, BatchNormalizationLinearConverter.
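The folding identity can be verified on a tiny example with one input and one output channel, where c0 and c1 are scalars; with several output channels, each channel's filter and bias get their own c0 and c1. The sketch uses a 1D valid cross-correlation (what deep learning frameworks call convolution), c0 = gamma / sqrt(var_r + eps), c1 = beta - c0 * mean_r, hypothetical values, and an assumed eps of 1e-5.

```python
import math


def conv1d(x, w, b):
    """Valid 1D cross-correlation with a scalar bias."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) + b
            for i in range(len(x) - k + 1)]


def fold_bn_into_conv(w, b, gamma, beta, mean_r, var_r, eps=1e-5):
    """Return (c0 * w, c0 * b + c1), the folded convolution parameters."""
    c0 = gamma / math.sqrt(var_r + eps)
    c1 = beta - c0 * mean_r
    return [c0 * wj for wj in w], c0 * b + c1


x = [0.5, -1.0, 2.0, 0.25, 3.0]          # hypothetical input signal
w, b = [0.2, -0.4, 0.1], 0.05            # hypothetical convolution parameters
gamma, beta, mean_r, var_r = 1.5, -0.3, 0.12, 0.8

# Reference: convolution followed by batch normalization with running stats.
ref = [gamma * (y - mean_r) / math.sqrt(var_r + 1e-5) + beta for y in conv1d(x, w, b)]
# Folded: a single convolution with rescaled weights and shifted bias.
w_f, b_f = fold_bn_into_conv(w, b, gamma, beta, mean_r, var_r)
folded = conv1d(x, w_f, b_f)
print(ref)
print(folded)
```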
Example¶
First, create LeNet.
x = nn.Variable.from_numpy_array(np.random.rand(4, 3, 28, 28))
y = LeNet(x, test=True)
Now look at LeNet visually.
viewer = V.SimpleGraph()
viewer.view(y)
Convert it to the one with the batch normalization folded into the preceding layers.
converter = GC.BatchNormalizationFoldedConverter(name="bnfoldedlenet")
y = converter.convert(y, [x])
Also, show the converted graph.
viewer = V.SimpleGraph()
viewer.view(y)
FixedPointWeightConverter¶
Once training finishes, where do you deploy? The destination of a trained model might be the cloud or an embedded device. In either case, the typical data type, 32-bit floating point (FP32), might be redundant for inference, so you may want to use the SIMD operations of your target device with, e.g., 4-bit or 8-bit values. Training is usually performed using FP32, while inference might be performed in fixed point. Hence, you have to change the corresponding layers, e.g., the convolution and affine.
FixedPointWeightConverter automatically converts the affine, convolution, and deconvolution layers of a given graph to their fixed-point versions.
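What "fixed point" means for a weight can be sketched with a generic symmetric quantizer: round to the nearest multiple of a step size delta and clip to the range representable with n_bits bits. This is an illustration only; it is an assumption that it matches the rounding and clipping used by NNabla's fixed-point functions, so consult the function reference for the exact definition. Here n_bits and delta play the role of the bit width and step size hyperparameters.

```python
import math


def fixed_point_quantize(x, n_bits=8, delta=2 ** -4, sign=True):
    """Round x to a multiple of delta that fits in n_bits (generic sketch)."""
    if sign:
        q_max = (2 ** (n_bits - 1) - 1) * delta
        q_min = -q_max
    else:
        q_max = (2 ** n_bits - 1) * delta
        q_min = 0.0
    if x > q_max:
        return q_max
    if x < q_min:
        return q_min
    # Round half away from zero to the nearest grid point.
    return math.copysign(math.floor(abs(x) / delta + 0.5) * delta, x)


weights = [0.3, -0.4, 1.1, -5.0]  # hypothetical weights
print([fixed_point_quantize(w, n_bits=4, delta=0.25) for w in weights])
```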
Example¶
First, create LeNet.
x = nn.Variable.from_numpy_array(np.random.rand(4, 3, 28, 28))
y = LeNet(x, test=True)
Now look at LeNet visually.
viewer = V.SimpleGraph()
viewer.view(y)
Convert it to the one with fixed-point weights.
converter = GC.FixedPointWeightConverter(name="fixedpointweightlenet")
y = converter.convert(y, [x])
Also, show the converted graph.
viewer = V.SimpleGraph()
viewer.view(y)
FixedPointActivationConverter¶
FixedPointWeightConverter converts layers with weights, while FixedPointActivationConverter automatically converts activation layers, e.g., ReLU. A typical neural network architecture contains the block ReLU -> Convolution -> BatchNormalization; therefore, when you convert both ReLU and Convolution to fixed-point ones with proper hyperparameters (step size and bit width), you can utilize the SIMD operations of your target device, because both the weights and the inputs of the convolution are fixed-point.
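The benefit of converting both sides can be seen in a small worked example. When weights and activations are both fixed-point numbers with step sizes dw and dx, a dot product reduces to an integer multiply-accumulate over their integer codes followed by one multiplication by dw * dx. The values, step sizes, and the helper to_int_code below are hypothetical, and rounding uses Python's round(), which may differ from NNabla's quantizer.

```python
def to_int_code(x, delta, n_bits=8):
    """Integer code of x on a signed fixed-point grid with step delta."""
    q_max = 2 ** (n_bits - 1) - 1
    code = round(x / delta)  # Python 3 round(): nearest integer, ties to even
    return max(-q_max, min(q_max, code))


weights = [0.30, -0.12, 0.55, -0.80]  # hypothetical convolution weights
acts = [0.0, 1.25, 0.5, 2.0]          # hypothetical ReLU outputs (non-negative)
dw, dx = 2 ** -6, 2 ** -3             # hypothetical step sizes

w_codes = [to_int_code(w, dw) for w in weights]
x_codes = [to_int_code(a, dx) for a in acts]

# Integer multiply-accumulate: the part an integer SIMD unit can execute ...
acc = sum(wc * xc for wc, xc in zip(w_codes, x_codes))
# ... followed by a single rescale with the product of the two step sizes.
y_int = acc * dw * dx

# Reference: the same dot product on the dequantized floating-point values.
y_ref = sum((wc * dw) * (xc * dx) for wc, xc in zip(w_codes, x_codes))
print(w_codes, x_codes, acc, y_int, y_ref)
```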
Example¶
First, create LeNet.
x = nn.Variable.from_numpy_array(np.random.rand(4, 3, 28, 28))
y = LeNet(x, test=True)
Now look at LeNet visually.
viewer = V.SimpleGraph()
viewer.view(y)
Convert it to the one with fixed-point activations.
converter = GC.FixedPointActivationConverter(name="fixedpointactivationlenet")
y = converter.convert(y, [x])
Also, show the converted graph.
viewer = V.SimpleGraph()
viewer.view(y)
Typically, FixedPointWeightConverter and FixedPointActivationConverter are used together. For such purposes, you can use GC.SequentialConverter.
y = LeNet(x, test=True)  # start again from the original, unconverted LeNet
converter_w = GC.FixedPointWeightConverter(name="fixedpointlenet")
converter_a = GC.FixedPointActivationConverter(name="fixedpointlenet")
converter = GC.SequentialConverter([converter_w, converter_a])
y = converter.convert(y, [x])
Needless to say, GC.SequentialConverter is not limited to this case. Once you create your own Converters, you can add them to GC.SequentialConverter if they are used together.
Look at the converted graph visually.
viewer = V.SimpleGraph()
viewer.view(y)
Python Command Line Interface¶
NNabla has a command line interface utility that can train, run forward (inference), convert parameters and datasets, measure performance, convert file formats, and so on.
usage: nnabla_cli [-h] [-m]
{train,infer,forward,encode_param,decode_param,profile,conv_dataset,compare_with_cpu,create_image_classification_dataset,upload,create_tar,function_info,dump,nnb_template,convert,plot_series,plot_timer,draw_graph,version}
...
Command line interface for NNabla(Version 1.0.11.dev1, Build 181226024531)
positional arguments:
{train,infer,forward,encode_param,decode_param,profile,conv_dataset,compare_with_cpu,create_image_classification_dataset,upload,create_tar,function_info,dump,nnb_template,convert,plot_series,plot_timer,draw_graph,version}
train Training with NNP.
infer Do inference with NNP and binary data file input.
forward Do evaluation with NNP and test dataset.
encode_param Encode plain text to parameter format.
decode_param Decode parameter to plain text.
profile Profiling performance with NNP.
conv_dataset Convert CSV dataset to cache.
compare_with_cpu Compare performance between two nntxt.
create_image_classification_dataset
Create dataset from image files.
upload Upload dataset to Neural Network Console.
create_tar Create tar file for Neural Network Console.
function_info Output function info.
dump Dump network with supported format.
nnb_template Generate NNB config file template.
convert File format converter.
plot_series Plot *.series.txt files.
plot_timer Plot *.timer.txt files.
draw_graph Draw a graph in a NNP or nntxt file with graphviz.
version Print version and build number.
optional arguments:
-h, --help show this help message and exit
-m, --mpi exec with mpi.
Work with NNP¶
Training¶
usage: nnabla_cli train [-h] -c CONFIG [-p PARAM] -o OUTDIR
optional arguments:
-h, --help show this help message and exit
-c CONFIG, --config CONFIG
path to nntxt
-p PARAM, --param PARAM
path to parameter file
-o OUTDIR, --outdir OUTDIR
output directory
Profile¶
usage: nnabla_cli profile [-h] -c CONFIG -o OUTDIR
optional arguments:
-h, --help show this help message and exit
-c CONFIG, --config CONFIG
path to nntxt
-o OUTDIR, --outdir OUTDIR
output directory
Forward¶
usage: nnabla_cli forward [-h] -c CONFIG [-p PARAM] [-d DATASET] -o OUTDIR [-b BATCH_SIZE]
optional arguments:
-h, --help show this help message and exit
-c CONFIG, --config CONFIG
path to nntxt
-p PARAM, --param PARAM
path to parameter file
-d DATASET, --dataset DATASET
path to CSV dataset
-o OUTDIR, --outdir OUTDIR
output directory
-b BATCH_SIZE, --batch_size BATCH_SIZE
Batch size to use batch size in nnp file set -1.
Inference¶
usage: nnabla_cli infer [-h] -c CONFIG [-o OUTPUT] [-p PARAM] [-b BATCH_SIZE] inputs [inputs ...]
positional arguments:
inputs
optional arguments:
-h, --help show this help message and exit
-c CONFIG, --config CONFIG
path to nntxt
-o OUTPUT, --output OUTPUT
output file prefix
-p PARAM, --param PARAM
path to parameter file
-b BATCH_SIZE, --batch_size BATCH_SIZE
Batch size to use batch size in nnp file set -1.
Compare with CPU¶
usage: nnabla_cli compare_with_cpu [-h] -c CONFIG -c2 CONFIG2 -o OUTDIR
optional arguments:
-h, --help show this help message and exit
-c CONFIG, --config CONFIG
path to nntxt
-c2 CONFIG2, --config2 CONFIG2
path to cpu nntxt
-o OUTDIR, --outdir OUTDIR
output directory
Dataset manipulation¶
Encode parameter¶
usage: nnabla_cli encode_param [-h] -i INDIR [-p PARAM]
optional arguments:
-h, --help show this help message and exit
-i INDIR, --indir INDIR
input directory
-p PARAM, --param PARAM
path to parameter file
Decode parameter¶
usage: nnabla_cli decode_param [-h] [-p PARAM] -o OUTDIR
optional arguments:
-h, --help show this help message and exit
-p PARAM, --param PARAM
path to parameter file
-o OUTDIR, --outdir OUTDIR
output directory
Convert dataset¶
usage: nnabla_cli conv_dataset [-h] [-F] [-S] [-N] source destination
positional arguments:
source
destination
optional arguments:
-h, --help show this help message and exit
-F, --force force overwrite destination
-S, --shuffle shuffle data
-N, --normalize normalize data range
Create image classification dataset¶
usage: nnabla_cli create_image_classification_dataset [-h] -i SOURCEDIR -o OUTDIR -c CHANNEL -w WIDTH -g HEIGHT -m MODE -s SHUFFLE -f1 FILE1 [-r1 RATIO1] [-f2 FILE2]
[-r2 RATIO2]
optional arguments:
-h, --help show this help message and exit
-i SOURCEDIR, --sourcedir SOURCEDIR
source directory with directories for each class
-o OUTDIR, --outdir OUTDIR
output directory
-c CHANNEL, --channel CHANNEL
number of output color channels
-w WIDTH, --width WIDTH
width of output image
-g HEIGHT, --height HEIGHT
height of output image
-m MODE, --mode MODE shaping mode (trimming or padding)
-s SHUFFLE, --shuffle SHUFFLE
shuffle mode (true or false)
-f1 FILE1, --file1 FILE1
output file name 1
-r1 RATIO1, --ratio1 RATIO1
output file ratio(%) 1
-f2 FILE2, --file2 FILE2
output file name 2
-r2 RATIO2, --ratio2 RATIO2
output file ratio(%) 2
Upload dataset to Neural Network Console¶
usage: nnabla_cli upload [-h] [-e ENDPOINT] token filename
positional arguments:
token token for upload
filename filename to upload
optional arguments:
-h, --help show this help message and exit
-e ENDPOINT, --endpoint ENDPOINT
set endpoint uri
Create dataset archive for Neural Network Console¶
usage: nnabla_cli create_tar [-h] source destination
positional arguments:
source CSV dataset
destination TAR filename
optional arguments:
-h, --help show this help message and exit
File format converter¶
For detailed information please see File format converter.
Dump content of supported format¶
usage: nnabla_cli dump [-h] [-I IMPORT_FORMAT] [--nnp-no-expand-network]
FILE [FILE ...]
positional arguments:
FILE File or directory name(s) to convert.
optional arguments:
-h, --help show this help message and exit
-I IMPORT_FORMAT, --import-format IMPORT_FORMAT
[import] import format. (one of [NNP,ONNX])
--nnp-no-expand-network
[import][NNP] expand network with repeat or recurrent.
Generate NNB config file template¶
usage: nnabla_cli nnb_template [-h] [-I IMPORT_FORMAT]
[--nnp-no-expand-network] [-b BATCH_SIZE]
[-T DEFAULT_VARIABLE_TYPE]
FILE [FILE ...]
positional arguments:
FILE File or directory name(s) to convert.
optional arguments:
-h, --help show this help message and exit
-I IMPORT_FORMAT, --import-format IMPORT_FORMAT
[import] import format. (one of [NNP,ONNX])
--nnp-no-expand-network
[import][NNP] expand network with repeat or recurrent.
-b BATCH_SIZE, --batch-size BATCH_SIZE
[export] overwrite batch size.
-T DEFAULT_VARIABLE_TYPE, --default-variable-type DEFAULT_VARIABLE_TYPE
Default type of variable
File format converter¶
usage: nnabla_cli convert [-h] [-I IMPORT_FORMAT] [--nnp-no-expand-network]
[-O EXPORT_FORMAT] [-f] [-b BATCH_SIZE]
[--nnp-parameter-h5] [--nnp-parameter-nntxt]
[--nnp-exclude-parameter] [-T DEFAULT_VARIABLE_TYPE]
[-s SETTINGS] [-c CONFIG] [-d DEFINE_VERSION]
FILE [FILE ...]
positional arguments:
FILE File or directory name(s) to convert.
optional arguments:
-h, --help show this help message and exit
-I IMPORT_FORMAT, --import-format IMPORT_FORMAT
[import] import format. (one of [NNP,ONNX])
--nnp-no-expand-network
[import][NNP] expand network with repeat or recurrent.
-O EXPORT_FORMAT, --export-format EXPORT_FORMAT
[export] export format. (one of [NNP,NNB,CSRC,ONNX])
-f, --force [export] overwrite output file.
-b BATCH_SIZE, --batch-size BATCH_SIZE
[export] overwrite batch size.
--nnp-parameter-h5 [export][NNP] store parameter with h5 format
--nnp-parameter-nntxt
[export][NNP] store parameter into nntxt
--nnp-exclude-parameter
[export][NNP] output without parameter
-T DEFAULT_VARIABLE_TYPE, --default-variable-type DEFAULT_VARIABLE_TYPE
Default type of variable
-s SETTINGS, --settings SETTINGS
Settings in YAML format file.
-c CONFIG, --config CONFIG
[export] config target function list.
-d DEFINE_VERSION, --define_version DEFINE_VERSION
[export][ONNX] define onnx opset version. e.g. opset_6
[export][NNB] define binary format version. e.g. nnb_3
Plot Monitor class output files¶
Note:
 Plotting subcommands require the matplotlib package.
 By default, the following commands show a plot on your display using a matplotlib rendering backend that depends on your environment. If you want to save a plot as an image or as vector data, use the -o option to specify the file name where the plot is saved.
MonitorSeries¶
usage: nnabla_cli plot_series [-h] [-l LABEL] [-o OUTFILE] [-x XLABEL]
[-y YLABEL] [-t TITLE] [-T YLIM_MAX]
[-B YLIM_MIN] [-R XLIM_MAX] [-L XLIM_MIN]
infile [infile ...]
Plot *.series.txt files produced by nnabla.monitor.MonitorSeries class.
Example:
nnabla_cli plot_series -x "Epochs" -y "Squared error loss" -T 10 -l "config A" -l "config B" result_a/Training-loss.series.txt result_b/Training-loss.series.txt
positional arguments:
infile Path to input file.
optional arguments:
-h, --help show this help message and exit
-l LABEL, --label LABEL
Label of each plot.
-o OUTFILE, --outfile OUTFILE
Path to output file.
-x XLABEL, --xlabel XLABEL
X-axis label of plot.
-y YLABEL, --ylabel YLABEL
Y-axis label of plot.
-t TITLE, --title TITLE
Title of plot.
-T YLIM_MAX, --ylim-max YLIM_MAX
Y-axis plot range max.
-B YLIM_MIN, --ylim-min YLIM_MIN
Y-axis plot range min.
-R XLIM_MAX, --xlim-max XLIM_MAX
X-axis plot range max.
-L XLIM_MIN, --xlim-min XLIM_MIN
X-axis plot range min.
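If you want to post-process monitor output yourself instead of using plot_series, a small reader like the one below may help. It is a minimal sketch, not the parser used by nnabla_cli, and it assumes that each line of a *.series.txt file holds an iteration index and a value separated by whitespace; check a file produced by your own MonitorSeries before relying on that layout. The function name read_series is hypothetical.

```python
from typing import List, Tuple


def read_series(path: str) -> Tuple[List[int], List[float]]:
    """Read a *.series.txt file into (indices, values).

    Assumed layout (not taken from nnabla's source): "<index> <value>" per line.
    Blank or malformed lines are skipped.
    """
    indices: List[int] = []
    values: List[float] = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            indices.append(int(fields[0]))
            values.append(float(fields[1]))
    return indices, values
```

The returned lists can be passed straight to any plotting tool, for example matplotlib's plot(indices, values).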
MonitorTimeElapsed¶
usage: nnabla_cli plot_timer [-h] [-l LABEL] [-o OUTFILE] [-x XLABEL]
[-y YLABEL] [-t TITLE] [-T YLIM_MAX]
[-B YLIM_MIN] [-R XLIM_MAX] [-L XLIM_MIN] [-e]
[-u TIME_UNIT]
infile [infile ...]
Plot *.timer.txt files produced by nnabla.monitor.MonitorTimeElapsed class.
Example:
nnabla_cli plot_timer -x "Epochs" -l "config A" -l "config B" result_a/Epoch-time.timer.txt result_b/Epoch-time.timer.txt
positional arguments:
infile Path to input file.
optional arguments:
-h, --help show this help message and exit
-l LABEL, --label LABEL
Label of each plot.
-o OUTFILE, --outfile OUTFILE
Path to output file.
-x XLABEL, --xlabel XLABEL
X-axis label of plot.
-y YLABEL, --ylabel YLABEL
Y-axis label of plot.
-t TITLE, --title TITLE
Title of plot.
-T YLIM_MAX, --ylim-max YLIM_MAX
Y-axis plot range max.
-B YLIM_MIN, --ylim-min YLIM_MIN
Y-axis plot range min.
-R XLIM_MAX, --xlim-max XLIM_MAX
X-axis plot range max.
-L XLIM_MIN, --xlim-min XLIM_MIN
X-axis plot range min.
-e, --elapsed Plot total elapsed time. By default, it plots elapsed time per iteration.
-u TIME_UNIT, --time-unit TIME_UNIT
Time unit chosen from {s|m|h|d}.
Draw a graph from NNP or .nntxt files¶
Note:
 This feature requires graphviz installed as a Python package. The graphviz Python package is an interface to the graphviz library, which is not installed by the pip command. You have to install the library separately, for example using apt on Ubuntu.
usage: nnabla_cli draw_graph [-h] [-o OUTPUT_DIR] [-n NETWORK] [-f FORMAT]
input
Draw a graph in a NNP or nntxt file with graphviz.
Example:
nnabla_cli draw_graph -o output-folder path-to-nnp.nnp
positional arguments:
input Path to input nnp or nntxt.
optional arguments:
-h, --help show this help message and exit
-o OUTPUT_DIR, --output-dir OUTPUT_DIR
Output directory.
-n NETWORK, --network NETWORK
Network names to be drawn.
-f FORMAT, --format FORMAT
Graph saving format compatible with graphviz (`pdf`, `png`, ...).
Development¶
Generate function information¶
usage: nnabla_cli function_info [-h] [-o OUTFILE] [-f FUNC_SET] [-c CONFIG]
[-t TARGET] [-q query] [--nnp-no-expand-network]
[FILE] [FILE ...]
positional arguments:
FILE Path to nnp file.
optional arguments:
-h, --help show this help message and exit
-o OUTFILE, --output OUTFILE
output filename, *.txt or *.yaml, the default is stdout.
-f FUNC_SET, --all_support FUNC_SET
select function set: NNB, ONNX, the default is nnabla.
-c CONFIG, --config CONFIG
user config file for target constraint, *.txt file of the
function list or the "opset_" args.
-t, --target
output target function list.
-q, --query
query the detail of a function.
--nnp-no-expand-network
[import][NNP] expand network with repeat or recurrent.
Display version¶
usage: nnabla_cli version [-h]
optional arguments:
-h, --help show this help message and exit
Python API Examples¶
Many examples are provided in the NNabla examples repository. Please follow [this link](https://github.com/sony/nnabla-examples) to see them.
Python API Reference¶
Common¶
Config¶
Search for config files and get config information from them.
Config files are searched in the order shown in the following table. Each config value is overwritten by the values in the configs listed after it.
Type | Posix | Windows
System wide | /etc/nnabla.conf | c:\ProgramData\NNabla\nnabla.ini
User | ~/.nnabla | c:\Users\[USERNAME]\AppData\Roaming\NNabla\nnabla.ini
Default | (same directory as ‘config.py’)/nnabla.conf
Local | [CURRENT DIRECTORY]/nnabla.conf
You can get a config value as follows:
from nnabla.config import nnabla_config
value = nnabla_config.get(CATEGORY, VALUE_NAME)
CATEGORY and VALUE_NAME are not defined in config.py; you can add any CATEGORY and VALUE_NAME you like. See the official documentation for more information.
[CATEGORY]
VALUE_NAME = value
The default values are defined in ‘nnabla.conf’, placed in the same directory as config.py.
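The override rule above ("each value is overwritten by the configs that follow") can be illustrated with Python's standard configparser, which reads a list of files in order and lets later files replace keys set by earlier ones. This is a minimal sketch of the rule, not NNabla's own loader; the file names and the level key are hypothetical.

```python
import configparser
import os
import tempfile


def load_layered_config(paths):
    """Read config files in order; a key in a later file overrides the same key earlier.

    configparser.read() silently skips files that do not exist.
    """
    config = configparser.ConfigParser()
    config.read(paths)
    return config


# Hypothetical layers: a "system" file and a "local" file both set [LOG] log_file_name.
with tempfile.TemporaryDirectory() as d:
    system_conf = os.path.join(d, "system.conf")
    local_conf = os.path.join(d, "local.conf")
    with open(system_conf, "w") as f:
        f.write("[LOG]\nlog_file_name = /tmp/system.log\nlevel = INFO\n")
    with open(local_conf, "w") as f:
        f.write("[LOG]\nlog_file_name = /tmp/local.log\n")
    cfg = load_layered_config(
        [system_conf, os.path.join(d, "missing.conf"), local_conf])

print(cfg.get("LOG", "log_file_name"))  # /tmp/local.log (the later file wins)
print(cfg.get("LOG", "level"))          # INFO (only set in the first file)
```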
Logger¶
Wrapper module for logging.
You can use the logger as follows:
from nnabla.logger import logger
logger.debug('Log message(DEBUG)')
logger.info('Log message(INFO)')
logger.error('Log message(ERROR)')
logger.critical('Log message(CRITICAL)')
With the default settings, it should yield the following output:
$ python scripts/logger_test.py
[nnabla][ERROR]: logger_test.py : <module> : 5 : Log message(ERROR)
[nnabla][CRITICAL]: logger_test.py : <module> : 6 : Log message(CRITICAL)
If you want to output the log to a file, create an nnabla.conf file and put the following entry in it.
See nnabla.config for more information about the config file.
[LOG]
log_file_name = /tmp/nbla.log
After this, you get the following output:
$ python scripts/logger_test.py
[nnabla][ERROR]: logger_test.py : <module> : 5 : Log message(ERROR)
[nnabla][CRITICAL]: logger_test.py : <module> : 6 : Log message(CRITICAL)
$ cat /tmp/nbla.log
2017-01-19 14:41:35,132 [nnabla][DEBUG]: scripts/logger_test.py : <module> : 3 : Log message(DEBUG)
2017-01-19 14:41:35,132 [nnabla][INFO]: scripts/logger_test.py : <module> : 4 : Log message(INFO)
2017-01-19 14:41:35,132 [nnabla][ERROR]: scripts/logger_test.py : <module> : 5 : Log message(ERROR)
2017-01-19 14:41:35,132 [nnabla][CRITICAL]: scripts/logger_test.py : <module> : 6 : Log message(CRITICAL)
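The split between console and file output above can be reproduced with the standard logging module by attaching two handlers with different thresholds. This is a rough sketch modeled on the sample output, not nnabla.logger's implementation: the helper make_logger, the ERROR console threshold and the format string are assumptions, and StringIO objects stand in for the terminal and the log file.

```python
import io
import logging

# Format modeled on the sample lines above; %(funcName)s is "<module>" at top level.
FMT = "[nnabla][%(levelname)s]: %(filename)s : %(funcName)s : %(lineno)d : %(message)s"


def make_logger(name, console_stream, log_stream=None):
    """Hypothetical helper: console gets ERROR and above, the log stream gets everything."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()   # avoid duplicate handlers if called twice
    logger.propagate = False  # keep messages away from the root logger

    console = logging.StreamHandler(console_stream)
    console.setLevel(logging.ERROR)
    console.setFormatter(logging.Formatter(FMT))
    logger.addHandler(console)

    if log_stream is not None:
        to_file = logging.StreamHandler(log_stream)
        to_file.setLevel(logging.DEBUG)
        # asctime renders like "2017-01-19 14:41:35,132"
        to_file.setFormatter(logging.Formatter("%(asctime)s " + FMT))
        logger.addHandler(to_file)
    return logger


console, logfile = io.StringIO(), io.StringIO()
log = make_logger("nnabla_sketch", console, logfile)
log.debug("Log message(DEBUG)")
log.error("Log message(ERROR)")
print(console.getvalue(), end="")  # only the ERROR line
print(logfile.getvalue(), end="")  # DEBUG and ERROR lines, each with a timestamp
```

With a real file you would use logging.FileHandler(path) in place of the second StreamHandler.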

nnabla.logger.logger
Autoforward mode¶
NNabla provides a dynamic computation graph feature, which enables automatic forward propagation during graph construction. This can be enabled using the set_auto_forward() function. Backpropagation must be executed manually on the dynamically constructed graph.

nnabla.auto_forward(*args, **kwds)
Context for dynamic graph execution mode.
Parameters: auto (bool) – Whether forward computation is executed during a computation graph construction. Returns: bool

nnabla.set_auto_forward(auto)
Set the default mode for automatic forward propagation.
When it is set to True, forward propagation is invoked immediately when the computation graph is updated.
Parameters: auto (bool) – Whether forward computation is executed when the computation graph is updated. Returns: bool
Context¶

class nnabla.Context(backend=None, array_class='', device_id='0')
Context is used to specify the computation engine (cpu, cuda, cudnn, etc.) on which the function operator modules and optimizer modules are run. The context can be set for each function, as well as set globally with the functions listed in the Context Specifier API.
Parameters:
Context Specifier API¶

nnabla.context_scope(*args, **kwds)
Context as Python context.
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
from nnabla.ext_utils import get_extension_context

x = nn.Variable([2, 3, 4])
ctx = get_extension_context('cudnn', device_id='0')
with nn.context_scope(ctx):
    # Inside the with scope, the specified context is used.
    with nn.parameter_scope('w1'):
        l1 = F.relu(PF.affine(x, 64))
    with nn.parameter_scope('w2'):
        l2 = F.relu(PF.affine(x, 64))

nnabla.set_default_context(ctx)
Set the default context.
Note
It cannot be called inside any context_scope.
Parameters: ctx (Context) – A Context.

nnabla.get_current_context()
Get the current context.
It can be set using nnabla.context_scope() or nnabla.set_default_context().
Returns: the current context.
Return type: Context
NdArray¶

class nnabla._nd_array.NdArray(shape=tuple())
nnabla._nd_array.NdArray is a device-agnostic data container for multi-dimensional arrays (tensors). nnabla._nd_array.NdArray can also implicitly handle data transfers across different devices (e.g. CPU to CUDA GPU, CUDA GPU to CPU). See Python API Tutorial for more details.
NdArray overrides some arithmetic operators (+, -, *, /, **). Operands can be either a scalar number, NdArray or Variable. An arithmetic operation containing NdArray returns an NdArray which stores the output of the computation immediately invoked. In-place arithmetic operations (+=, -=, *=, /=, **=) are also implemented. Note that = doesn't perform in-place substitution but just replaces the object reference. Instead, you can use copy_from() for in-place substitution.
Parameters: shape (tuple or int) – Shape of the array.
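The difference between rebinding with = and in-place substitution is the same as in NumPy, which makes it a convenient way to see the behaviour described above without a GPU. The sketch below is a NumPy analogy only; in NNabla the in-place write would be done with copy_from() or an in-place operator such as +=.

```python
import numpy as np

a = np.zeros(3)
alias = a             # a second name for the same buffer

# `=` only rebinds the name `a`; the original buffer (seen via `alias`) is untouched.
a = np.ones(3)
print(alias)          # [0. 0. 0.]

b = np.zeros(3)
alias_b = b
b[...] = np.ones(3)   # in-place substitution: writes into the existing buffer
b += 1                # in-place arithmetic also keeps the same buffer
print(alias_b)        # [2. 2. 2.]
```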
cast(self, dtype, ctx=None)
In-place cast of the data type of the NdArray. It returns the reference values as a numpy.ndarray only if the optional parameter ctx is not given, and None otherwise.
Parameters:
 dtype (numpy.dtype) – Numpy data type.
 ctx (nnabla.Context, optional) – Context descriptor.
Returns: numpy.array if ctx is None, otherwise nothing.

copy_from(self, NdArray arr)
Copy values from another NdArray object.
It returns the caller object itself. nnabla.functions.identity() is called internally to copy values.
Parameters: arr (NdArray) – Values will be copied to the caller object. The shape of arr must be the same as the caller object.
Returns: nnabla.NdArray

data
Returns the values held by this array as a numpy.ndarray. Note that only the references are returned, and the values are not copied. Therefore, modifying the returned numpy.ndarray will affect the data contained inside the NNabla array. This method can also be called as a setter. Note that this may implicitly invoke a data transfer from device arrays to the CPU.
Parameters: value (numpy.ndarray) –
Returns: numpy.ndarray

dtype
Get dtype.
Returns: numpy.dtype

fill(self, value)
Fill all of the elements with the provided scalar value.
Note: This method is lazily evaluated. It is evaluated during the forward or backward propagation.
Parameters: value (int, float) – The value to fill with.

static from_numpy_array(nparr)
Create a NdArray object from Numpy array data.
The data is initialized with the given Numpy array.
Parameters: nparr (ndarray) – Numpy multi-dimensional array.
Returns: nnabla._nd_array.NdArray

get_data(self, str mode='rw')
Returns the values held by this array as a numpy.ndarray with a specified mode.
Parameters: mode (str) – Computation becomes more efficient if the right mode is chosen.
 'r': Read-only access.
 'w': Write-only access.
 'rw': You can both read and write.
See nnabla._nd_array.NdArray.data for more details.

ndim
Number of dimensions.
Returns: int

shape
Shape of the N-d array.
Returns: tuple of int

size
Total size of the N-d array.
Returns: int

size_from_axis(self, axis=-1)
Gets the product of the dimension sizes from the provided axis to the last axis (the total size when axis is -1).
Example
import nnabla
a = nnabla.NdArray([10,9])
a.size_from_axis()  # ==> 90
a.size_from_axis(0)  # ==> 90
a.size_from_axis(1)  # ==> 9
a.size_from_axis(2)  # ==> 1
Parameters: axis (int, optional) – -1 as default
Returns: int
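The rule behind the four results above can be written as a short pure-Python function. This is a sketch that reproduces the documented examples, not NNabla's implementation; only axis=-1 is documented as meaning "the whole array", so the handling of other negative values (plain Python slicing here) is an assumption.

```python
import operator
from functools import reduce


def size_from_axis(shape, axis=-1):
    """Product of shape[axis:]; axis=-1 (the default) returns the total size.

    Other negative axes fall back to Python slicing (an assumption of this sketch).
    """
    dims = shape if axis == -1 else shape[axis:]
    return reduce(operator.mul, dims, 1)  # empty product is 1


print(size_from_axis((10, 9)))     # 90
print(size_from_axis((10, 9), 0))  # 90
print(size_from_axis((10, 9), 1))  # 9
print(size_from_axis((10, 9), 2))  # 1
```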

strides
Strides.
Returns: tuple of int

zero(self)
Fill all of the elements with 0.
Note: This method is lazily evaluated. It is evaluated during the forward or backward propagation.

Variable¶

class nnabla.Variable
Bases: object
nnabla.Variable is used to construct computation graphs (neural networks) together with functions in Functions and List of Parametric Functions. It also provides a method to execute forward and backward propagation of the network. The nnabla.Variable class holds:
 A reference to the parent function in a computation graph. This provides traceability of all connections in the computation graph.
 Both data and error signal (gradient) containers as nnabla._nd_array.NdArray objects.
 Some additional information of the computation graph.
Variable overrides some arithmetic operators (+, -, *, /, **). Operands can be either a scalar number, NdArray or Variable. If NdArray is given as either the left or the right operand, the arithmetic operation returns an NdArray which stores the output of the computation immediately invoked. Otherwise, it returns a Variable that holds the graph connection. The computation is invoked immediately when nnabla.auto_forward() or nnabla.set_auto_forward(True) is used.
See also
Parameters:
 shape (Iterable of int) – Shape of variable.
 need_grad (bool) – Flag for backprop or not.

apply(self, **kwargs)
Helper for setting properties, which then returns self.

backward(self, grad=1, bool clear_buffer=False, communicator_callbacks=None)
Performs a backward propagation starting from this variable until the root variable(s) is/are reached in the function graph. The propagation will stop at a variable with need_grad=False.
Parameters:
 grad (scalar, numpy.ndarray, or nnabla._nd_array.NdArray) – The gradient signal value(s) of this variable. The default value 1 is used in usual neural network training. This option is useful if you have a gradient computation module outside NNabla and want to use it as a gradient signal of the neural network built in NNabla. Note that this doesn't modify the grad values of this variable.
 clear_buffer (bool) – Clears the no longer referenced variables during backpropagation to save memory.
 communicator_callbacks (nnabla.CommunicatorBackwardCallback or list of nnabla.CommunicatorBackwardCallback) – The callback functions invoked when 1) backward computation of each function is finished and 2) all backward computation is finished.

clear_all_graph_links(self)
Clear all intermediate functions and variables.
This method clears all intermediate functions and variables up to this variable in the forward pass and is useful for truncated backpropagation through time (truncated BPTT) in a dynamic graph.

d
Returns the values held by this variable, as a numpy.ndarray. Note that the values are referenced (not copied). Therefore, modifying the returned ndarray will affect the data of the NNabla array. This method can be called as a setter to set the value held by this variable.
Parameters: value (numpy.ndarray) (optional) –
Returns: numpy.ndarray

data
Returns the data held by this variable, as an NdArray. This can also be used as a setter.
Parameters: ndarray (NdArray) – NdArray object. Size must be the same as this Variable.
Returns: NdArray

forward(self, bool clear_buffer=False, bool clear_no_need_grad=False)
Performs a forward propagation from the root node to this variable. The forward propagation is performed on a subset of variables determined by the dependency of this variable. The subset is recursively constructed by tracking variables that the variables in the subset depend on, starting from this variable, until it reaches the root variable(s) in the function graph.
Parameters:
 clear_buffer (bool) – Clear the no longer referenced variables during forward propagation to save memory. This is usually set as True in an inference or a validation phase. Default is False.
 clear_no_need_grad (bool) – Clear the unreferenced variables with need_grad=False during forward propagation. True is usually used when calling this during training. This is ignored when clear_buffer=True.
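The dependency rule described for forward() (only what the target variable depends on is computed, parents before children) can be modeled with a toy graph in plain Python. This is an illustrative sketch, not NNabla's internals; the Node class and the forward function here are hypothetical.

```python
class Node:
    """Toy stand-in for a variable: a name, parent nodes and a compute function."""

    def __init__(self, name, parents=(), fn=None):
        self.name = name
        self.parents = list(parents)
        self.fn = fn        # None for root (input) nodes
        self.value = None


def forward(target, log):
    """Evaluate only the ancestors of `target`, each parent before its children."""
    visited = set()

    def visit(node):
        if id(node) in visited:
            return
        visited.add(id(node))
        for p in node.parents:  # track dependencies first (post-order DFS)
            visit(p)
        if node.fn is not None:
            node.value = node.fn(*[p.value for p in node.parents])
            log.append(node.name)

    visit(target)
    return target.value


x = Node("x")
x.value = 2.0
a = Node("a", [x], lambda v: v + 1)
b = Node("b", [x], lambda v: v * 10)  # not an ancestor of y, so never evaluated
y = Node("y", [a], lambda v: v * 3)

order = []
print(forward(y, order), order)  # 9.0 ['a', 'y']
```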

static from_numpy_array(data, grad=None, need_grad=None)
Create a Variable object from Numpy array(s).
The data is initialized with the given Numpy array, as well as grad if given. The shape is also determined by the given array.
Parameters:
Returns: nnabla.Variable

function_references
Returns a list of functions which take this variable as an input. This method can be called only as a getter.
Returns: list of nnabla.function.Function

g
Returns the gradient values held by this variable, as a numpy.ndarray. Note that the values are referenced (not copied). Therefore, modifying the returned ndarray will affect the gradient held by the NNabla array. This method can be called as a setter to set the gradient held by this variable.
Parameters: value (numpy.ndarray) –
Returns: numpy.ndarray

get_unlinked_variable(self, need_grad=None)
Gets an unlinked (parent-forgetting) variable that shares a Variable buffer instance.
Parameters: need_grad (bool, optional) – By default, the unlinked variable will have the same need_grad flag as this variable instance. By specifying a boolean value, the new need_grad flag will be set to the unlinked variable. It is recommended to specify this option explicitly to avoid unintended behavior.
Returns: nnabla._variable.Variable
Example
import numpy as np
import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable.from_numpy_array(np.array([[1, 2], [3, 4]]))
y = PF.affine(x, 4, name="y")

# Create a new variable whose graph connection is unlinked.
# It is recommended to specify the need_grad option explicitly.
z = y.get_unlinked_variable(need_grad=False)

print(y.parent)
# Affine
print(z.parent)  # z is unlinked from the parent x but shares the buffers of y.
# None

grad
Returns the gradient held by this variable, as an NdArray. This can also be used as a setter.
Parameters: ndarray (NdArray) – NdArray object. Size must be the same as this Variable.
Returns: NdArray

info
info – object
Information of the variable.

ndim
Gets the number of dimensions of this variable.
Returns: int

need_grad
Gets or sets a boolean indicating whether backpropagation is performed at this variable.
Parameters: b (bool) – Whether backpropagation is performed at this variable.
Returns: Whether this variable requires gradient or not.
Return type: bool

parent
Returns the parent function of this variable. This method can also be called as a setter.
Parameters: func (nnabla.function.Function) –
Returns: nnabla.function.Function

persistent
Returns the persistent flag of this variable. If True, the variable is not cleared even if clear options in nnabla._variable.Variable.forward() and nnabla._variable.Variable.backward() are enabled. This is useful when you debug the variable values, or log them. This method can also be called as a setter.
Parameters: b (bool) –
Returns: bool

reset_shape(self, shape, force=False)
Resizes the shape of the variable to a specified shape.
Parameters:
 shape (Iterable of int) – Target shape.
 force (bool) – Flag to force reshape.
Note
This method destructively changes the shape of the target variable. For safety, reshape() should be used instead.
Returns: None

reshape(self, shape, unlink=False)
Returns a new variable, where this variable is reshaped to a specified shape.
Parameters:
 shape (Iterable of int) – Target shape.
 unlink (bool) – If True, unlink the graph connection. Otherwise, keep the graph connection, i.e. the gradient will be backpropagated to the original variable.
Returns:

rewire_on(self, var)
Rewire a successor graph of this variable on top of var.
Parameters: var (nnabla.Variable) – The array elements and the parent function of var are copied to self as references. Note that the parent function of var is removed.
Example
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF

# A. Create a graph A.
xa = nn.Variable((2, 8), need_grad=True)
ya = F.tanh(PF.affine(xa, 10, name='a'))

# B. Create a graph B.
xb = nn.Variable((2, 16), need_grad=True)
yb = F.tanh(PF.affine(
    F.tanh(PF.affine(xb, 8, name='b1')),
    8, name='b2'))

# C. Rewire the graph A on top of B such that
#    `xb->B->(yb->)xa->A->ya`. Note `yb` is gone.
xa.rewire_on(yb)

# D. Execute the rewired graph.
xb.d = 1
ya.forward()
ya.backward()

size_from_axis(self, axis=-1)
Gets the product of the dimension sizes from the provided axis to the last axis (the total size when axis is -1).
Example
import nnabla
a = nnabla.Variable([10,9])
a.size_from_axis()  # ==> 90
a.size_from_axis(0)  # ==> 90
a.size_from_axis(1)  # ==> 9
a.size_from_axis(2)  # ==> 1
Parameters: axis (int, optional) – -1 as default
Returns: int

unlinked(self, need_grad=None)
This function is deprecated; use get_unlinked_variable instead.

visit(self, f)
Visit functions recursively in forward order.
Parameters: f (function) – Function object which takes an nnabla._function.Function object as an argument.
Returns: None
Example
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF

# Define a simple network-graph
def network_graph(x, maps=16, test=False):
    h = x
    h = PF.convolution(h, maps, kernel=(3, 3), pad=(1, 1), name="first-conv", with_bias=False)
    h = F.average_pooling(h, h.shape[2:])
    pred = PF.affine(h, 10, name="pred")
    return pred

# You can modify this PrintFunc to get other information such as the inputs (nnabla_func.inputs),
# outputs and arguments (nnabla_func.info.args) of nnabla functions.
class PrintFunc(object):
    def __call__(self, nnabla_func):
        print(nnabla_func.info.type_name)

x = nn.Variable([1, 3, 16, 16])
output = network_graph(x)
output.visit(PrintFunc())
Output:
Convolution
AveragePooling
Affine

visit_check(self, f)
Visit functions recursively in forward order.
Note
If any evaluation of the function object returns True, the visit propagation stops immediately and returns True.
Parameters: f (function) – Function object which takes an nnabla._function.Function object as an argument.
Returns: bool
 Returns True if any of the function object calls returns True.
Example
Define a simple network-graph where the AveragePooling function can be added explicitly as below:
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF

def network_graph(x, add_avg_pool=False, maps=16, test=False):
    h = x
    h = PF.convolution(h, maps, kernel=(3, 3), pad=(1, 1), name="first-conv", with_bias=False)
    if add_avg_pool:
        h = F.average_pooling(h, h.shape[2:])
    else:
        h = F.relu(h)
    pred = PF.affine(h, 10, name="pred")
    return pred

# Define 'PrintFunc()' to check whether the "AveragePooling" function exists in the network-graph
class PrintFunc(object):
    def __call__(self, nnabla_func):
        if nnabla_func.info.type_name == "AveragePooling":
            print("{} exists in the graph".format(nnabla_func.info.type_name))
            return True
        else:
            return False
Create a network-graph which has the AveragePooling function and call the visit_check() method:
x = nn.Variable([1, 3, 16, 16])
output = network_graph(x, add_avg_pool=True)  # Adding AveragePooling function to the graph
print("The return value of visit_check() method is : {}".format(output.visit_check(PrintFunc())))
Output:
AveragePooling exists in the graph
The return value of visit_check() method is : True
Create a network-graph which doesn't have the AveragePooling function and call the visit_check() method:
nn.clear_parameters()  # call this in case you want to run the following code again
output = network_graph(x, add_avg_pool=False)  # Exclusion of AveragePooling function in the graph
print("The return value of visit_check() method is : {}".format(output.visit_check(PrintFunc())))
Output:
The return value of visit_check() method is : False
Functions¶
All NNabla functions are derived from the nnabla.function.Function class.
Function¶

class nnabla.function.Function
Function interface class.
Instances of nnabla.function.Function are not directly created by users. They are created indirectly by the functions available in nnabla.functions. These functions return nnabla.Variable(s) holding the created function instance as the parent property.
backward(self, inputs, outputs, accum=None)

forward(self, inputs, outputs)

grad_depends_output_data(self, int i, int o)

info
info – object

inplace_data(self, int i)

inplace_data_with(self, int i)

inplace_grad(self, int i)

inplace_grad_with(self, int i)

min_outputs(self)

setup(self, inputs, outputs)
Experimental
Get tags of the function.

List of Functions¶
The nnabla.functions module provides various types of functions listed below.
These functions take input nnabla.Variable(s) as their leading argument(s), followed by options specific to each function.
 Note:
 The functions can also take NdArray(s) as output(s) holding output values of the operation. We call this “Imperative Mode” (NdArray + Functions).
Neural Network Layers¶

nnabla.functions.affine(x, weight, bias=None, base_axis=1, n_outputs=-1, outputs=None)
Affine layer, also called the fully connected layer. It calculates:
\[{\mathbf y} = {\mathbf A} {\mathbf x} + {\mathbf b},\]
where \({\mathbf x}\) is the input and \({\mathbf y}\) is the output.
Parameters:
 x (Variable) – Input N-D array with shape (\(M_0 \times ... \times M_{B-1} \times D_B \times ... \times D_N\)). Dimensions before and after base_axis are flattened as if it is a matrix.
 weight (Variable) – Weight matrix with shape (\((D_B \times ... \times D_N) \times L\)) [parameter]
 bias (Variable) – Bias vector (\(L\)) [optional][parameter]
 base_axis (int) – Base axis of Affine operation. Dimensions up to base_axis are treated as sample dimensions. [default=``1``]
Returns: \((B + 1)\)-D array. (\(M_0 \times ... \times M_{B-1} \times L\))
Return type: Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
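The flattening at base_axis can be checked with a NumPy sketch that follows the shapes documented above: dimensions before base_axis are kept as sample dimensions, the rest are flattened to one feature axis of size \(D_B \times ... \times D_N\), and the result is multiplied by the \((D_B \times ... \times D_N) \times L\) weight (so each sample acts as a row vector). This only illustrates the shape semantics and is not NNabla's kernel; affine_sketch is a hypothetical name.

```python
import numpy as np


def affine_sketch(x, weight, bias=None, base_axis=1):
    """Shape-level model of affine: flatten at base_axis, matmul, restore sample dims."""
    sample_shape = x.shape[:base_axis]                # (M_0, ..., M_{B-1})
    n_samples = int(np.prod(sample_shape, dtype=np.int64))
    x2d = x.reshape(n_samples, -1)                    # (samples, D_B * ... * D_N)
    y = x2d @ weight                                  # weight: (D_B * ... * D_N, L)
    if bias is not None:
        y = y + bias                                  # bias: (L,), broadcast over samples
    return y.reshape(*sample_shape, weight.shape[1])  # (M_0, ..., M_{B-1}, L)


x = np.arange(2 * 3 * 4 * 5, dtype=np.float64).reshape(2, 3, 4, 5)
print(affine_sketch(x, np.ones((60, 7)), base_axis=1).shape)  # (2, 7)
print(affine_sketch(x, np.ones((20, 7)), base_axis=2).shape)  # (2, 3, 7)
```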

nnabla.functions.convolution(x, weight, bias=None, base_axis=1, pad=None, stride=None, dilation=None, group=1, n_outputs=-1, outputs=None)
N-D Convolution with bias.
See references for dilated convolution (a.k.a. atrous convolution).
References
 Chen et al., DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs.
 Yu et al., Multi-Scale Context Aggregation by Dilated Convolutions.
Note
Convolution is a computationally intensive operation that should preferably be run with the cudnn backend. NNabla then uses CuDNN library functions to determine and cache the fastest algorithm for the given set of convolution parameters, which results in additional memory consumption that may pose a problem for GPUs with insufficient memory. In that case, the NNABLA_CUDNN_WORKSPACE_LIMIT environment variable can be used to restrict the choice of algorithms to those that fit the given workspace memory limit, expressed in bytes. In some cases it may also be desirable to restrict the automatic search to algorithms that produce deterministic (reproducible) results. This can be requested by setting the environment variable NNABLA_CUDNN_DETERMINISTIC to a non-zero value.
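One way to apply the two environment variables from Python is to set them in os.environ before NNabla is imported. Setting them that early is an assumption made to be safe (this document does not say exactly when the cudnn backend reads them); exporting them in the shell before starting Python achieves the same. The 256 MiB limit is only an example value.

```python
import os

# Example values only. Set before importing nnabla so the cudnn backend can see them
# (the exact point at which they are read is an assumption of this sketch).
os.environ["NNABLA_CUDNN_WORKSPACE_LIMIT"] = str(256 * 1024 * 1024)  # bytes (256 MiB)
os.environ["NNABLA_CUDNN_DETERMINISTIC"] = "1"  # any non-zero value requests determinism

# import nnabla as nn  # import NNabla only after the variables are set
```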
Parameters:
 x (Variable) – \((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C \times L_1 \times ... \times L_N\)).
 weight (Variable) – \((2 + N)\)-D array (\(C' \times C \times K_1 \times ... \times K_N\)). [parameter]
 bias (Variable) – Bias vector (\(C'\)). [optional][parameter]
 base_axis (int) – base axis \(B\). [default=``1``]
 pad (tuple of int) – Padding sizes for dimensions. [default=``(0,) * (len(x.shape) - (base_axis+1))``]
 stride (tuple of int) – Stride sizes for dimensions. [default=``(1,) * (len(x.shape) - (base_axis+1))``]
 dilation (tuple of int) – Dilation sizes for dimensions. [default=``(1,) * (len(x.shape) - (base_axis+1))``]
 group (int) – Number of groups of channels. This makes the connection across channels sparser, by grouping connections along the mapping direction. [default=``1``]
Returns: \((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C' \times L'_1 \times ... \times L'_N\)).
A spatial size of the output is calculated as
\[L'_i = \frac{L_i + 2 p_i - d_i (k_i - 1) - 1}{s_i} + 1,\]
where \(L_i\) is the spatial size, \(p_i\) is the padding, \(d_i\) is the dilation, \(k_i\) is the kernel size, and \(s_i\) is the stride for the \(i\)-th spatial dimension. The same calculation applies to each spatial dimension.
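The output-size formula and the documented defaults for pad, stride and dilation translate directly into a small shape calculator. The sketch assumes integer (floor) division when the numerator is not divisible by the stride, which is the usual convention but is not stated explicitly above; the function names are hypothetical.

```python
def conv_output_size(L, kernel, pad=0, stride=1, dilation=1):
    """One spatial dimension: L' = (L + 2p - d(k - 1) - 1) // s + 1 (floor division assumed)."""
    return (L + 2 * pad - dilation * (kernel - 1) - 1) // stride + 1


def conv_output_shape(x_shape, out_channels, kernel, base_axis=1,
                      pad=None, stride=None, dilation=None):
    """Output shape (M_1, ..., M_B, C', L'_1, ..., L'_N) of an N-D convolution."""
    n_spatial = len(x_shape) - (base_axis + 1)
    # Defaults mirror the documented ones: (0,) * N, (1,) * N, (1,) * N.
    if pad is None:
        pad = (0,) * n_spatial
    if stride is None:
        stride = (1,) * n_spatial
    if dilation is None:
        dilation = (1,) * n_spatial
    spatial_in = x_shape[base_axis + 1:]
    spatial_out = tuple(
        conv_output_size(L, k, p, s, d)
        for L, k, p, s, d in zip(spatial_in, kernel, pad, stride, dilation))
    return tuple(x_shape[:base_axis]) + (out_channels,) + spatial_out


print(conv_output_shape((1, 3, 32, 32), 16, (3, 3), pad=(1, 1)))     # (1, 16, 32, 32)
print(conv_output_shape((1, 3, 32, 32), 16, (3, 3), stride=(2, 2)))  # (1, 16, 15, 15)
```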
Return type: Variable
Note: All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
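The output-size formula above can be checked without a GPU. The sketch below is a pure-Python helper (`conv_output_size` and `conv_output_shape` are hypothetical names, not nnabla APIs) that applies the formula per spatial axis and fills in the documented defaults for pad, stride, and dilation. It assumes floor (integer) division, which is the usual convention when the division is not exact.

```python
def conv_output_size(L, k, p=0, s=1, d=1):
    """Spatial output size L' for one dimension (floor division assumed)."""
    return (L + 2 * p - d * (k - 1) - 1) // s + 1


def conv_output_shape(x_shape, out_channels, kernel, base_axis=1,
                      pad=None, stride=None, dilation=None):
    """Output shape for an input of shape (M_1..M_B, C, L_1..L_N)."""
    n_spatial = len(x_shape) - (base_axis + 1)
    # Defaults mirror the documented ones: (0,)*N, (1,)*N, (1,)*N.
    pad = pad or (0,) * n_spatial
    stride = stride or (1,) * n_spatial
    dilation = dilation or (1,) * n_spatial
    spatial_out = tuple(
        conv_output_size(L, k, p, s, d)
        for L, k, p, s, d in zip(x_shape[base_axis + 1:], kernel, pad, stride, dilation)
    )
    return tuple(x_shape[:base_axis]) + (out_channels,) + spatial_out


# 16 RGB images of 32x32, 64 filters of size 3x3, padding 1, stride 2.
print(conv_output_shape((16, 3, 32, 32), 64, (3, 3), pad=(1, 1), stride=(2, 2)))
# (16, 64, 16, 16)
```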

nnabla.functions.depthwise_convolution(x, weight, bias=None, base_axis=1, pad=None, stride=None, dilation=None, multiplier=1, n_outputs=1, outputs=None)
N-D depthwise convolution with bias.
Parameters:  x (Variable) – \((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C \times L_1 \times ... \times L_N\)).
 weight (Variable) – \((1 + N)\)-D array (\(C \times K_1 \times ... \times K_N\)). [parameter]
 bias (Variable) – Bias vector (\(C\)). [optional][parameter]
 base_axis (int) – Base axis \(B\). [default=``1``]
 pad (tuple of int) – Padding sizes for dimensions. [default=``(0,) * (len(x.shape) - (base_axis+1))``]
 stride (tuple of int) – Stride sizes for dimensions. [default=``(1,) * (len(x.shape) - (base_axis+1))``]
 dilation (tuple of int) – Dilation sizes for dimensions. [default=``(1,) * (len(x.shape) - (base_axis+1))``]
 multiplier (int) – Number of output feature maps per input feature map. [default=``1``]
Returns: \((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C' \times L'_1 \times ... \times L'_N\)).
The number of output maps \(C'\) is \(C\) multiplied by \(m\):
\[C' = m \times C,\]where \(m\) is the multiplier.
The spatial size of the output is calculated as
\[L'_i = \frac{L_i + 2 p_i - d_i (k_i - 1) - 1}{s_i} + 1,\]where \(L_i\) is the spatial size, \(p_i\) is the padding, \(d_i\) is the dilation, \(k_i\) is the kernel size, and \(s_i\) is the stride for the \(i\)-th spatial dimension. The same calculation applies to each of the other spatial dimensions.
Return type: Variable
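Depthwise convolution keeps the convolution spatial rule but scales the channel count by the multiplier, \(C' = m \times C\). The hypothetical helper below (not an nnabla API) computes the resulting output shape in pure Python, assuming floor division and the documented defaults.

```python
def depthwise_conv_output_shape(x_shape, kernel, multiplier=1, base_axis=1,
                                pad=None, stride=None, dilation=None):
    """Output shape of depthwise convolution, where C' = multiplier * C."""
    n = len(x_shape) - (base_axis + 1)
    pad = pad or (0,) * n
    stride = stride or (1,) * n
    dilation = dilation or (1,) * n
    c = x_shape[base_axis]
    spatial = tuple(
        (L + 2 * p - d * (k - 1) - 1) // s + 1  # same spatial rule as convolution
        for L, k, p, s, d in zip(x_shape[base_axis + 1:], kernel, pad, stride, dilation)
    )
    return tuple(x_shape[:base_axis]) + (multiplier * c,) + spatial


print(depthwise_conv_output_shape((8, 16, 28, 28), (3, 3), multiplier=2, pad=(1, 1)))
# (8, 32, 28, 28)
```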

nnabla.functions.deconvolution(x, weight, bias=None, base_axis=1, pad=None, stride=None, dilation=None, group=1, n_outputs=1, outputs=None)
N-D deconvolution, also known as transposed convolution, with bias. It performs the backward convolution (the derivative of the output with respect to the input) plus a channel-wise learned bias.
The weights are specified in the same manner as convolution(), as if it were an ordinary convolution function. The forward operation of deconvolution() is then operationally equivalent to the backward pass of convolution(). Therefore, the number of input channels (which can be seen as the output channels of the forward convolution) is specified in the first dimension, and the number of output channels divided by the number of groups is specified in the second dimension.
Parameters:  x (Variable) – \((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C \times L_1 \times ... \times L_N\)).
 weight (Variable) – \((2 + N)\)-D array (\(C \times C' \times K_1 \times ... \times K_N\)). [parameter]
 bias (Variable) – Bias vector (\(C'\)). [optional][parameter]
 base_axis (int) – Base axis \(B\). [default=``1``]
 pad (tuple of int) – Padding sizes for dimensions. [default=``(0,) * (len(x.shape) - (base_axis+1))``]
 stride (tuple of int) – Stride sizes for dimensions. [default=``(1,) * (len(x.shape) - (base_axis+1))``]
 dilation (tuple of int) – Dilation sizes for dimensions. [default=``(1,) * (len(x.shape) - (base_axis+1))``]
 group (int) – Number of groups of channels. This makes the connection across channels sparser, by grouping connections along the mapping direction. [default=``1``]
Returns: \((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C' \times L'_1 \times ... \times L'_N\)).
The spatial size of the output is calculated as
\[L'_i = s_i (L_i - 1) - 2 p_i + d_i (k_i - 1) + 1,\]where \(s_i\) is the stride, \(L_i\) is the spatial size, \(p_i\) is the padding, \(d_i\) is the dilation, and \(k_i\) is the kernel size for the \(i\)-th spatial dimension. The same calculation applies to each of the other spatial dimensions.
Return type: Variable
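The deconvolution size rule inverts the convolution rule. The sketch below uses two hypothetical pure-Python helpers to show a 16-wide map being upsampled to 32 and mapped back to 16 by the matching convolution. It assumes floor division in the convolution formula.

```python
def deconv_output_size(L, k, p=0, s=1, d=1):
    """Spatial output size of deconvolution for one dimension."""
    return s * (L - 1) - 2 * p + d * (k - 1) + 1


def conv_output_size(L, k, p=0, s=1, d=1):
    """Spatial output size of the forward convolution (floor division)."""
    return (L + 2 * p - d * (k - 1) - 1) // s + 1


# Kernel 4, stride 2, padding 1 doubles the spatial size...
up = deconv_output_size(16, k=4, p=1, s=2)
print(up)  # 32
# ...and the matching convolution maps it back.
print(conv_output_size(up, k=4, p=1, s=2))  # 16
```

With stride greater than 1, the floor in the convolution formula maps several input sizes to the same output (here both 32 and 33 give 16), and the deconvolution formula returns the smallest of them.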

nnabla.functions.depthwise_deconvolution(x, weight, bias=None, base_axis=1, pad=None, stride=None, dilation=None, divisor=1, n_outputs=1, outputs=None)
Depthwise deconvolution computes the transposed depthwise convolution with bias for one-dimensional and two-dimensional input data.
Parameters:  x (Variable) – \((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C \times L_1 \times ... \times L_N\)).
 weight (Variable) – \((1 + N)\)-D array (\(C \times K_1 \times ... \times K_N\)). [parameter]
 bias (Variable) – Bias vector (\(C\)). [optional][parameter]
 base_axis (int) – Base axis \(B\). [default=``1``]
 pad (tuple of int) – Padding sizes for dimensions. [default=``(0,) * (len(x.shape) - (base_axis+1))``]
 stride (tuple of int) – Stride sizes for dimensions. [default=``(1,) * (len(x.shape) - (base_axis+1))``]
 dilation (tuple of int) – Dilation sizes for dimensions. [default=``(1,) * (len(x.shape) - (base_axis+1))``]
 divisor (int) – Number of input feature maps per output feature map. [default=``1``]
Returns: \((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C' \times L'_1 \times ... \times L'_N\)).
The number of output maps \(C'\) is \(C\) divided by \(d\):
\[C' = \frac{C}{d},\]where \(d\) is the divisor.
The spatial size of the output is calculated as
\[L'_i = s_i (L_i - 1) - 2 p_i + d_i (k_i - 1) + 1,\]where \(s_i\) is the stride, \(L_i\) is the spatial size, \(p_i\) is the padding, \(d_i\) is the dilation, and \(k_i\) is the kernel size for the \(i\)-th spatial dimension. The same calculation applies to each of the other spatial dimensions.
Return type: Variable
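For depthwise deconvolution the channel count shrinks by the divisor, \(C' = C / d\), while the spatial axes follow the deconvolution rule. The hypothetical helper below computes the output shape; rejecting a channel count that is not divisible by the divisor is an assumption made for the sketch.

```python
def depthwise_deconv_output_shape(x_shape, kernel, divisor=1, base_axis=1,
                                  pad=None, stride=None, dilation=None):
    """Output shape of depthwise deconvolution, where C' = C / divisor."""
    n = len(x_shape) - (base_axis + 1)
    pad = pad or (0,) * n
    stride = stride or (1,) * n
    dilation = dilation or (1,) * n
    c = x_shape[base_axis]
    if c % divisor != 0:  # assumption: C must split evenly into groups
        raise ValueError("number of channels must be divisible by divisor")
    spatial = tuple(
        s * (L - 1) - 2 * p + d * (k - 1) + 1  # deconvolution spatial rule
        for L, k, p, s, d in zip(x_shape[base_axis + 1:], kernel, pad, stride, dilation)
    )
    return tuple(x_shape[:base_axis]) + (c // divisor,) + spatial


print(depthwise_deconv_output_shape((8, 32, 14, 14), (4, 4), divisor=2, pad=(1, 1), stride=(2, 2)))
# (8, 16, 28, 28)
```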

nnabla.functions.max_pooling(x, kernel, stride=None, ignore_border=True, pad=None, n_outputs=1, outputs=None)
Max pooling. It pools the maximum values inside the scanning kernel:
\[y_{i_1, i_2} = \max_{k_1, k_2 \in K} (x_{i_1 + k_1, i_2 + k_2})\]where \(x_{i_1 + k_1, i_2 + k_2}\) is the input and \(y_{i_1, i_2}\) is the output.
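The max formula can be illustrated with a small pure-Python reference. The sketch below (`max_pool2d` is a hypothetical name, not an nnabla API) covers only the `ignore_border=True`, no-padding case, where only windows that fit entirely inside the input produce outputs, and uses the documented default of stride equal to the kernel.

```python
def max_pool2d(x, kernel, stride=None):
    """2-D max pooling of an H x W list of lists (ignore_border=True, no padding)."""
    kh, kw = kernel
    sh, sw = stride or kernel  # documented default: stride = kernel
    out_h = (len(x) - kh) // sh + 1  # only windows that fit entirely
    out_w = (len(x[0]) - kw) // sw + 1
    return [
        [max(x[i * sh + a][j * sw + b] for a in range(kh) for b in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]


x = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
print(max_pool2d(x, (2, 2)))  # [[6, 8], [14, 16]]
```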
Parameters:  x (Variable) – Input variable.
 kernel (tuple of int) – Kernel sizes for each spatial axis.
 stride (tuple of int) – Subsampling factors for each spatial axis. [default=``kernel``]
 ignore_border (bool) – If false, kernels covering borders are also considered for the output. [default=``True``]
 pad (tuple of int) – Border padding values for each spatial axis. Padding is added to both sides of the dimension. [default=``(0,) * len(kernel)``]
Returns: Maximum values variable
Return type: Variable

nnabla.functions.average_pooling(x, kernel, stride=None, ignore_border=True, pad=None, including_pad=True, n_outputs=1, outputs=None)
Average pooling. It pools the averaged values inside the scanning kernel:
\[y_{i_1, i_2} = \frac{1}{K_1 K_2} \sum_{k_1} \sum_{k_2} x_{i_1 + k_1, i_2 + k_2}\]where \(x_{i_1 + k_1, i_2 + k_2}\) is the input and \(y_{i_1, i_2}\) is the output.
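The effect of `including_pad` is easiest to see in one dimension. The sketch below is a hypothetical pure-Python helper, not nnabla code. It assumes padded cells contribute zero to the sum and reads `including_pad` as choosing the divisor: the full kernel size when true, or only the number of real input elements in the window when false. It keeps `ignore_border=True`.

```python
def avg_pool1d(x, k, s=None, p=0, including_pad=True):
    """1-D average pooling sketch (ignore_border=True).

    Padding cells are marked with None and, as an assumption, add zero to the sum.
    """
    s = s or k  # documented default: stride = kernel
    padded = [None] * p + list(x) + [None] * p
    out = []
    for start in range(0, len(padded) - k + 1, s):
        vals = [v for v in padded[start:start + k] if v is not None]
        divisor = k if including_pad else len(vals)
        out.append(sum(vals) / divisor)
    return out


x = [2.0, 4.0, 6.0, 8.0]
print(avg_pool1d(x, k=2, p=1, including_pad=True))   # [1.0, 5.0, 4.0]
print(avg_pool1d(x, k=2, p=1, including_pad=False))  # [2.0, 5.0, 8.0]
```

Only the border windows differ. The interior window `[4.0, 6.0]` contains no padding and averages to 5.0 either way.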
Parameters:  x (Variable) – Input variable.
 kernel (tuple of int) – Kernel sizes for each spatial axis.
 stride (tuple of int) – Subsampling factors for each spatial axis. [default=``kernel``]
 ignore_border (bool) – If false, kernels covering borders are also considered for the output. [default=``True``]
 pad (tuple of int) – Border padding values for each spatial axis. Padding is added to both sides of the dimension. [default=``(0,) * len(kernel)``]
 including_pad (bool) – If true, border padding values are considered for the output. [default=``True``]
Returns: Average values variable
Return type: Variable

nnabla.functions.global_average_pooling(x, n_outputs=1, outputs=None)
Warning: This function is experimental, so please avoid relying on it.
Global average pooling. It pools an averaged value from the whole image.
Parameters: x (Variable) – Input variable.
Returns: Average values variable
Return type: Variable

nnabla.functions.sum_pooling(x, kernel, stride=None, ignore_border=True, pad=None, n_outputs=1, outputs=None)
Sum pooling. It pools the summed values inside the scanning kernel:
\[y_{i_1, i_2} = \sum_{k_1} \sum_{k_2} x_{i_1 + k_1, i_2 + k_2}\]where \(x_{i_1 + k_1, i_2 + k_2}\) is the input and \(y_{i_1, i_2}\) is the output.
Parameters:  x (Variable) – Input variable.
 kernel (tuple of int) – Kernel sizes for each spatial axis.
 stride (tuple of int) – Subsampling factors for each spatial axis. [default=``kernel``]
 ignore_border (bool) – If false, kernels covering borders are also considered for the output. [default=``True``]
 pad (tuple of int) – Border padding values for each spatial axis. Padding is added to both sides of the dimension. [default=``(0,) * len(kernel)``]
Returns: Summed values variable
Return type: Variable

nnabla.functions.unpooling(x, kernel, n_outputs=1, outputs=None)
Inverse operation of pooling. It spreads the input values:
\[y_{k_1 i_1 + j_1, k_2 i_2 + j_2} = x_{i_1, i_2}\]where \(x_{i_1, i_2}\) is the input, \(y_{k_1 i_1 + j_1, k_2 i_2 + j_2}\) is the output, and \(0 \le j_1 < k_1\), \(0 \le j_2 < k_2\).
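Reading the formula backwards, output position \(r\) along an axis takes its value from input position \(\lfloor r / k \rfloor\), so each input value is copied into a \(k_1 \times k_2\) block. A minimal pure-Python sketch of that indexing (`unpool2d` is a hypothetical name, not an nnabla API):

```python
def unpool2d(x, kernel):
    """Spread each value of the H x W list x into a k1 x k2 block."""
    k1, k2 = kernel
    return [
        [x[r // k1][c // k2] for c in range(len(x[0]) * k2)]
        for r in range(len(x) * k1)
    ]


print(unpool2d([[1, 2], [3, 4]], (2, 2)))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```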
Parameters: Returns: Spread values variable
Return type: Variable

nnabla.functions.embed(x0, w, n_outputs=1, outputs=None)
Embed slices of a matrix/tensor with an indexing array/tensor.
Parameters: Returns: Output with shape \((I_0, ..., I_N, W_1, ..., W_M)\)
Return type: Variable
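The output shape \((I_0, ..., I_N, W_1, ..., W_M)\) arises because every integer index in x0 is replaced by one slice of w along its first axis. In NumPy terms this corresponds to fancy indexing `w[x0]`. The recursive pure-Python sketch below (`embed` here is a hypothetical stand-in, not the nnabla function) shows the lookup for a 2-D w.

```python
def embed(x0, w):
    """Replace each integer index in the nested list x0 by the row w[index].

    x0 has shape (I_0, ..., I_N); w has shape (W_0, W_1).
    The result has shape (I_0, ..., I_N, W_1).
    """
    if isinstance(x0, int):
        return list(w[x0])
    return [embed(i, w) for i in x0]


w = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]]  # 3 embeddings of size 2
print(embed([[2, 0], [1, 1]], w))
# [[[2.0, 2.1], [0.0, 0.1]], [[1.0, 1.1], [1.0, 1.1]]]
```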

nnabla.functions.rnn(x, h, weight_l0, weight=None, bias=None, num_layers=1, nonlinearity=None, dropout=None, bidirectional=False, training=True, n_outputs=1, outputs=None)
The RNN function implements an Elman RNN that applies a nonlinearity to the input sequence. It is defined as follows:
\[{\mathbf h_t} = {\mathbf \tanh}( {\mathbf w_{ih}} * {\mathbf x_t} + {\mathbf b_{ih}} + {\mathbf w_{hh}} * {\mathbf h_{(t-1)}} + {\mathbf b_{hh}}).\]We use the following notation to describe the inputs and outputs below. \(T\): sequence length, \(B\): batch size, \(I\): input size, \(L\): number of layers, \(D\): number of directions (either 1 or 2), \(H\): hidden size.
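A single-sample, single-direction sketch of the recurrence above, in pure Python (`elman_step` and `elman_sequence` are hypothetical helpers, not nnabla APIs). Two layout details are assumptions drawn from the documented shapes rather than from nnabla's source. Each weight row has length \(I + H\), with the first \(I\) columns acting as \(w_{ih}\) and the last \(H\) as \(w_{hh}\). A single length-\(H\) bias stands in for \(b_{ih} + b_{hh}\), mirroring the \((L, D, H)\) bias shape.

```python
import math


def elman_step(x_t, h_prev, w, b, nonlinearity="tanh"):
    """One Elman RNN time step for one sample and one direction.

    w: H rows of length I + H (assumed [w_ih | w_hh] layout); b: length-H bias.
    """
    act = math.tanh if nonlinearity == "tanh" else (lambda v: max(0.0, v))
    z = list(x_t) + list(h_prev)  # concatenation [x_t, h_(t-1)]
    return [act(sum(wij * zj for wij, zj in zip(row, z)) + bi)
            for row, bi in zip(w, b)]


def elman_sequence(xs, h0, w, b, nonlinearity="tanh"):
    """Apply the step over a sequence; return (all hidden states, final state)."""
    h, ys = h0, []
    for x_t in xs:
        h = elman_step(x_t, h, w, b, nonlinearity)
        ys.append(h)
    return ys, h


# I = 1, H = 2, T = 3
w = [[0.5, 0.1, 0.0],
     [-0.3, 0.0, 0.2]]
ys, h_n = elman_sequence([[1.0], [0.0], [-1.0]], [0.0, 0.0], w, [0.0, 0.1])
print(len(ys), h_n)
```

The list `ys` plays the role of \(y\) (one hidden state per time step) and `h_n` that of the final state.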
Parameters:  x (Variable) – Input ND array with shape \((T, B, I)\).
 h (Variable) – Input ND array with shape \((L, D, B, H)\).
 weight_l0 (Variable) – Input ND array with shape \((D, H, I + H)\). [parameter]
 weight (Variable) – Input ND array with shape \((L - 1, D, H, D * H + H)\). [optional][parameter]
 bias (Variable) – Input ND array with shape \((L, D, H)\). [optional][parameter]
 num_layers (int) – Number of layers in the network. If set to 1, only the weights for the first layer will be used. Default is 1. [default=``1``]
 nonlinearity (string) – Type of nonlinearity applied to the input sequence. Must be either tanh or relu. Default is tanh. [default=``tanh``]
 dropout (float) – Dropout ratio applied to parameters. Default is 0.0. [default=``0.0``]
 bidirectional (bool) – If True, bidirectional computation will be performed in each layer. Default is False. [default=``False``]
 training (bool) – Backpropagation will be performed only when it is true. Default is True. [default=``True``]
Returns: Output \(y\) with shape \((T, B, D * H)\), and output \(h_n\) with shape \((L, D, B, H)\).
Return type: tuple of Variable

nnabla.functions.lstm(x, h, c, weight_l0, weight=None, bias=None, num_layers=1, dropout=None, bidirectional=False, training=True, n_outputs=1, outputs=None)
N-step LSTM layer.
\[\begin{split}{\mathbf f_t} = {\mathbf \sigma}( {\mathbf W_f} * {\mathbf x_t} + {\mathbf U_f} * {\mathbf h_{(t-1)}} + {\mathbf b_f})\\ {\mathbf i_t} = {\mathbf \sigma}( {\mathbf W_i} * {\mathbf x_t} + {\mathbf U_i} * {\mathbf h_{(t-1)}} + {\mathbf b_i})\\ {\mathbf o_t} = {\mathbf \sigma}( {\mathbf W_o} * {\mathbf x_t} + {\mathbf U_o} * {\mathbf h_{(t-1)}} + {\mathbf b_o})\\ {\mathbf c_t} = {\mathbf f_t} \odot {\mathbf c_{(t-1)}} + {\mathbf i_t} \odot {\mathbf \tanh}({\mathbf W_c} * {\mathbf x_t} + {\mathbf U_c} * {\mathbf h_{(t-1)}} + {\mathbf b_c})\\ {\mathbf h_t} = {\mathbf o_t} \odot {\mathbf \tanh}({\mathbf c_t}).\end{split}\]We use the following notation to describe the inputs and outputs below. \(T\): sequence length, \(B\): batch size, \(I\): input size, \(L\): number of layers, \(D\): number of directions (either 1 or 2), \(H\): hidden size.
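To make the gate equations concrete, here is a scalar sketch (\(I = H = 1\)) in pure Python. `lstm_step` is a hypothetical helper, and the dictionaries keyed by gate name are only an illustration. They do not describe how the four gates are laid out inside nnabla's `weight_l0` of shape \((D, 4, H, I + H)\).

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step with scalar input and state, following the equations above.

    W, U, b: dicts keyed by gate name "f", "i", "o", "c" (illustrative layout only).
    """
    f = sigmoid(W["f"] * x + U["f"] * h_prev + b["f"])  # forget gate
    i = sigmoid(W["i"] * x + U["i"] * h_prev + b["i"])  # input gate
    o = sigmoid(W["o"] * x + U["o"] * h_prev + b["o"])  # output gate
    c = f * c_prev + i * math.tanh(W["c"] * x + U["c"] * h_prev + b["c"])
    h = o * math.tanh(c)
    return h, c


W = {"f": 0.5, "i": 0.5, "o": 0.5, "c": 1.0}
U = {"f": 0.1, "i": 0.1, "o": 0.1, "c": 0.1}
b = {"f": 1.0, "i": 0.0, "o": 0.0, "c": 0.0}
h, c = 0.0, 0.0
for x in [1.0, 0.5, -0.5]:
    h, c = lstm_step(x, h, c, W, U, b)
print(h, c)
```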
Parameters:  x (Variable) – Input ND array with shape \((T, B, I)\).
 h (Variable) – Input ND array with shape \((L, D, B, H)\).
 c (Variable) – Input ND array with shape \((L, D, B, H)\).
 weight_l0 (Variable) – weight parameters for the first layer. Shape is \((D, 4, H, I + H)\). [parameter]
 weight (Variable) – Weight parameters for the second layer and above. Shape is \((L - 1, D, 4, H, D * H + H)\). [optional][parameter]
 bias (Variable) – Bias vector (\(L\)). Shape is \((L, D, 4, H)\). [optional][parameter]
 num_layers (int) – Number of layers in the network. If set to 1, only the weights for the first layer will be used. Default is 1. [default=``1``]
 dropout (float) – Dropout ratio applied to parameters. Default is 0.0. [default=``0.0``]
 bidirectional (bool) – If True, bidirectional computation will be performed in each layer. Default is False. [default=``False``]
 training (bool) – Backpropagation will be performed only when it is True. Default is True. [default=``True``]
Returns: Output \(y\) with shape \((T, B, D * H)\), output \(h_n\) with shape \((L, D, B, H)\), and output \(c_n\) with shape \((L, D, B, H)\).
Return type: tuple of Variable

nnabla.functions.gru(x, h, weight_l0, weight=None, bias=None, num_layers=1, dropout=None, bidirectional=False, training=True, n_outputs=1, outputs=None)
N-step GRU layer.
\[\begin{split}{\mathbf r_t} = {\mathbf \sigma}( {\mathbf W_r} * {\mathbf x_t} + {\mathbf U_r} * {\mathbf h_{(t-1)}} + {\mathbf b_r})\\ {\mathbf z_t} = {\mathbf \sigma}( {\mathbf W_z} * {\mathbf x_t} + {\mathbf U_z} * {\mathbf h_{(t-1)}} + {\mathbf b_z})\\ {\mathbf n_t} = {\mathbf \tanh}( {\mathbf W_n} * {\mathbf x_t} + {\mathbf b_{in}} + {\mathbf r_t} \odot ( {\mathbf U_n} * {\mathbf h_{(t-1)}} + {\mathbf b_{hn}})) \\ {\mathbf h_t} = (1 - {\mathbf z_t}) \odot {\mathbf n_t} + {\mathbf z_t} \odot {\mathbf h_{(t-1)}}.\end{split}\]We use the following notation to describe the inputs and outputs below. \(T\): sequence length, \(B\): batch size, \(I\): input size, \(L\): number of layers, \(D\): number of directions (either 1 or 2), \(H\): hidden size.
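A scalar (\(I = H = 1\)) pure-Python sketch of the GRU equations. `gru_step` and the dictionary keys are hypothetical and mirror the symbols in the formula, not nnabla's parameter layout. Note that the reset gate \(r_t\) scales only the recurrent term of the candidate state \(n_t\), and the update gate \(z_t\) interpolates between \(n_t\) and \(h_{(t-1)}\).

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(x, h_prev, p):
    """One GRU step with scalar input and state.

    p: dict with keys W_r, U_r, b_r, W_z, U_z, b_z, W_n, U_n, b_in, b_hn
    (named after the symbols in the equations above).
    """
    r = sigmoid(p["W_r"] * x + p["U_r"] * h_prev + p["b_r"])  # reset gate
    z = sigmoid(p["W_z"] * x + p["U_z"] * h_prev + p["b_z"])  # update gate
    n = math.tanh(p["W_n"] * x + p["b_in"] + r * (p["U_n"] * h_prev + p["b_hn"]))
    return (1.0 - z) * n + z * h_prev


zero = dict.fromkeys(["W_r", "U_r", "b_r", "W_z", "U_z", "b_z",
                      "W_n", "U_n", "b_in", "b_hn"], 0.0)
# With all parameters zero: r = z = 0.5 and n = 0, so h_t = 0.5 * h_(t-1).
print(gru_step(1.0, 0.8, zero))  # 0.4
```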
Parameters:  x (Variable) – Input ND array with shape \((T, B, I)\).
 h (Variable) – Input ND array with shape \((L, D, B, H)\).
 weight_l0 (Variable) – weight parameters for the first layer. Shape is \((D, 3, H, I + H)\). [parameter]
 weight (Variable) – Weight parameters for the second layer and above. Shape is \((L - 1, D, 3, H, D * H + H)\). [optional][parameter]
 bias (Variable) – Bias vector (\(L\)). Shape is \((L, D, 4, H)\). [optional][parameter]
 num_layers (int) – Number of layers in the network. If set to 1, only the weights for the first layer will be used. Default is 1. [default=``1``]
 dropout (float) – Dropout ratio applied to parameters. Default is 0.0. [default=``0.0``]
 bidirectional (bool) – If True, bidirectional computation will be performed in each layer. Default is False. [default=``False``]
 training (bool) – Backpropagation will be performed only when it is True. Default is True. [default=``True``]
Returns: Output \(y\) with shape \((T, B, D * H)\), and output \(h_n\) with shape \((L, D, B, H)\).
Return type: tuple of Variable
Neural Network Activation¶

nnabla.functions.sigmoid(x, n_outputs=1, outputs=None)
Elementwise sigmoid function.
\[f(x) = \frac{1}{1 + \exp(-x)},\]
Parameters: x (Variable) – Input
Returns: Output
Return type: Variable

nnabla.functions.swish(x, n_outputs=1, outputs=None)
Elementwise swish function, by Ramachandran et al. (2017).
\[y_i = \frac{x_i}{1 + \exp(-x_i)},\]
Parameters: x (Variable) – Input
Returns: Output
Return type: Variable

nnabla.functions.tanh(x, n_outputs=1, outputs=None)
Elementwise hyperbolic tangent (tanh) function.
\[y_i = \tanh (x_i)\]
Parameters: x (Variable) – ND array
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.relu(x, inplace=False, n_outputs=1, outputs=None)
Elementwise Rectified Linear Unit (ReLU) function.
\[y_i = \max (0, x_i)\]Parameters: Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.softmax(x, axis=None, n_outputs=1, outputs=None)
Softmax normalization. Calculates
\[y_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)}\]along the dimension specified by axis, where \(x_i\) is the input and \(y_i\) is the output.
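A reference sketch of the formula for a 1-D list, written in the numerically stable form that is common in practice. Subtracting \(\max_j x_j\) from every input leaves the result unchanged, because the factor \(\exp(-\max_j x_j)\) cancels between numerator and denominator, and it keeps `exp` from overflowing. This is a plain-Python illustration only. It makes no claim about how nnabla implements the function internally.

```python
import math


def softmax(xs):
    """Softmax of a 1-D list, shifted by max(xs) for numerical stability."""
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


print(softmax([1.0, 2.0, 3.0]))
print(softmax([1000.0, 1000.0]))  # [0.5, 0.5]; a naive math.exp(1000.0) would overflow
```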
Parameters: Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.elu(x, alpha=1.0, n_outputs=1, outputs=None)
Elementwise Exponential Linear Unit (ELU) function.
\[\begin{split}y_i= \left\{ \begin{array}{ll} x_i & (x_i > 0)\\ \alpha (\exp(x_i) - 1) & (x_i \leq 0) \end{array} \right..\end{split}\]
Parameters: Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.selu(x, scale=1.05070098735548, alpha=1.673263242354377, n_outputs=1, outputs=None)
Elementwise Scaled Exponential Linear Unit (SELU) function by Klambauer et al. (2017).
\[\begin{split}y_i= \lambda \left\{ \begin{array}{ll} x_i & (x_i > 0)\\ \alpha (\exp(x_i) - 1) & (x_i \leq 0) \end{array} \right..\end{split}\]The coefficients \(\lambda\) and \(\alpha\) default to the following values \(\lambda_{01}\) and \(\alpha_{01}\), respectively, provided by Klambauer et al. (2017):
\[\begin{split}\begin{array}{lll} \lambda_{01} &=& \left( 1 - \operatorname{erfc}\left( \frac{1}{\sqrt{2}} \right) \sqrt{e} \right) \sqrt{2 \pi} \\ && \left( 2 \operatorname{erfc} \left( \sqrt{2} \right) e^2 + \pi \operatorname{erfc}\left( \frac{1}{\sqrt{2}} \right)^2 e \right. \\ && \left. - 2(2 + \pi) \operatorname{erfc} \left( \frac{1}{\sqrt{2}} \right) \sqrt{e} + \pi + 2 \right)^{-1/2} \\ &\approx& 1.0507 \\ \alpha_{01} &=& - \frac {\sqrt {\frac {2}{\pi}}} {\operatorname{erfc} \left( \frac{1}{\sqrt{2}} \right) \exp \left(\frac {1} {2} \right) - 1} \\ &\approx& 1.67326 \end{array}\end{split}\]
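The closed-form constants can be evaluated with the standard library (`math.erfc`) to confirm that they reproduce the default `scale` and `alpha` in the signature. The `selu` function below is a scalar illustration of the formula, not the nnabla implementation.

```python
import math

e = math.e
erfc_a = math.erfc(1 / math.sqrt(2))  # erfc(1 / sqrt(2))
erfc_b = math.erfc(math.sqrt(2))      # erfc(sqrt(2))

lambda_01 = (1 - erfc_a * math.sqrt(e)) * math.sqrt(2 * math.pi) * (
    2 * erfc_b * e ** 2
    + math.pi * erfc_a ** 2 * e
    - 2 * (2 + math.pi) * erfc_a * math.sqrt(e)
    + math.pi
    + 2
) ** -0.5
alpha_01 = -math.sqrt(2 / math.pi) / (erfc_a * math.exp(0.5) - 1)


def selu(x, scale=lambda_01, alpha=alpha_01):
    """Scalar SELU: scale * x for x > 0, scale * alpha * (exp(x) - 1) otherwise."""
    return scale * (x if x > 0 else alpha * (math.exp(x) - 1))


print(lambda_01, alpha_01)  # close to 1.0507 and 1.67326
```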
Parameters: Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.crelu(x, axis=1, n_outputs=1, outputs=None)
Elementwise Concatenated Rectified Linear Unit (CReLU) function. This function calculates the ReLU of \(x\) and \(-x\), then concatenates the results together at a specified axis, and returns the resulting array.
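A 1-D pure-Python illustration of CReLU (`crelu` here is a hypothetical stand-in). It computes ReLU of \(x\) and of \(-x\), then concatenates them, so the concatenation axis doubles in size. Placing the \(x\) half before the \(-x\) half follows the order in the description above and is otherwise an assumption.

```python
def crelu(x):
    """ReLU(x) followed by ReLU(-x), concatenated along the only axis."""
    pos = [max(0.0, v) for v in x]
    neg = [max(0.0, -v) for v in x]
    return pos + neg


print(crelu([1.5, -2.0, 0.0]))  # [1.5, 0.0, 0.0, 0.0, 2.0, 0.0]
```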
Parameters: Returns: ND array where axis dimension is doubled by concatenating.
Return type: Variable

nnabla.functions.celu(x, alpha=1.0, axis=1, n_outputs=1, outputs=None)
Elementwise Concatenated Exponential Linear Unit (CELU) function. It concatenates the ELU outputs of the positive and negative inputs (\(x\) and \(-x\)) together at the specified axis.
Parameters: Returns: ND array where axis dimension is doubled by concatenating.
Return type: Variable

nnabla.functions.gelu(x, n_outputs=1, outputs=None)
Gaussian Error Linear Unit (GELU) function.
\[GELU(x) = xP(X \leq x) = x \Phi (x)\]which is approximated by
\[GELU(x) = 0.5x (1 + \tanh ( \sqrt{2/\pi}(x + 0.044715x^3) ))\]
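The exact definition and the tanh approximation can be compared numerically. The standard normal CDF is \(\Phi(x) = \frac{1}{2}\left(1 + \operatorname{erf}(x / \sqrt{2})\right)\), which `math.erf` provides. The two hypothetical helpers below evaluate both forms. On the sampled range the difference stays well below \(10^{-3}\).

```python
import math


def gelu_exact(x):
    """x * Phi(x), with the standard normal CDF expressed via erf."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x):
    """The tanh approximation given above."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
    print(x, gelu_exact(x), gelu_tanh(x))
```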
Parameters: x (Variable) – ND array
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.prelu(x0, x1, base_axis=1, n_outputs=1, outputs=None)
Elementwise Parametrized Rectified Linear Unit function. Calculates:
\[y_i = \max(0, x_i) + w_i \min(0, x_i)\]where negative slope \(w\) is learned and can vary across channels (an axis specified with base_axis).
Parameters: Returns: ND array.
Return type: Variable

nnabla.functions.leaky_relu(x, alpha=0.1, inplace=False, n_outputs=1, outputs=None)
Elementwise Leaky Rectified Linear Unit (Leaky ReLU) function.
It is defined as:
\[y_i = \alpha * \min(0, x_i) + \max (0, x_i)\]Parameters: Returns: ND array with the same shape as x
Return type: Variable
Normalization¶

nnabla.functions.batch_normalization(x, beta, gamma, mean, variance, axes=[1], decay_rate=0.9, eps=1e-05, batch_stat=True, output_stat=False, n_outputs=None)
Batch normalization.
\[\begin{split}\begin{eqnarray} \mu &=& \frac{1}{M} \sum x_i \\ \sigma^2 &=& \frac{1}{M} \sum \left(x_i - \mu\right)^2 \\ \hat{x}_i &=& \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} \\ y_i &=& \hat{x}_i \gamma + \beta. \end{eqnarray}\end{split}\]At testing time, the mean and variance values used are those that were computed during training by moving average.
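A pure-Python sketch of the training-time equations for a single feature over a mini-batch of \(M\) samples, using the biased \(1/M\) variance shown above. The running-mean update is labelled as an assumption. It uses the conventional exponential moving average `decay_rate * running + (1 - decay_rate) * batch_value`, which is not spelled out in this reference. Running-variance handling is not shown.

```python
import math


def batch_norm_train(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode batch normalization of one feature over M samples."""
    m = len(xs)
    mu = sum(xs) / m
    var = sum((x - mu) ** 2 for x in xs) / m  # biased (1/M) variance, as above
    y = [gamma * (x - mu) / math.sqrt(var + eps) + beta for x in xs]
    return y, mu, var


def update_running(running, batch_value, decay_rate=0.9):
    """Assumed moving-average rule for a running statistic."""
    return decay_rate * running + (1.0 - decay_rate) * batch_value


y, mu, var = batch_norm_train([1.0, 2.0, 3.0, 4.0])
running_mean = update_running(0.0, mu)  # approximately 0.1 * 2.5
print(y, mu, var, running_mean)
```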
Parameters:  x (Variable) – ND array of input.
 beta (Variable) – ND array of beta which is learned.
 gamma (Variable) – ND array of gamma which is learned.
 mean (Variable) – ND array of running mean (modified during forward execution).
 variance (Variable) – ND array of running variance (modified during forward execution).
 axes (repeated int64) – Axes along which the mean and variance are taken.
 decay_rate (float) – Decay rate of running mean and variance.
 eps (float) – Tiny value to avoid zero division by std.
 batch_stat (bool) – Use mini-batch statistics rather than running ones.
 output_stat (bool) – If true, the batch statistics of mean and variance will be returned as Variables. They are also differentiable.
Returns: Batch normalization output as a Variable. If output_stat=True, it also returns the mean and variance of the mini-batch.
See also: nnabla.function_bases.batch_normalization.

nnabla.functions.mean_subtraction(x, mean, t, base_axis=1, update_running_mean=True)
It subtracts the mean of the elements of the input array so that the output is centered at \(0\). Preprocessing arrays with this function can improve accuracy in various tasks such as image classification.
At training time, this function is defined as
\[\begin{split}\begin{eqnarray} \mu &=& \frac{1}{M} \sum x_i \\ y_i &=& x_i - \mu \end{eqnarray}\end{split}\]At testing time, the mean values used are those that were computed during training by moving average.
Note
The backward performs an approximated differentiation that takes into account only the latest minibatch.
Parameters:  x (Variable) – ND array of input.
 mean (Variable) – ND array of running mean (modified during forward execution).
 t (Variable) – Scalar of num of iteration of running mean (modified during forward execution).
 base_axis (int) – Base axis of Mean Subtraction operation. Dimensions up to base_axis are treated as sample dimensions. [default=``1``]
 update_running_mean (bool) – Update running mean during forward execution. [default=``True``]
Returns: ND array.
Return type: Variable
See also: nnabla.function_bases.mean_subtraction.

nnabla.functions.clip_by_value(x, min, max)
Clip inputs by values.
\[\begin{split}y = \begin{cases} max & (x > max) \\ x & (otherwise) \\ min & (x < min) \end{cases}.\end{split}\]Parameters: Returns: ND array.
Return type: Variable

nnabla.functions.clip_grad_by_value(x, min, max, n_outputs=1, outputs=None)
In the forward pass, the function behaves as the identity.
In the backward pass,
\[\begin{split}g_x = \begin{cases} max & (g_y > max) \\ g_y & (otherwise) \\ min & (g_y < min) \end{cases}.\end{split}\]A typical use case is preventing gradient explosion throughout a whole computational graph. For example, to clip gradient values for each feature map:
import numpy as np
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF

x = nn.Variable([16, 3, 32, 32])
# Lower (-1) and upper (1) gradient bounds, broadcast to the shape of x.
min_val = F.broadcast(nn.Variable.from_numpy_array(np.asarray([-1.0]).reshape((1, 1, 1, 1))), (16, 3, 32, 32))
max_val = F.broadcast(nn.Variable.from_numpy_array(np.asarray([1.0]).reshape((1, 1, 1, 1))), (16, 3, 32, 32))
c = F.clip_grad_by_value(x, min=min_val, max=max_val)
h = PF.convolution(c, 64, (3, 3), pad=(1, 1))
Parameters:  x (Variable) – ND array of input.
 min (Variable) – ND array of minimum input values by which the gradients of y are clipped. Note that the shape of min must be the same as that of x, and no backward pass is performed to min.
 max (Variable) – ND array of maximum input values by which the gradients of y are clipped. Note that the shape of max must be the same as that of x, and no backward pass is performed to max.
Returns: ND array.
Return type: Variable

nnabla.functions.clip_by_norm(x, clip_norm, axis=None)
Clip the input by its L2 norm when that norm is larger than the threshold value (defined by clip_norm). If the norm is less than the threshold, the input is not modified. When clipping is applied, the operation is represented as
\[y = N \times \frac{x}{\|x\|_2},\]where \(x\) is the input, \(y\) is the output, and \(N\) is clip_norm. This is the case when axis is not set; when axis is set, the norm is computed over axis.
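A pure-Python sketch of the rule for a 1-D input with axis unset (`clip_by_norm` here is a hypothetical stand-in, not the nnabla function). The input is rescaled to norm \(N\) only when its L2 norm exceeds \(N\), and is otherwise returned unchanged.

```python
import math


def clip_by_norm(x, clip_norm):
    """Rescale the list x to L2 norm clip_norm if its norm exceeds it."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= clip_norm:
        return list(x)  # at or below the threshold: unchanged
    return [clip_norm * v / norm for v in x]


print(clip_by_norm([3.0, 4.0], 1.0))  # [0.6, 0.8]
print(clip_by_norm([0.3, 0.4], 1.0))  # [0.3, 0.4]
```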
Parameters: Returns: ND array.
Return type: Variable

nnabla.functions.clip_grad_by_norm(x, clip_norm=None, axes=None, n_outputs=1, outputs=None)
In the forward pass, the function behaves like the identity.
In the backward pass,
\[g_x = N \times \frac{g_y}{\|g_y\|_2},\]where \(g_x\) is the gradient w.r.t. the input, \(g_y\) is the gradient w.r.t. the output, and \(N\) is clip_norm, the norm to which the gradient is rescaled. This is the case when axes is not set; when axes is set, the norm is computed over axes.
A typical use case is preventing gradient explosion throughout a whole computational graph. For example, to normalize gradient values over the feature axis:
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF

x = nn.Variable([16, 3, 32, 32])
# In the backward pass, rescale the gradient using its norm over axis 1 (channels).
c = F.clip_grad_by_norm(x, axes=(1, ))
h = PF.convolution(c, 64, (3, 3), pad=(1, 1))
Parameters:  x (Variable) – ND array of input.
 clip_norm (float) – Clip the norm of the input's gradient to clip_norm in the backward pass. [default=``1.0``]
 axes (repeated int64) – Axes to be reduced. If an empty list is given, all dimensions are reduced to a scalar. This is used in the backward pass. [default=``range(x.ndim)``]
Returns: ND array.
Return type: Variable
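Taking the backward formula literally, the gradient is always rescaled so that its norm over axes equals N; unlike clip_by_norm, the formula has no threshold condition. The NumPy sketch below mirrors that formula for illustration only. The helper name and the small eps guard against an all-zero gradient are assumptions added here, not part of the documented rule.

```python
import numpy as np


def clip_grad_by_norm_backward_reference(g_y, clip_norm=1.0, axes=None, eps=1e-12):
    """Hypothetical NumPy reference of g_x = N * g_y / ||g_y||_2.

    The forward pass is the identity, so only the gradient is transformed.
    eps is an illustrative guard against division by zero.
    """
    norm = np.sqrt(np.sum(g_y ** 2, axis=axes, keepdims=True))
    return clip_norm * g_y / np.maximum(norm, eps)


g_y = np.array([[3.0, 4.0],
                [0.3, 0.4]])
g_x = clip_grad_by_norm_backward_reference(g_y, clip_norm=1.0, axes=1)
# Every row now has unit norm, including the small one.
print(np.linalg.norm(g_x, axis=1))
```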
Reduction¶

nnabla.functions.
sum
(x, axis=None, keepdims=False)[source]¶ Reduction along axes with sum operation.
Parameters: Returns: ND array.
Return type:

nnabla.functions.
mean
(x, axis=None, keepdims=False)[source]¶ Reduction along axes with mean operation.
Parameters: Returns: ND array.
Return type:

nnabla.functions.
max
(x, axis=None, keepdims=False, with_index=False, only_index=False)[source]¶ Reduce the input ND array x along the given axis using the max operation. The axis argument may be a single integer to reduce over one axis, a tuple of integers to reduce over multiple axes, or None to reduce over all axes. If keepdims is True, the output keeps all reduced dimensions with size 1. If with_index is True, the result is a tuple (max values, indices), or only the indices if only_index is True. Setting only_index to True implies that with_index is also True.
import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array(np.random.rand(2, 3, 4))

maxval = F.max(x, axis=1)
assert np.allclose(maxval.d, np.max(x.d, axis=1))

maxval, indices = F.max(x, axis=1, with_index=True)
assert np.allclose(maxval.d, np.max(x.d, axis=1))
assert np.all(indices.d == np.argmax(x.d, axis=1))

indices = F.max(x, axis=1, only_index=True)
assert np.all(indices.d == np.argmax(x.d, axis=1))
Parameters:  x (Variable) – An input variable.
 axis (None, int or tuple of ints) – Axis or axes along which max is calculated. The default value None will reduce all dimensions.
 keepdims (bool) – Keep reduced axes as dimension with 1 element.
 with_index (bool) – Return tuple of max values and index.
 only_index (bool) – Return only the index of max values.
Returns: ND array.
Return type:

nnabla.functions.
min
(x, axis=None, keepdims=False, with_index=False, only_index=False)[source]¶ Reduce the input ND array x along the given axis using the min operation. The axis argument may be a single integer to reduce over one axis, a tuple of integers to reduce over multiple axes, or None to reduce over all axes. If keepdims is True, the output keeps all reduced dimensions with size 1. If with_index is True, the result is a tuple (min values, indices), or only the indices if only_index is True. Setting only_index to True implies that with_index is also True.
import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array(np.random.rand(2, 3, 4))

minval = F.min(x, axis=1)
assert np.allclose(minval.d, np.min(x.d, axis=1))

minval, indices = F.min(x, axis=1, with_index=True)
assert np.allclose(minval.d, np.min(x.d, axis=1))
assert np.all(indices.d == np.argmin(x.d, axis=1))

indices = F.min(x, axis=1, only_index=True)
assert np.all(indices.d == np.argmin(x.d, axis=1))
Parameters:  x (Variable) – An input variable.
 axis (None, int or tuple of ints) – Axis or axes along which min is calculated. The default value None will reduce all dimensions.
 keepdims (bool) – Keep reduced axes as dimension with 1 element.
 with_index (bool) – Return tuple of min values and index.
 only_index (bool) – Return only the index of min values.
Returns: ND array.
Return type:

nnabla.functions.
prod
(x, axis=None, keepdims=False)[source]¶ Reduction along axes with product operation.
Parameters: Returns: ND array.
Return type: Variable
Note
The backward computation is not accurate when the input contains zeros.
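The note does not say why. One common cause, assumed here rather than confirmed for nnabla, is computing the gradient of a product as prod(x) / x_i, which is undefined when x_i = 0, whereas the exact gradient is the product of all the other elements. The NumPy sketch below contrasts the two for a 1-D input; both helper names are hypothetical.

```python
import numpy as np


def prod_grad_by_division(x):
    """Shortcut gradient d(prod x)/dx_i = prod(x) / x_i; breaks when x_i == 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.prod(x) / x


def prod_grad_exact(x):
    """Exact gradient for a 1-D x: the product of every element except x_i."""
    return np.array([np.prod(np.delete(x, i)) for i in range(x.size)])


x = np.array([2.0, 0.0, 3.0])
print(prod_grad_by_division(x))  # 0/2, 0/0 (nan), 0/3
print(prod_grad_exact(x))        # 0*3, 2*3, 2*0
```

For inputs without zeros the two agree, which is why the shortcut is attractive despite the failure case.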

nnabla.functions.
reduce_sum
(x, n_outputs=1, outputs=None)[source]¶ Reduction along an axis with sum operation.
Note
This is deprecated. Use sum instead.
Parameters: x (Variable) – ND array. Returns: ND array Return type: Variable

nnabla.functions.
reduce_mean
(x, n_outputs=1, outputs=None)[source]¶ Reduction by mean along an axis.
Note
This is deprecated. Use mean instead.
Parameters: x (Variable) – ND array Returns: ND array Return type: Variable
Arithmetic¶

nnabla.functions.
add2
(x0, x1, inplace=False, n_outputs=1, outputs=None)[source]¶ Elementwise addition.
\[y_i = x^{(0)}_i + x^{(1)}_i\]Parameters: Returns: ND array
Return type: Variable

nnabla.functions.
sub2
(x0, x1, n_outputs=1, outputs=None)[source]¶ Elementwise subtraction.
\[y_i = x^{(0)}_i - x^{(1)}_i\]Parameters: Returns: ND array
Return type: Variable

nnabla.functions.
mul2
(x0, x1, n_outputs=1, outputs=None)[source]¶ Elementwise multiplication.
\[y_i = x^{(0)}_i x^{(1)}_i\]Parameters: Returns: ND array
Return type: Variable

nnabla.functions.
div2
(x0, x1, n_outputs=1, outputs=None)[source]¶ Elementwise division.
\[y_i = \frac{x^{(0)}_i} {x^{(1)}_i}\]Parameters: Returns: ND array
Return type: Variable

nnabla.functions.
pow2
(x0, x1, n_outputs=1, outputs=None)[source]¶ Elementwise power function.
\[y_i = {(x^{(0)}_i)} ^ {x^{(1)}_i}\]Parameters: Returns: ND array
Return type: Variable

nnabla.functions.
add_scalar
(x, val=1, n_outputs=1, outputs=None)[source]¶ Elementwise scalar addition.
\[y_i = x_i + v\]Parameters: Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.
mul_scalar
(x, val=1, n_outputs=1, outputs=None)[source]¶ Elementwise scalar multiplication.
\[y_i = v x_i\]Parameters: Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.
pow_scalar
(x, val=1, n_outputs=1, outputs=None)[source]¶ Elementwise scalar power function.
\[y_i = (x_i) ^ v\]Parameters: Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.
r_sub_scalar
(x, val=1, n_outputs=1, outputs=None)[source]¶ Elementwise scalar subtraction.
\[y_i = v - x_i\]Parameters: Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.
r_div_scalar
(x, val=1, n_outputs=1, outputs=None)[source]¶ Elementwise scalar division.
\[y_i = \frac{v}{x_i}\]Parameters: Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.
r_pow_scalar
(x, val=1, n_outputs=1, outputs=None)[source]¶ Elementwise scalar power function.
\[y_i = v ^ {x_i}\]Parameters: Returns: ND array with the same shape as x
Return type: Variable
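The r_-prefixed scalar functions put the scalar v on the left-hand side of the operation, while add_scalar, mul_scalar, and pow_scalar put it on the right. The NumPy sketch below restates each documented formula side by side on the same input; it only mirrors the equations above and does not call nnabla.

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0])
v = 2.0

# Each entry restates one documented formula, y_i = f(x_i, v).
results = {
    "add_scalar":   x + v,    # y_i = x_i + v
    "mul_scalar":   v * x,    # y_i = v x_i
    "pow_scalar":   x ** v,   # y_i = (x_i)^v
    "r_sub_scalar": v - x,    # y_i = v - x_i
    "r_div_scalar": v / x,    # y_i = v / x_i
    "r_pow_scalar": v ** x,   # y_i = v^(x_i)
}
for name, y in results.items():
    print(f"{name:>13}: {y}")
```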
Logical¶

nnabla.functions.
equal
(x0, x1, n_outputs=1, outputs=None)[source]¶ Elementwise ‘equal’.
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 1 & (x^{(0)}_i = x^{(1)}_i) \\ 0 & otherwise \end{cases}.\end{split}\]Parameters: Returns: No Description
Return type: Variable

nnabla.functions.
equal_scalar
(x0, val=1, n_outputs=1, outputs=None)[source]¶ Elementwise ‘equal’ with a scalar.
\[\begin{split}f(x_i,v) = \begin{cases} 1 & (x_i = v) \\ 0 & otherwise \end{cases}.\end{split}\]Parameters: Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.
greater
(x0, x1, n_outputs=1, outputs=None)[source]¶ Elementwise comparison. The \(i^{th}\) element of the output is:
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 1 & (x^{(0)}_i > x^{(1)}_i) \\ 0 & (x^{(0)}_i \leq x^{(1)}_i) \end{cases}.\end{split}\]Parameters: Returns: No Description
Return type: Variable

nnabla.functions.
greater_equal
(x0, x1, n_outputs=1, outputs=None)[source]¶ Elementwise comparison. The \(i^{th}\) element of the output is:
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 1 & (x^{(0)}_i \geq x^{(1)}_i) \\ 0 & (x^{(0)}_i < x^{(1)}_i) \end{cases}.\end{split}\]Parameters: Returns: No Description
Return type: Variable

nnabla.functions.
greater_equal_scalar
(x0, val=1, n_outputs=1, outputs=None)[source]¶ Elementwise comparison with a scalar. The \(i^{th}\) element of the output is:
\[\begin{split}f(x^{(0)}_i,v) = \begin{cases} 1 & (x^{(0)}_i \geq v) \\ 0 & (x^{(0)}_i < v) \end{cases}.\end{split}\]Parameters: Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.
greater_scalar
(x0, val=1, n_outputs=1, outputs=None)[source]¶ Elementwise comparison with a scalar. The \(i^{th}\) element of the output is:
\[\begin{split}f(x^{(0)}_i,v) = \begin{cases} 1 & (x^{(0)}_i > v) \\ 0 & (x^{(0)}_i \leq v) \end{cases}.\end{split}\]Parameters: Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.
less
(x0, x1, n_outputs=1, outputs=None)[source]¶ Elementwise comparison. The \(i^{th}\) element of the output is:
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 1 & (x^{(0)}_i < x^{(1)}_i) \\ 0 & (x^{(0)}_i \geq x^{(1)}_i) \end{cases}.\end{split}\]Parameters: Returns: No Description
Return type: Variable

nnabla.functions.
less_equal
(x0, x1, n_outputs=1, outputs=None)[source]¶ Elementwise comparison. The \(i^{th}\) element of the output is:
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 1 & (x^{(0)}_i \leq x^{(1)}_i) \\ 0 & (x^{(0)}_i > x^{(1)}_i) \end{cases}.\end{split}\]Parameters: Returns: No Description
Return type: Variable

nnabla.functions.
less_equal_scalar
(x0, val=1, n_outputs=1, outputs=None)[source]¶ Elementwise comparison with a scalar. The \(i^{th}\) element of the output is:
\[\begin{split}f(x^{(0)}_i,v) = \begin{cases} 1 & (x^{(0)}_i \leq v) \\ 0 & (x^{(0)}_i > v) \end{cases}.\end{split}\]Parameters: Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.
less_scalar
(x0, val=1, n_outputs=1, outputs=None)[source]¶ Elementwise comparison with a scalar. The \(i^{th}\) element of the output is:
\[\begin{split}f(x^{(0)}_i,v) = \begin{cases} 1 & (x^{(0)}_i < v) \\ 0 & (x^{(0)}_i \geq v) \end{cases}.\end{split}\]Parameters: Returns: ND array with the same shape as x
Return type: Variable
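All of the comparison functions above return 1 where the condition holds and 0 elsewhere, with the output keeping the input's shape. The NumPy sketch below reproduces that convention for a few of them; casting the boolean mask to the input's floating-point dtype is an assumption made here for illustration, and the helper name `as_indicator` is hypothetical.

```python
import numpy as np

x0 = np.array([1.0, 2.0, 3.0])
x1 = np.array([2.0, 2.0, 2.0])
v = 2.0


def as_indicator(mask, like):
    """Turn a boolean mask into the 1/0 values described by the formulas."""
    return mask.astype(like.dtype)


outputs = {
    "equal":                as_indicator(x0 == x1, x0),
    "greater":              as_indicator(x0 > x1, x0),
    "greater_equal":        as_indicator(x0 >= x1, x0),
    "less":                 as_indicator(x0 < x1, x0),
    "less_equal":           as_indicator(x0 <= x1, x0),
    "greater_equal_scalar": as_indicator(x0 >= v, x0),
    "less_scalar":          as_indicator(x0 < v, x0),
}
for name, y in outputs.items():
    print(f"{name:>20}: {y}")
```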

nnabla.functions.
logical_and
(x0, x1, n_outputs=1, outputs=None)[source]¶ Elementwise logical AND.
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 1 & (x^{(0)}_i \neq 0 \;\&\; x^{(1)}_i \neq 0) \\ 0 & otherwise \end{cases}.\end{split}\]Parameters: Returns: No Description
Return type: Variable

nnabla.functions.
logical_and_scalar
(x0, val, n_outputs=1, outputs=None)[source]¶ Elementwise logical AND with scalar.
\[\begin{split}f(x_i,v) = \begin{cases} 1 & (x_i \neq 0 \;\&\; v \neq 0) \\ 0 & otherwise \end{cases}.\end{split}\]Parameters: Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.
logical_not
(x0, n_outputs=1, outputs=None)[source]¶ Elementwise logical NOT operation
\[\begin{split}f(x_i) = \begin{cases} 1 & (x_i = 0) \\ 0 & otherwise \end{cases}.\end{split}\]Parameters: x0 (Variable) – Input variable Returns: ND array with the same shape as x Return type: Variable

nnabla.functions.
logical_or
(x0, x1, n_outputs=1, outputs=None)[source]¶ Elementwise logical OR.
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 0 & (x^{(0)}_i = 0 \;\&\; x^{(1)}_i = 0) \\ 1 & otherwise \end{cases}.\end{split}\]Parameters: Returns: No Description
Return type: Variable

nnabla.functions.
logical_or_scalar
(x0, val, n_outputs=1, outputs=None)[source]¶ Elementwise logical OR with scalar.
\[\begin{split}f(x_i,v) = \begin{cases} 0 & (x_i = 0 \;\&\; v = 0) \\ 1 & otherwise \end{cases}.\end{split}\]Parameters: Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.
logical_xor
(x0, x1, n_outputs=1, outputs=None)[source]¶ Elementwise logical XOR.
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 0 & (x^{(0)}_i = 0 \;\&\; x^{(1)}_i = 0) \\ 0 & (x^{(0)}_i \neq 0 \;\&\; x^{(1)}_i \neq 0) \\ 1 & otherwise \end{cases}.\end{split}\]Parameters: Returns: No Description
Return type: Variable
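The binary logical functions treat any nonzero value as true. The NumPy truth table below shows the standard semantics of AND, OR, XOR, and NOT over all four zero/nonzero combinations, with results cast back to 0/1 floats; XOR is 1 exactly when one operand is nonzero and the other is zero. It is a reference of the standard definitions, not a call into nnabla.

```python
import numpy as np

# All four combinations of "zero" / "nonzero" operands.
a = np.array([0.0, 0.0, 5.0, 5.0])
b = np.array([0.0, 2.0, 0.0, 2.0])

nz_a, nz_b = a != 0, b != 0  # any nonzero value counts as true
table = {
    "logical_and": np.logical_and(nz_a, nz_b).astype(a.dtype),
    "logical_or":  np.logical_or(nz_a, nz_b).astype(a.dtype),
    "logical_xor": np.logical_xor(nz_a, nz_b).astype(a.dtype),
    "logical_not": np.logical_not(nz_a).astype(a.dtype),  # unary, applied to a only
}
for name, y in table.items():
    print(f"{name:>11}: {y}")
```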

nnabla.functions.
logical_xor_scalar
(x0, val, n_outputs=1, outputs=None)[source]¶ Elementwise logical XOR with scalar.
\[\begin{split}f(x_i,v) = \begin{cases} 0 & (x_i = 0 \;\&\; v = 0) \\ 0 & (x_i \neq 0 \;\&\; v \neq 0) \\ 1 & otherwise \end{cases}.\end{split}\]Parameters: Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.
not_equal
(x0, x1, n_outputs=1, outputs=None)[source]¶ Elementwise ‘not equal’.
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 0 & (x^{(0)}_i = x^{(1)}_i) \\ 1 & otherwise \end{cases}.\end{split}\]Parameters: Returns: No Description
Return type: Variable

nnabla.functions.
not_equal_scalar
(x0, val=1, n_outputs=1, outputs=None)[source]¶ Elementwise ‘not equal’ with a scalar.
\[\begin{split}f(x_i,v) = \begin{cases} 0 & (x_i = v) \\ 1 & otherwise \end{cases}.\end{split}\]Parameters: Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.
sign
(x, alpha=1.0, n_outputs=1, outputs=None)[source]¶ Elementwise sign function.
In the forward pass, it is defined as
\[\begin{split}f(x) = \begin{cases} 1 & (x > 0) \\ -1 & (x < 0) \\ \alpha & (x = 0) \end{cases}.\end{split}\]In the backward pass, it is defined as
\[\frac{\partial f(x)}{\partial x} = 1,\]or in other words, it behaves as the identity function for the gradient in the backward pass.
Parameters: Returns: ND array with the same shape as x
Return type: Variable
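The two documented rules for sign can be written as a pair of NumPy functions. This is a reference sketch with hypothetical helper names, not nnabla's kernel: the forward pass maps each element to 1, -1, or alpha, and the backward pass copies the upstream gradient unchanged.

```python
import numpy as np


def sign_forward(x, alpha=1.0):
    """Documented forward rule: 1 for x > 0, -1 for x < 0, alpha for x == 0."""
    return np.where(x > 0, 1.0, np.where(x < 0, -1.0, alpha))


def sign_backward(x, g_y):
    """Documented backward rule: df/dx = 1, so g_y passes straight through."""
    return g_y * np.ones_like(x)


x = np.array([-2.0, 0.0, 3.0])
print(sign_forward(x))             # zero maps to alpha (1.0 by default)
print(sign_forward(x, alpha=0.0))  # zero maps to 0 instead
print(sign_backward(x, np.array([0.1, 0.2, 0.3])))  # gradient unchanged
```

Because the true derivative of sign is zero almost everywhere, the identity gradient is what lets a training signal pass through binarization steps.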

nnabla.functions.
minimum2
(x0, x1, n_outputs=1, outputs=None)[source]¶ Elementwise minimum.
\[y_i = \min(x^{(0)}_i, x^{(1)}_i)\]Parameters: Returns: ND array of min value
Return type: Variable

nnabla.functions.
maximum2
(x0, x1, n_outputs=1, outputs=None)[source]¶ Elementwise maximum.
\[y_i = \max(x^{(0)}_i, x^{(1)}_i)\]Parameters: Returns: ND array of max value
Return type: Variable

nnabla.functions.
minimum_scalar
(x, val=1.0, n_outputs=1, outputs=None)[source]¶ Elementwise scalar minimum.
\[y_i = \min(x_i, v)\]Parameters: Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.
maximum_scalar
(x, val=1.0, n_outputs=1, outputs=None)[source]¶ Elementwise scalar maximum.
\[y_i = \max (x_i, v)\]Parameters: Returns: ND array with the same shape as x
Return type: Variable
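Composing the two scalar variants bounds every element to an interval [lo, hi]: maximum_scalar with val=lo raises values below lo, and minimum_scalar with val=hi on that result lowers values above hi. The NumPy sketch below mirrors that composition with np.maximum and np.minimum and checks it against np.clip; the bounds are arbitrary illustration values.

```python
import numpy as np

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
lo, hi = -1.0, 1.0

# y_i = max(x_i, lo), as in maximum_scalar(x, val=lo)
lower_bounded = np.maximum(x, lo)
# y_i = min(y_i, hi), as in minimum_scalar(..., val=hi)
clamped = np.minimum(lower_bounded, hi)

print(clamped)
```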
Math¶

nnabla.functions.
constant
(val=0, shape=[], n_outputs=1, outputs=None)[source]¶ Generate a constant-valued array.
Parameters: Returns: ND array where all values are the specified constant.
Return type: Variable

nnabla.functions.
arange
(start, stop, step=1, n_outputs=1, outputs=None)[source]¶ Generate a range of values within the half-open interval [start, stop) (the interval including start but excluding stop) with step increments.
Parameters: Returns: 1D array with the generated values.
Return type: Variable
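To make the half-open convention concrete, the NumPy sketch below builds the values start, start + step, ... strictly below stop, using ceil((stop - start) / step) as the element count. It assumes a positive step and returns floats; the helper name `arange_reference` is hypothetical, and np.arange is used only as a cross-check because it follows the same [start, stop) convention.

```python
import math

import numpy as np


def arange_reference(start, stop, step=1):
    """Values start, start + step, ... strictly below stop (positive step assumed)."""
    n = max(0, math.ceil((stop - start) / step))  # number of generated elements
    return np.array([start + i * step for i in range(n)], dtype=float)


print(arange_reference(0, 5))        # stop=5 itself is excluded
print(arange_reference(1, 2, 0.25))  # 1.0, 1.25, 1.5, 1.75
```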

nnabla.functions.
abs
(x, n_outputs=1, outputs=None)[source]¶ Elementwise absolute value function.
\[y_i = |x_i|\]Parameters: x (Variable) – Input variable Returns: Elementwise absolute variable Return type: Variable

nnabla.functions.
exp
(x, n_outputs=1, outputs=None)[source]¶ Elementwise natural exponential function.
\[y_i = \exp(x_i).\]Parameters: x (Variable) – Input variable Returns: Elementwise exp variable Return type: Variable

nnabla.functions.
log
(x, n_outputs=1, outputs=None)[source]¶ Elementwise natural logarithm function.
\[y_i = \ln(x_i).\]Parameters: x (Variable) – Input variable Returns: Elementwise log variable Return type: Variable

nnabla.functions.
round
(x, n_outputs=1, outputs=None)[source]¶ Elementwise round function.
In the forward pass, this function rounds each element to the nearest integer value.
\[y_i = round(x_i).\]In the backward pass, the simple Straight-Through Estimator (STE) is applied,
\[\frac{\partial y_i}{\partial x_i} = 1.\]Parameters: x (Variable) – Input variable Returns: ND array with the same shape as x Return type: Variable

nnabla.functions.
ceil
(x, n_outputs=1, outputs=None)[source]¶ Elementwise ceil function.
In the forward pass, this function simply returns the smallest integer which is not less than the input.
\[y_i = ceil(x_i).\]In the backward pass, the simple Straight-Through Estimator (STE) is applied,
\[\frac{\partial y_i}{\partial x_i} = 1.\]Parameters: x (Variable) – Input variable Returns: ND array with the same shape as x Return type: Variable

nnabla.functions.
floor
(x, n_outputs=1, outputs=None)[source]¶ Elementwise floor function.
In the forward pass, this function simply returns the largest integer which is not greater than the input.
\[y_i = floor(x_i).\]In the backward pass, the simple Straight-Through Estimator (STE) is applied,
\[\frac{\partial y_i}{\partial x_i} = 1.\]Parameters: x (Variable) – Input variable Returns: ND array with the same shape as x Return type: Variable
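The forward/backward split shared by round, ceil, and floor can be sketched in NumPy: the forward pass applies the rounding operation, and the backward pass copies the upstream gradient unchanged, as the rule dy_i/dx_i = 1 states. The helper name `ste_forward_backward` is hypothetical, and np.round's tie-breaking (round half to even) may differ from nnabla's, which is why the sample inputs avoid exact .5 values.

```python
import numpy as np


def ste_forward_backward(op, x, g_y):
    """Hypothetical helper: apply a rounding op forward, pass g_y straight through."""
    y = op(x)
    g_x = g_y.copy()  # straight-through estimator: dy_i/dx_i is treated as 1
    return y, g_x


x = np.array([-1.7, -0.2, 0.3, 1.6])  # no .5 ties, so tie-breaking does not matter
g_y = np.array([0.1, 0.2, 0.3, 0.4])

ops = {"round": np.round, "ceil": np.ceil, "floor": np.floor}
results = {name: ste_forward_backward(op, x, g_y) for name, op in ops.items()}
for name, (y, g_x) in results.items():
    print(name, y, g_x)
```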

nnabla.functions.
identity
(x, n_outputs=1, outputs=None)[source]¶ Identity function.
\[y = x\]Parameters: x (Variable) – ND array. Returns: ND array Return type: Variable

nnabla.functions.
matrix_diag
(x, n_outputs=1, outputs=None)[source]¶ Returns an array whose last two dimensions form a diagonal matrix, with the last dimension of the input placed on the diagonal.
Parameters: x (Variable) – ND array with shape (\(M_0 \times \ldots \times M_N\)). Returns: ND array with shape (\(M_0 \times \ldots \times M_N \times M_N\)). Return type: Variable

nnabla.functions.
matrix_diag_part
(x, n_outputs=1, outputs=None)[source]¶ Returns an array in which the values of the last dimension consist of the diagonal elements of the last two dimensions of an input array.
Parameters: x (Variable) – ND array with shape (\(M_0 \times \ldots \times M_N \times M_N\)). Returns: ND array with shape (\(M_0 \times \ldots \times M_N\)). Return type: Variable
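The shape contracts of matrix_diag and matrix_diag_part are inverses of each other, which a NumPy reference makes easy to check. The helper names below are hypothetical and the code is an illustration of the documented shapes, not nnabla's implementation.

```python
import numpy as np


def matrix_diag_reference(x):
    """(M_0, ..., M_N) -> (M_0, ..., M_N, M_N): last axis placed on a diagonal."""
    n = x.shape[-1]
    # Broadcast (..., n, 1) against the (n, n) identity to zero the off-diagonals.
    return x[..., :, np.newaxis] * np.eye(n, dtype=x.dtype)


def matrix_diag_part_reference(x):
    """(M_0, ..., M_N, M_N) -> (M_0, ..., M_N): diagonal of the last two axes."""
    return np.diagonal(x, axis1=-2, axis2=-1)


x = np.arange(6.0).reshape(2, 3)  # two vectors of length 3
d = matrix_diag_reference(x)      # shape (2, 3, 3)
print(d.shape)
print(d[1])                       # diag(3, 4, 5)
```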

nnabla.functions.
batch_matmul
(a, b, transpose_a=False, transpose_b=False, n_outputs=1, outputs=None)[source]¶ Batch matrix multiplication.
Two batches of matrices are multiplied for each sample in a batch. A batch of matrices is composed as […, P, Q], where the last two dimensions form the matrix dimensions, and the dimensions up to the third-last one are considered batch samples.
Parameters:  a (Variable) – ND array with >= 2 dimensions. The last two dimensions will be treated as a matrix.
 b (Variable) – ND array with >= 2 dimensions. The last two dimensions will be treated as a matrix. The product of the sizes of the 0th through the third-last dimensions must be the same as that of the input a.
 transpose_a (bool) – Transpose the last two axes of a in matrix multiplication. [default=``False``]
 transpose_b (bool) – Transpose the last two axes of b in matrix multiplication. [default=``False``]
Returns: Output of sample-wise matrix multiplication in a batch. When a has shape [N, P, Q], b has shape [N, Q, R], and both transpose options are False, the output has shape [N, P, R].
Return type: Variable
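The shape rule for batch_matmul can be checked with a NumPy reference built on np.matmul, which treats all leading axes as batch dimensions. This sketch assumes a and b share the same leading batch shape; the documented rule is looser (only the product of the batch sizes must match), and that case is not covered here. The helper name `batch_matmul_reference` is hypothetical.

```python
import numpy as np


def batch_matmul_reference(a, b, transpose_a=False, transpose_b=False):
    """Hypothetical NumPy reference: per-sample matmul over the last two axes."""
    if transpose_a:
        a = np.swapaxes(a, -1, -2)
    if transpose_b:
        b = np.swapaxes(b, -1, -2)
    # np.matmul multiplies the trailing matrices and keeps the leading batch axes.
    return np.matmul(a, b)


rng = np.random.default_rng(0)
a = rng.standard_normal((4, 2, 3))  # [N, P, Q]
b = rng.standard_normal((4, 3, 5))  # [N, Q, R]
y = batch_matmul_reference(a, b)
print(y.shape)  # (4, 2, 5), i.e. [N, P, R]
```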

nnabla.functions.sin(x, n_outputs=1, outputs=None)
Elementwise sine (sin) function.
\[y_i = \sin (x_i)\]
Parameters: x (Variable) – ND array
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.cos(x, n_outputs=1, outputs=None)
Elementwise cosine (cos) function.
\[y_i = \cos (x_i)\]
Parameters: x (Variable) – ND array
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.tan(x, n_outputs=1, outputs=None)
Elementwise tangent (tan) function.
\[y_i = \tan (x_i)\]
Parameters: x (Variable) – ND array
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.sinh(x, n_outputs=1, outputs=None)
Elementwise hyperbolic sine (sinh) function.
\[y_i = \sinh (x_i)\]
Parameters: x (Variable) – ND array
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.cosh(x, n_outputs=1, outputs=None)
Elementwise hyperbolic cosine (cosh) function.
\[y_i = \cosh (x_i)\]
Parameters: x (Variable) – ND array
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.tanh(x, n_outputs=1, outputs=None)
Elementwise hyperbolic tangent (tanh) function.
\[y_i = \tanh (x_i)\]
Parameters: x (Variable) – ND array
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.asin(x, n_outputs=1, outputs=None)
Elementwise arcsine (asin) function.
\[y_i = \arcsin (x_i)\]
Parameters: x (Variable) – ND array
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.acos(x, n_outputs=1, outputs=None)
Elementwise arccosine (acos) function.
\[y_i = \arccos (x_i)\]
Parameters: x (Variable) – ND array
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.atan(x, n_outputs=1, outputs=None)
Elementwise arctangent (atan) function.
\[y_i = \arctan (x_i)\]
Parameters: x (Variable) – ND array
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.atan2(x0, x1, n_outputs=1, outputs=None)
Elementwise two-argument arctangent (atan2) function.
\[y_i = \operatorname{atan2} \left(x^{(0)}_i, x^{(1)}_i\right)\]
Returns: ND array with the same shape as the input variables
Return type: Variable
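The practical difference from atan is that atan2 uses the signs of both arguments to pick the quadrant. The sketch below uses Python's math.atan2 and assumes that F.atan2(x0, x1) follows the same atan2(y, x) argument order. That correspondence is an assumption, not something stated above. The helper name is hypothetical.

```python
import math

def atan2_elementwise(x0, x1):
    # Elementwise atan2 over two equal-length lists; x0 plays the "y" role (assumed).
    if len(x0) != len(x1):
        raise ValueError("inputs must have the same shape")
    return [math.atan2(a, b) for a, b in zip(x0, x1)]

y = atan2_elementwise([1.0, 1.0], [1.0, -1.0])
# atan(1 / 1) and atan(1 / -1) would give +pi/4 and -pi/4;
# atan2 keeps the quadrant and gives pi/4 and 3*pi/4.
```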

nnabla.functions.asinh(x, n_outputs=1, outputs=None)
Elementwise hyperbolic arcsine (asinh) function.
\[y_i = \text{arcsinh} (x_i)\]
Parameters: x (Variable) – ND array
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.acosh(x, n_outputs=1, outputs=None)
Elementwise hyperbolic arccosine (acosh) function.
\[y_i = \text{arccosh} (x_i)\]
Parameters: x (Variable) – ND array
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.atanh(x, n_outputs=1, outputs=None)
Elementwise hyperbolic arctangent (atanh) function.
\[y_i = \text{arctanh} (x_i)\]
Parameters: x (Variable) – ND array
Returns: ND array with the same shape as x
Return type: Variable
Array Manipulation

nnabla.functions.concatenate(*x, **kw)
Concatenates a variable number of input arrays along the specified axis.
Returns: Concatenated variable
Return type: Variable

nnabla.functions.split(x, axis=0)
Splits an array along the specified axis.
It returns a tuple of Variables whose length equals the size of the given axis (i.e., x.shape[axis]).
Returns: A tuple of Variables
See also nnabla.function_bases.split().

nnabla.functions.stack(*x, **kw)
Joins two or more arrays on a new axis.
Note
Unlike nnabla.functions.concatenate(), which joins arrays on an existing axis, stack joins arrays on a new axis.
Parameters:
  *x (Variable) – ND arrays. All arrays to be stacked must have the same shape. [variadic]
  axis (int) – The axis on which to stack the arrays. Axis indices take on values 0, 1, 2, and so on from the left. For example, to stack four (3,28,28) inputs on the second axis, specify 1. In this case, the output size will be (3,4,28,28). [default=``0``]
Returns: Output
Return type: Variable
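The difference between stacking and concatenating comes down to shape arithmetic. The following sketch computes only the output shapes, assuming non-negative axis indices, and reproduces the (3,4,28,28) example. Both helper names are hypothetical and are not part of nnabla.

```python
def stack_shape(shapes, axis=0):
    """Output shape when joining equally shaped arrays on a NEW axis."""
    first = tuple(shapes[0])
    if any(tuple(s) != first for s in shapes):
        raise ValueError("all inputs must have the same shape")
    out = list(first)
    out.insert(axis, len(shapes))  # the new axis has one entry per input
    return tuple(out)

def concat_shape(shapes, axis):
    """Output shape when joining arrays on an EXISTING axis (other dims assumed equal)."""
    out = list(shapes[0])
    out[axis] = sum(s[axis] for s in shapes)
    return tuple(out)
```

Stacking four (3,28,28) inputs on axis 1 adds a dimension of size 4. Concatenating the same inputs on axis 1 instead grows that axis to 3 * 4 = 12.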

nnabla.functions.slice(x, start=None, stop=None, step=None, n_outputs=1, outputs=None)
Slices arrays along the specified axes. This function follows Python slice semantics, in which slice(None, None, -1) and slice(-1, None, -1) are special cases: they flip the input array, so the output runs from the end to the beginning of the input array along the corresponding dimension.
Parameters:
  x (Variable) – ND array
  start (repeated int64) – Start indices for each axis [default=``(0,) * len(x.shape)``]
  stop (repeated int64) – Stop indices for each axis [default=``tuple(x.shape)``]
  step (repeated int64) – Step sizes for each axis [default=``(1,) * len(x.shape)``]
Returns: Sliced ND array
Return type: Variable
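Because the function follows Python slice semantics, the special cases can be checked with plain Python lists, one axis at a time. This illustrates the Python rules the text refers to. It does not call nnabla.

```python
data = [10, 20, 30, 40, 50]

full_flip = data[slice(None, None, -1)]     # same as data[::-1]
flip_from_last = data[slice(-1, None, -1)]  # start at the last element, walk backwards
every_other = data[slice(0, 5, 2)]          # start=0, stop=5, step=2

# slice.indices() resolves None and negative values against a concrete length.
resolved = slice(-1, None, -1).indices(len(data))
```

Here resolved is (4, -1, -1): start at index 4 and stop before index -1 in the resolved (non-wrapping) sense, so the walk reaches index 0.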

nnabla.functions.pad(x, pad_width, mode='constant', constant_value=0, n_outputs=1, outputs=None)
Pads the input ND array x over the number of dimensions given by half the length of the pad_width iterable, where every two values in pad_width determine the before and after pad size of an axis. The pad_width iterable must hold an even number of non-negative values, which may cover all or fewer dimensions of the input variable x. If pad_width covers fewer dimensions, it applies to the innermost dimensions of x.
import numpy as np
import nnabla as nn
import nnabla.functions as F
x = nn.Variable.from_numpy_array(np.ones((2, 3, 4)))
assert F.pad(x, (1, 1, 2, 2)).shape == (2, 5, 8)
Padding is performed according to the requested mode:
 constant
Pads with a value given by the keyword argument constant_value.
x = nn.Variable.from_numpy_array(np.array([1, 2, 3, 4], dtype=int))
y = F.pad(x, (3, 3), 'constant', constant_value=1)
y.forward()
assert np.all(y.d == np.array([1, 1, 1, 1, 2, 3, 4, 1, 1, 1]))
 reflect
Pads with the reflection of the vector mirrored on the first and last values of the vector along each axis.
x = nn.Variable.from_numpy_array(np.array([1, 2, 3, 4], dtype=int))
y = F.pad(x, (3, 3), 'reflect')
y.forward()
assert np.all(y.d == np.array([4, 3, 2, 1, 2, 3, 4, 3, 2, 1]))
Returns: Padded ND array with the same number of dimensions as the input.
x = nn.Variable((3, 3, 4, 2))  # a shape like (B, C, H, W)
# 1D padding: last dim by 1 on the left and 2 on the right
assert F.pad(x, (1, 2)).shape == (3, 3, 4, 5)
# 2D padding: last dim by (1, 1) and 2nd to last by (2, 2)
assert F.pad(x, (2, 2, 1, 1)).shape == (3, 3, 8, 4)
# 3D padding: dims C by (0, 1), H by (2, 1), and W by (3, 3)
assert F.pad(x, (0, 1, 2, 1, 3, 3)).shape == (3, 4, 7, 8)
Return type: Variable
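The following plain-Python sketch reproduces the documented 1-D results for both modes, plus the output-shape rule for a pad_width that covers only the innermost dimensions. It is an illustration under simplifying assumptions, not nnabla's implementation. The reflect branch handles only pad sizes smaller than the vector length, and the helper names are hypothetical.

```python
def pad_1d(values, before, after, mode="constant", constant_value=0):
    """Pad a 1-D list in 'constant' or 'reflect' mode."""
    if mode == "constant":
        return [constant_value] * before + list(values) + [constant_value] * after
    if mode == "reflect":
        n = len(values)
        if before >= n or after >= n:
            raise ValueError("this sketch reflects only once")
        # Mirror on the first/last element; that edge element is not repeated.
        left = [values[i] for i in range(before, 0, -1)]
        right = [values[n - 2 - i] for i in range(after)]
        return left + list(values) + right
    raise ValueError("unsupported mode: " + mode)

def padded_shape(shape, pad_width):
    """pad_width covers the innermost len(pad_width) // 2 dimensions."""
    if len(pad_width) % 2:
        raise ValueError("pad_width must have an even length")
    k = len(pad_width) // 2
    out = list(shape)
    for i in range(k):
        axis = len(shape) - k + i
        out[axis] += pad_width[2 * i] + pad_width[2 * i + 1]
    return tuple(out)
```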

nnabla.functions.transpose(x, axes, n_outputs=1, outputs=None)
Transposes tensor dimensions.
Parameters:
  x (Variable) – ND array
  axes (repeated int64) – Source axis indices for each axis.
Returns: Transposed ND array.
Return type: Variable
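"Source axis indices for each axis" means that output axis i takes its size and data from input axis axes[i]. This is the same convention NumPy's transpose uses, which is an assumption based on the wording here. The shape-only sketch below has a hypothetical helper name.

```python
def transposed_shape(shape, axes):
    """Output axis i comes from input axis axes[i]."""
    if sorted(axes) != list(range(len(shape))):
        raise ValueError("axes must be a permutation of the input axes")
    return tuple(shape[a] for a in axes)
```

For example, converting an NCHW batch of shape (8, 3, 32, 32) to NHWC uses axes (0, 2, 3, 1) and gives (8, 32, 32, 3).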

nnabla.functions.broadcast(x, shape, n_outputs=1, outputs=None)
Broadcasts an ND array to the specified shape.
Returns: Broadcast ND array
Return type: Variable

nnabla.functions.broadcast_to(x, y, axis=None, n_outputs=1, outputs=None)
Warning
This function has only experimental support, so avoid relying on it.
Broadcasts an ND array to the specified buffer.
Returns: Broadcast ND array
Return type: Variable

nnabla.functions.flip(x, axes=None, n_outputs=1, outputs=None)
Reverses the order of elements along the specified dimensions of an array.
Parameters:
  x (Variable) – ND array
  axes (repeated int64) – The indices of the dimensions along which to reverse the order of the elements. Axis indices take on values 0, 1, 2, and so on from the left. For example, to flip 100 RGB images of 32 (W) by 24 (H) pixels, with shape (100,3,24,32), vertically and horizontally, specify (2,3). [default=``[len(x.shape) - 1]``]
Returns: ND array
Return type: Variable

nnabla.functions.shift(x, shifts=None, border_mode='nearest', n_outputs=1, outputs=None)
Shifts the array elements by the specified amount.
Parameters:
  x (Variable) – ND array.
  shifts (repeated int64) – The amount to shift elements. For example, to shift image data to the right by 2 pixels and up 3 pixels, specify (-3,2). [default=``(0,) * len(x.shape)``]
  border_mode (string) – Specifies how to fill the ends of the array, whose values are undetermined after shifting. nearest: the data at the ends of the original array is copied and used. reflect: the original data reflected at the ends of the original array is used. [default=``'nearest'``]
Returns: ND array.
Return type: Variable

nnabla.functions.sort(x, axis=-1, reverse=False, with_index=False, only_index=False)
Sorts the elements of x along a given axis in ascending order by value. A negative axis counts from the last dimension of x, so the default of -1 sorts along the last dimension. If reverse is True, the elements are sorted in descending order.
If with_index is True, the result is a tuple (sorted, indices), or only indices if only_index is True. Setting only_index to True implies that with_index is also True.
import numpy as np
import nnabla as nn
import nnabla.functions as F
nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array(np.random.rand(2, 3, 4))
sorted_x = F.sort(x)
assert np.allclose(sorted_x.d, np.sort(x.d))
sorted_x, indices = F.sort(x, with_index=True)
assert np.allclose(sorted_x.d, np.sort(x.d))
assert np.all(indices.d == np.argsort(x.d))
indices = F.sort(x, only_index=True)
assert np.all(indices.d == np.argsort(x.d))
Returns: Variable sorted, Variable indices, or (Variable sorted, Variable indices)
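The relationship between the sorted values and the returned indices can be sketched in plain Python for a 2-D list sorted along its last axis. This shows only the semantics. It does not describe how nnabla breaks ties, and the helper name is hypothetical.

```python
def sort_last_axis(rows, reverse=False):
    """Sort each row of a 2-D list; return (sorted_rows, index_rows)."""
    values, indices = [], []
    for row in rows:
        # Argsort: positions ordered by the value they point to.
        order = sorted(range(len(row)), key=row.__getitem__, reverse=reverse)
        indices.append(order)
        values.append([row[i] for i in order])
    return values, indices
```

Each index row satisfies values[r][j] == rows[r][indices[r][j]], which is the property the numpy.argsort comparison in the example above relies on.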

nnabla.functions.reshape(x, shape, inplace=True, n_outputs=1, outputs=None)
Reshapes the input variable in place. It does not create a copy of the variable. The output variable (y) has a new shape but points to the same data as the input variable (x). This means that if the data in the output variable (y) is modified, the data in the input variable (x) is also modified, since the reshape was done in place.
Note
This function has the same behavior as the nnabla.Variable.reshape() method.
Returns: Reshaped ND array
Return type: Variable

nnabla.functions.one_hot(x, shape, n_outputs=1, outputs=None)
This function creates one-hot vectors based on input indices.
Example:
import nnabla as nn
import nnabla.functions as F
import numpy as np
labels = nn.Variable.from_numpy_array(np.array([[9], [4], [5], [1], [0]]))
print(labels.shape)  # (5, 1)
num_class = 10
y_train = F.one_hot(labels, shape=(num_class, ))
y_train.forward()
print(y_train.shape)  # (5, 10)
print(y_train.d)
# [[0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
#  [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
#  [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
#  [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
# Can also be used for ND arrays.
labels = nn.Variable.from_numpy_array(np.array([[1, 7], [4, 7], [8, 6], [5, 0], [2, 6]]))
print(labels.shape)  # (5, 2)
num_class_1, num_class_2 = 10, 8
y_train = F.one_hot(labels, shape=(num_class_1, num_class_2))
y_train.forward()
print(y_train.shape)  # (5, 10, 8)
print(y_train.d)
# [[[0. 0. 0. 0. 0. 0. 0. 0.]          [[0. 0. 0. 0. 0. 0. 0. 0.]
#   [0. 0. 0. 0. 0. 0. 0. 1.]           [0. 0. 0. 0. 0. 0. 0. 0.]
#   [0. 0. 0. 0. 0. 0. 0. 0.]           [0. 0. 0. 0. 0. 0. 1. 0.]
#   [0. 0. 0. 0. 0. 0. 0. 0.]           [0. 0. 0. 0. 0. 0. 0. 0.]
#   [0. 0. 0. 0. 0. 0. 0. 0.]           [0. 0. 0. 0. 0. 0. 0. 0.]
#   [0. 0. 0. 0. 0. 0. 0. 0.]    ...    [0. 0. 0. 0. 0. 0. 0. 0.]
#   [0. 0. 0. 0. 0. 0. 0. 0.]           [0. 0. 0. 0. 0. 0. 0. 0.]
#   [0. 0. 0. 0. 0. 0. 0. 0.]           [0. 0. 0. 0. 0. 0. 0. 0.]
#   [0. 0. 0. 0. 0. 0. 0. 0.]           [0. 0. 0. 0. 0. 0. 0. 0.]
#   [0. 0. 0. 0. 0. 0. 0. 0.]],         [0. 0. 0. 0. 0. 0. 0. 0.]]]
Parameters:
  x (Variable) – ND array representing label indices.
  shape (tuple of int) – Number of classes. Note that it must be exactly the same as the number of classes included in the label data. Passing incorrect numbers might cause an unexpected error, and this function currently does not check whether the input is valid. Also, when ND labels are given, the dimensions must match. See the example above.
Returns: ND array one-hot vector/tensor.
Return type: Variable
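Because the function itself does not validate labels, a plain-Python equivalent for 1-D labels is a useful sanity check during data preparation. This sketch adds the range check that the text says one_hot lacks. It is not nnabla code, and the helper name is hypothetical.

```python
def one_hot_ref(labels, num_class):
    """One row per label, with a single 1.0 at the label's index."""
    out = []
    for label in labels:
        if not 0 <= label < num_class:
            # Unlike F.one_hot (per the text above), this sketch rejects bad labels.
            raise ValueError("label %d out of range" % label)
        row = [0.0] * num_class
        row[label] = 1.0
        out.append(row)
    return out
```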
Stochasticity

nnabla.functions.rand(low=0, high=1, shape=[], seed=-1, n_outputs=1, outputs=None)
Samples numbers from a uniform distribution \(x \sim U(low, high)\) given the lower bound \(low\), the upper bound \(high\), and the shape of the returned Variable.
Returns: Variable with the shape specified in the argument.
Return type: Variable

nnabla.functions.randint(low=0, high=1, shape=[], seed=-1, n_outputs=1, outputs=None)
Samples integers from a uniform distribution \(x \sim U(low, high)\) given the lower bound \(low\), the upper bound \(high\), and the shape of the returned Variable.
Returns: Variable with the shape specified in the argument. The dtype is int32.
Return type: Variable

nnabla.functions.randn(mu=0, sigma=1, shape=[], seed=-1, n_outputs=1, outputs=None)
Samples numbers from a normal distribution \(x \sim N(\mu, \sigma)\) given the mean \(\mu\), the standard deviation \(\sigma\), and the shape of the returned Variable.
Returns: Variable with the shape specified in the argument.
Return type: Variable

nnabla.functions.dropout(x, p=0.5, seed=-1, n_outputs=1, outputs=None)
Dropout. Samples a number \(u\) from a uniform distribution in \([0, 1]\) and ignores the input if \(u \leq p\).
\[y = \begin{cases} \frac{x}{1 - p} & (u > p) \\ 0 & (\text{otherwise}) \end{cases}\]
Note
Dropout is usually applied only during training, as below (Bayesian dropout is an exception).
import nnabla.functions as F
import nnabla.parametric_functions as PF
h = PF.affine(x, num_hidden)
if train:
    h = F.dropout(h, 0.5)
Returns: ND array with the same shape as x
Return type: Variable
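Scaling the surviving values by \(1/(1-p)\) keeps the expected output equal to the input: \(E[y] = (1-p) \cdot \frac{x}{1-p} = x\). That is why no rescaling is needed when dropout is skipped at inference time. The plain-Python sketch below implements the formula above with Python's random module. Its random stream is unrelated to nnabla's seed handling, and the helper name is hypothetical.

```python
import random

def dropout_ref(x, p=0.5, seed=None):
    """Zero each element where u <= p; scale survivors by 1 / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    rng = random.Random(seed)
    scale = 1.0 / (1.0 - p)
    out = []
    for value in x:
        u = rng.random()  # uniform sample in [0, 1)
        out.append(value * scale if u > p else 0.0)
    return out
```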

nnabla.functions.top_k_data(x, k, abs=False, reduce=True, base_axis=1, n_outputs=1, outputs=None)
Selects the k largest values from each sample in x to propagate unmodified and sets all other values to 0. If abs is True, the k largest values are selected by magnitude. If reduce is True (the default), all feature dimensions are reduced to a single dimension of size k that propagates only the k largest values. Otherwise, if reduce is False, the input and output dimensions are identical. Dimensions before base_axis are treated as sample dimensions, and k values are selected from all elements of a sample (dimensions from base_axis onward) regardless of shape.
>>> import nnabla as nn, nnabla.functions as F
>>> x = nn.Variable((4, 5, 6))
>>> F.top_k_data(x, 3, reduce=False).shape
(4, 5, 6)
>>> F.top_k_data(x, 3, reduce=True).shape
(4, 3)
>>> F.top_k_data(x, 3, reduce=True, base_axis=2).shape
(4, 5, 3)
Parameters:
  x (Variable) – ND array
  k (int) – Number of largest data values to propagate.
  abs (bool) – Determine the largest data values by magnitude. [default=``False``]
  reduce (bool) – Reduce the feature size to one dimension of size k. [default=``True``]
  base_axis (int) – First dimension of the sample shape. [default=``1``]
Returns: ND array.
Return type: Variable
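The reduce=False behaviour on a single flattened sample can be sketched in plain Python. The k largest entries, optionally by magnitude, pass through and the rest become 0. This sketch breaks ties by position, which may differ from nnabla. It does not model the reduce=True output ordering, and the helper name is hypothetical.

```python
def top_k_keep(sample, k, use_abs=False):
    """reduce=False sketch for one flattened sample."""
    key = (lambda i: abs(sample[i])) if use_abs else (lambda i: sample[i])
    keep = set(sorted(range(len(sample)), key=key, reverse=True)[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(sample)]
```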

nnabla.functions.top_k_grad(x, k, abs=False, base_axis=1, n_outputs=1, outputs=None)
Selects the k largest gradients for each sample in x to backpropagate unmodified and sets all other gradients to 0. If abs is True, the k largest gradients are selected by magnitude. Dimensions before base_axis are treated as sample dimensions, and k gradients are selected from all gradients of a sample (dimensions from base_axis onward) regardless of shape.
Returns: ND array with the same shape and data as x.
Return type: Variable

nnabla.functions.random_crop(x, shape=None, base_axis=1, seed=-1, n_outputs=1, outputs=None)
RandomCrop randomly extracts a portion of an array.
Parameters:
  x (Variable) – ND array
  shape (tuple of int) – The data size to extract. For example, to randomly extract a (3,48,48) portion from a (3,64,64) image, specify (3,48,48). [default=``x.shape``]
  base_axis (int) – No Description [default=``1``]
  seed (int) – Random seed. When -1, the seed is sampled from the global random number generator. [default=``-1``]
Returns: ND array
Return type: Variable

nnabla.functions.random_flip(x, axes=None, base_axis=1, seed=-1, n_outputs=1, outputs=None)
Reverses the order of elements along the specified dimensions of an array with 50% probability.
Parameters:
  x (Variable) – ND array
  axes (repeated int64) – The indices of the axes along which to reverse the order of the elements. Axis indices take on values 0, 1, 2, and so on from the left. For example, to randomly flip 100 RGB images of 32 (W) by 24 (H) pixels, with shape (100,3,24,32), vertically and horizontally, specify (2,3). [default=``[len(x.shape) - 1]``]
  base_axis (int) – No Description [default=``1``]
  seed (int) – Random seed. When -1, the seed is sampled from the global random number generator. [default=``-1``]
Returns: ND array
Return type: Variable

nnabla.functions.random_shift(x, shifts=None, border_mode='nearest', base_axis=1, seed=-1, n_outputs=1, outputs=None)
Randomly shifts the array elements within the specified range.
Parameters:
  x (Variable) – ND array.
  shifts (repeated int64) – Maximum absolute amount to shift elements. For example, to shift image data horizontally by \(\pm 2\) pixels and vertically by \(\pm 3\) pixels, specify (3,2). [default=``(0,) * len(x.shape)``]
  border_mode (string) – Specifies how to fill the ends of the array, whose values are undetermined after shifting. nearest: the data at the ends of the original array is copied and used. reflect: the original data reflected at the ends of the original array is used. [default=``'nearest'``]
  base_axis (int) – No Description [default=``1``]
  seed (int) – Random seed. When -1, the seed is sampled from the global random number generator. [default=``-1``]
Returns: ND array.
Return type: Variable

nnabla.functions.image_augmentation(x, shape=None, pad=(0, 0), min_scale=1.0, max_scale=1.0, angle=0.0, aspect_ratio=1.0, distortion=0.0, flip_lr=False, flip_ud=False, brightness=0.0, brightness_each=False, contrast=1.0, contrast_center=0.0, contrast_each=False, noise=0.0, seed=-1, n_outputs=1, outputs=None)
ImageAugmentation randomly alters the input image.
Parameters:
  x (Variable) – ND array.
  shape (tuple of int) – The output image data size. [default=``x.shape``]
  pad (tuple of int) – Border padding values for each spatial axis. Padding is added to both sides of the dimension. [default=``(0, 0)``]
  min_scale (float) – The minimum scale ratio when randomly scaling the image. For example, to scale down to 0.8 times the size of the original image, specify "0.8". To not apply random scaling, set both min_scale and max_scale to "1.0". [default=``1.0``]
  max_scale (float) – The maximum scale ratio when randomly scaling the image. For example, to scale up to 2 times the size of the original image, specify "2.0". [default=``1.0``]
  angle (float) – The rotation angle range in radians when randomly rotating the image. The image is randomly rotated in the -Angle to +Angle range. For example, to rotate in a ±15 degree range, specify "0.26" (15 degrees / 360 degrees * 2π). To not apply random rotation, specify "0.0". [default=``0.0``]
  aspect_ratio (float) – The aspect ratio range when randomly deforming the image. For example, to deform the aspect ratio of the image from 1:1.3 to 1.3:1, specify "1.3". To not apply random deforming, specify "1.0". [default=``1.0``]
  distortion (float) – The distortion range when randomly distorting the image. To not apply distortion, specify "0.0". [default=``0.0``]
  flip_lr (bool) – Whether to randomly flip the image horizontally with 50% probability. [default=``False``]
  flip_ud (bool) – Whether to randomly flip the image vertically with 50% probability. [default=``False``]
  brightness (float) – The absolute range of values to randomly add to the brightness. A random value in the -Brightness to +Brightness range is added to the brightness. For example, to vary the brightness in the -0.05 to +0.05 range, specify "0.05". To not apply random addition to brightness, specify "0.0". [default=``0.0``]
  brightness_each (bool) – Whether to apply the random addition to brightness (as specified by brightness) to each color channel. True: brightness is added based on a different random number for each channel. False: brightness is added based on a random number common to all channels. [default=``False``]
  contrast (float) – The range in which to randomly vary the image contrast. The contrast is varied in the 1/Contrast times to Contrast times range. The output brightness is equal to (input - contrast_center) * contrast + contrast_center. For example, to vary the contrast in the 0.91 times to 1.1 times range, specify "1.1". To not apply random contrast variation, specify "1.0". [default=``1.0``]
  contrast_center (float) – Intensity center used for applying contrast. [default=``0.0``]
  contrast_each (bool) – Whether to apply the random contrast variation (as specified by contrast) to each color channel. True: contrast is varied based on a different random number for each channel. False: contrast is varied based on a random number common to all channels. [default=``False``]
  noise (float) – Sigma of the normal random number to be added. [default=``0.0``]
  seed (int) – Random seed. When -1, the seed is sampled from the global random number generator. [default=``-1``]
Returns: ND array.
Return type: Variable
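The arithmetic behind the angle and contrast examples can be checked directly: 15 degrees is about 0.26 rad, and contrast=1.1 gives factors between 1/1.1 ≈ 0.91 and 1.1. The sketch applies the stated contrast formula to a single pixel value. How nnabla draws the factor within that range is not specified here, so the sketch takes the factor as an argument. The helper name is hypothetical.

```python
import math

# angle: degrees -> radians, as in the "0.26" example (15 / 360 * 2 * pi).
angle = 15 / 360 * 2 * math.pi

def apply_contrast(pixel, c, contrast_center=0.0):
    # (input - contrast_center) * contrast + contrast_center, for one factor c.
    return (pixel - contrast_center) * c + contrast_center

# For contrast=1.1 the factor lies in [1 / 1.1, 1.1].
low, high = 1 / 1.1, 1.1
```

A pixel equal to contrast_center is left unchanged by any factor, so contrast_center acts as the pivot of the stretch.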
Loss Functions

nnabla.functions.sigmoid_cross_entropy(x, target, n_outputs=1, outputs=None)
Elementwise cross entropy between x, passed through a sigmoid function, and the target variables.
\[y_i = - \left(x^{(1)}_i \ln \left(\sigma \left(x^{(0)}_i \right)\right) + \left(1 - x^{(1)}_i\right) \ln \left(1 - \sigma \left(x^{(0)}_i \right)\right)\right)\]
where \(\sigma(s)=\frac{1}{1+\exp(-s)}\), \(x^{(0)}\) is x, and \(x^{(1)}\) is target.
Note
SigmoidCrossEntropy is equivalent to Sigmoid+BinaryCrossEntropy, but computing them in one step reduces numerical error.
Returns: ND array of elementwise losses.
Return type: Variable
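The numerical-error remark can be made concrete. Substituting \(\sigma\) into the formula gives the algebraically identical form \(\max(x, 0) - x t + \ln(1 + e^{-|x|})\), which never evaluates \(\ln 0\). The plain-Python sketch compares it with the naive two-step version. This is a standard rewrite and not necessarily how nnabla implements the function. The helper names are hypothetical.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def sce_naive(x, t):
    # Sigmoid followed by binary cross entropy, exactly as in the formula.
    p = sigmoid(x)
    return -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))

def sce_stable(x, t):
    # Same value, rewritten so that no log(0) can occur.
    return max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x)))
```

For large logits such as x = 50, sigmoid(x) rounds to exactly 1.0 in double precision, so the naive version fails on log(0), while the fused form returns about 50.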

nnabla.functions.binary_cross_entropy(x, target, n_outputs=1, outputs=None)
Elementwise cross entropy between x and the target variables.
\[y_i = - \left(x^{(1)}_i \ln \left(x^{(0)}_i\right) + \left(1 - x^{(1)}_i\right) \ln \left(1 - x^{(0)}_i\right)\right).\]
Returns: ND array of elementwise losses.
Return type: Variable

nnabla.functions.softmax_cross_entropy(x, target, axis=None, n_outputs=1, outputs=None)
Elementwise cross entropy between the variable x and the target labels, which are given as category indices, with Softmax normalization.
\[y_{j} = -\ln \left(\frac{\exp(x_{j,t_j})}{\sum_{i'} \exp(x_{j,i'})}\right)\]
along the dimension specified by axis (\(i\) is the axis along which normalization is performed).
Note
SoftmaxCrossEntropy is equivalent to Softmax followed by CategoricalCrossEntropy, but computing both in a single function reduces numerical error.
Parameters:
Returns: N-D array of elementwise losses. \((D_1 \times ... \times 1 \times ... \times D_N)\)
Return type: Variable
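A minimal pure-Python sketch of the formula above for a single sample (a list of scores along the normalization axis and one integer label). It uses the log-sum-exp trick, a common way to evaluate a fused softmax cross entropy without overflow; it is an illustration, not nnabla's kernel, and the function name is hypothetical.

```python
import math

def softmax_cross_entropy(scores, t):
    # scores: the values x_{j,i} along the normalization axis for one sample j.
    # t: the integer category index t_j.
    m = max(scores)  # shifting by the max keeps every exp() argument <= 0
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in scores))
    return log_sum_exp - scores[t]  # equals -ln(softmax(scores)[t])

print(softmax_cross_entropy([2.0, 1.0, 0.1], 0))
print(softmax_cross_entropy([1000.0, 0.0], 1))  # 1000.0; exp(1000.0) itself would overflow
```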

nnabla.functions.categorical_cross_entropy(x, target, axis=None, n_outputs=1, outputs=None)
Elementwise cross entropy between x and the target t, where the targets are given as category indices.
\[y_{j} = -\ln \left( x_{j, t_j} \right)\]
along the dimension specified by axis (\(i\) is the axis along which normalization is performed).
Parameters:
Returns: N-D array of elementwise losses. \((D_1 \times ... \times 1 \times ... \times D_N)\)
Return type: Variable

nnabla.functions.squared_error(x0, x1, n_outputs=1, outputs=None)
Elementwise squared error
\[y_i = \left(x^{(0)}_i - x^{(1)}_i\right)^2.\]
Parameters:
Returns: N-D array.
Return type: Variable

nnabla.functions.absolute_error(x0, x1, n_outputs=1, outputs=None)
Elementwise absolute error
\[y_i = | x^{(0)}_i - x^{(1)}_i |.\]
Parameters:
Returns: N-D array.
Return type: Variable

nnabla.functions.huber_loss(x0, x1, delta=1.0, n_outputs=1, outputs=None)
Elementwise Huber loss
\[\begin{split}y_i= \left\{ \begin{array}{ll} d^2 & (|d| < \delta)\\ \delta (2 |d| - \delta) & ({\rm otherwise}) \end{array} \right.\end{split}\]
where \(d = x^{(0)}_i - x^{(1)}_i\)
Parameters:
Returns: N-D array of elementwise losses.
Return type: Variable
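A scalar pure-Python transcription of the Huber formula above (a sketch for illustration; the function name is hypothetical and this is not nnabla's code).

```python
def huber_loss(x0, x1, delta=1.0):
    # Quadratic for |d| < delta, linear delta * (2|d| - delta) otherwise, as in the formula above.
    d = x0 - x1
    if abs(d) < delta:
        return d * d
    return delta * (2.0 * abs(d) - delta)

print(huber_loss(0.5, 0.0))   # 0.25 (quadratic region)
print(huber_loss(3.0, 0.0))   # 5.0  (linear region: 1.0 * (2 * 3.0 - 1.0))
print(huber_loss(0.0, -3.0))  # 5.0  (only |d| matters)
```

At \(|d| = \delta\) both branches give \(\delta^2\), so the loss is continuous. This scaling is twice the \(\frac{1}{2} d^2\) convention used in some other references.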

nnabla.functions.epsilon_insensitive_loss(x0, x1, epsilon, n_outputs=1, outputs=None)
Elementwise Epsilon Insensitive Loss
\[\begin{split}y_i= \left\{ \begin{array}{ll} | x^{(0)}_i - x^{(1)}_i | - \epsilon & if \ \ | x^{(0)}_i - x^{(1)}_i | > \epsilon \\ 0 & otherwise \end{array} \right.\end{split}\]
Parameters:
Returns: N-D array of elementwise losses.
Return type: Variable
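A scalar pure-Python sketch of the epsilon-insensitive formula above (hypothetical helper, not nnabla's implementation): errors inside the band of half-width epsilon cost nothing, and larger errors are penalized linearly.

```python
def epsilon_insensitive_loss(x0, x1, epsilon):
    # Absolute error minus epsilon, clipped at zero inside the band |d| <= epsilon.
    d = abs(x0 - x1)
    return d - epsilon if d > epsilon else 0.0

print(epsilon_insensitive_loss(1.0, 0.5, 0.1))   # about 0.4
print(epsilon_insensitive_loss(1.0, 0.95, 0.1))  # 0.0: inside the insensitive band
```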

nnabla.functions.kl_multinomial(p, q, base_axis=1, n_outputs=1, outputs=None)
The Kullback-Leibler divergence for multinomial distributions.
\[D = \sum_i p_i \log \left( \frac{p_i}{q_i} \right)\]
Parameters:
Returns: Kullback-Leibler divergence \(KL(p \parallel q)\).
Return type: Variable
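A pure-Python sketch of the sum above for a single pair of distributions, i.e. ignoring base_axis and batching (the function name is hypothetical). Skipping terms with \(p_i = 0\) follows the usual \(0 \log 0 = 0\) convention, which is assumed here rather than taken from nnabla.

```python
import math

def kl_multinomial(p, q):
    # D = sum_i p_i * log(p_i / q_i) for one pair of distributions.
    # Terms with p_i == 0 are skipped (the usual 0 * log 0 = 0 convention, assumed here).
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

print(kl_multinomial([0.5, 0.5], [0.9, 0.1]))  # about 0.5108
print(kl_multinomial([0.5, 0.5], [0.5, 0.5]))  # 0.0
```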
Signal Processing

nnabla.functions.interpolate(x, scale=None, output_size=None, mode='linear', align_corners=None)
Resize an N-D array with interpolation.
Scaling factors for spatial dimensions are determined by either scale or output_size. nd = len(scale) or nd = len(output_size) determines the number of spatial dimensions, and the last nd dimensions of the input x are considered as the spatial dimensions to be resized.
If scale is given, the output_size is calculated by output_size[i] = floor(scale[i] * x.shape[i - len(scale)]).
Example:
import numpy as np
import nnabla as nn
import nnabla.functions as F
x_data = np.random.rand(64, 3, 224, 224)
x = nn.Variable.from_numpy_array(x_data)
# Resize by scales
y = F.interpolate(x, scale=(2, 2), mode='linear')
print(y.shape)  # (64, 3, 448, 448)
y.forward()
print(y.d)  # Print output
# Resize to a size
y2 = F.interpolate(x, output_size=(320, 257), mode='linear')
print(y2.shape)  # (64, 3, 320, 257)
y2.forward()
print(y2.d)  # Print output
Parameters:  x (Variable) – N-D array with an arbitrary number of dimensions.
 scale (tuple of ints) – Scale factors along axes. The default is None, and if this is omitted, output_size must be specified.
 output_size (tuple of ints) – The output sizes for axes. If this is given, the scale factors are determined by the output sizes and the input sizes. The default is None, and if this is omitted, scale must be specified.
 mode (str) – Interpolation mode, chosen from 'linear' or 'nearest'. The default is 'linear'.
 align_corners (bool) – If true, the corner pixels of the input and output arrays are aligned, such that the output corner pixels have the same values as the input corner pixels. The default is None, and it becomes True if mode is 'linear', otherwise False.
Returns: N-D array.
Return type: Variable
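A hypothetical helper (not part of nnabla) that reproduces the output_size rule stated above; the negative index i - len(scale) is what selects the last nd dimensions of x.

```python
import math

def interpolate_output_size(x_shape, scale):
    # output_size[i] = floor(scale[i] * x.shape[i - len(scale)]);
    # the negative index i - len(scale) walks over the last len(scale) axes.
    nd = len(scale)
    return tuple(math.floor(scale[i] * x_shape[i - nd]) for i in range(nd))

print(interpolate_output_size((64, 3, 224, 224), (2, 2)))  # (448, 448)
print(interpolate_output_size((64, 3, 224, 224), (3, 1)))  # (672, 224)
```

The first call matches the (64, 3, 448, 448) shape printed in the example above; the batch and channel axes are left untouched.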

nnabla.functions.fft(x, signal_ndim, normalized=False, n_outputs=1, outputs=None)
Complex-to-complex Discrete Fourier Transform,
\[X_{k_1, \ldots, k_d} = \sum_{n_1=0}^{N_1-1} \dots \sum_{n_d=0}^{N_d-1} x_{n_1, \ldots, n_d} \exp\left(-2 \pi j \left( \sum_{i=1}^{d} \frac{k_i n_i}{N_i} \right) \right),\]where
\[k_i = 0, \ldots, N_i - 1.\]This function supports 1D, 2D, and 3D DFT with or without the leading batch dimension(s).
The input is expected to be complex-valued with at least signal_ndim + 1 dimensions. The last dimension has a shape of two, where x[…, 0] is the real part and x[…, 1] is the imaginary part.
Example:
import numpy as np
import nnabla as nn
import nnabla.functions as F
from nnabla.ext_utils import get_extension_context
ctx = get_extension_context("cudnn")
nn.set_default_context(ctx)
# Example for a batched 2D-FFT and 2D-IFFT (batch size: 2, data size: 4x3)
x_data = np.random.rand(2, 4, 3) + 1j * np.random.rand(2, 4, 3)
x = nn.Variable.from_numpy_array(np.stack([np.real(x_data), np.imag(x_data)], axis=3))
y = F.fft(x, signal_ndim=2, normalized=True)
z = F.ifft(y, signal_ndim=2, normalized=True)
z.forward()
np.allclose(z.d[..., 0] + 1j*z.d[..., 1], x_data)
Parameters:
Returns: FFT transformed signal.
Return type: Variable
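To make the index and sign conventions concrete, here is a naive \(O(N^2)\) pure-Python sketch of the 1-D case (d = 1): the forward sum with the conventional negative exponent, and the inverse transform documented for ifft below, with a positive exponent and a 1/N factor. It uses Python complex numbers rather than the trailing real/imaginary axis that nnabla expects, ignores the normalized option, and the function names are hypothetical.

```python
import cmath

def dft(x):
    # Forward 1-D DFT: X_k = sum_n x_n * exp(-2*pi*j*k*n/N).
    size = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / size) for n in range(size))
            for k in range(size)]

def idft(x):
    # Inverse 1-D DFT: X_k = (1/N) * sum_n x_n * exp(+2*pi*j*k*n/N).
    size = len(x)
    return [sum(x[n] * cmath.exp(2j * cmath.pi * k * n / size) for n in range(size)) / size
            for k in range(size)]

x = [1 + 0j, 2 - 1j, 0.5j, -1 + 0j]
X = dft(x)
print(X[0])  # the k = 0 coefficient is the plain sum of the samples
print(max(abs(a - b) for a, b in zip(idft(X), x)))  # round-trip error, close to 0
```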

nnabla.functions.ifft(x, signal_ndim, normalized=False, n_outputs=1, outputs=None)
Complex-to-complex inverse Discrete Fourier Transform,
\[X_{k_1, \ldots, k_d} = \frac{1}{\prod_{i=1}^{d} N_i} \sum_{n_1=0}^{N_1-1} \dots \sum_{n_d=0}^{N_d-1} x_{n_1, \ldots, n_d} \exp\left(2 \pi j \left( \sum_{i=1}^{d} \frac{k_i n_i}{N_i} \right) \right),\]where
\[k_i = 0, \ldots, N_i - 1.\]This function supports 1D, 2D, and 3D DFT with or without the leading batch dimension(s).
The input is expected to be complex-valued with at least signal_ndim + 1 dimensions. The last dimension has a shape of two, where x[…, 0] is the real part and x[…, 1] is the imaginary part.
Parameters:
Returns: IFFT transformed signal.
Return type: Variable
Quantized Neural Network Layers

nnabla.functions.binary_sigmoid(x, n_outputs=1, outputs=None)
Elementwise binary sigmoid function. In the forward pass, it computes
\[\begin{split}f(x) = \begin{cases} 1 & (x > 0) \\ 0 & ({\rm otherwise})\end{cases},\end{split}\]
but in the backward pass, a straight-through approximation of the gradient is used, i.e.,
\[\begin{split}\frac{\partial f(x)}{\partial x} = \begin{cases} 0 & (|x| \geq 1) \\ \frac{1}{2} & ({\rm otherwise}) \end{cases}.\end{split}\]
References
Parameters: x (Variable) – Input.
Returns: Output.
Return type: Variable
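A scalar pure-Python sketch of the forward step and the straight-through backward rule above (hypothetical names, not nnabla's code). The backward function applies the chain rule: the incoming gradient is multiplied by the approximate derivative.

```python
def binary_sigmoid_forward(x):
    # Hard step: 1 for x > 0, otherwise 0.
    return 1.0 if x > 0 else 0.0

def binary_sigmoid_backward(x, grad_out):
    # Straight-through approximation: df/dx = 1/2 for |x| < 1, 0 for |x| >= 1,
    # multiplied by the incoming gradient (chain rule).
    return 0.0 if abs(x) >= 1.0 else 0.5 * grad_out

xs = [-2.0, -0.5, 0.0, 0.5, 2.0]
print([binary_sigmoid_forward(v) for v in xs])        # [0.0, 0.0, 0.0, 1.0, 1.0]
print([binary_sigmoid_backward(v, 1.0) for v in xs])  # [0.0, 0.5, 0.5, 0.5, 0.0]
```

The true derivative of the step is zero almost everywhere, so without the straight-through approximation no gradient would reach earlier layers.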

nnabla.functions.binary_tanh(x, n_outputs=1, outputs=None)
Elementwise binary tanh function. In the forward pass, it computes
\[\begin{split}f(x) = \begin{cases} 1 & (x > 0) \\ -1 & ({\rm otherwise}) \end{cases},\end{split}\]
but in the backward pass, a straight-through approximation of the gradient is used, i.e.,
\[\begin{split}\frac{\partial f(x)}{\partial x} = \begin{cases} 0 & (|x| \geq 1) \\ 1 & ({\rm otherwise}) \end{cases}.\end{split}\]
References
Parameters: x (Variable) – Input.
Returns: Output.
Return type: Variable

nnabla.functions.binary_connect_affine(x, weight, binary_weight, bias=None, base_axis=1, quantize_zero_to=1.0, n_outputs=1, outputs=None)
This function provides a BinaryConnect affine layer. It computes in the forward pass
\[y_j = \sum_{i} sign(w_{j,i}) x_i,\]
i.e., the weights \(w_{j,i}\) are binarized to \(sign(w_{j,i})\) and, hence, each weight is in \(\{-1,\,1\}\). Thanks to this weight binarization, the inner product computations no longer require any multiplications, as they turn into additions and subtractions.
This function should be used together with batch_normalization().
Note
1) If you would like to share the binary weights with other layers, please use the standard, floating-point weights (weight) and not the binary weights (binary_weight).
2) The weights and the binary weights become synchronized only after a call to forward(), and not after a call to backward(). If you wish to store the parameters of the network, remember to call forward() once before doing so; otherwise the weights and the binary weights will not be in sync.
3) CPU and GPU implementations now use floating-point values for binary_weight, since this function is for simulation purposes.
References
Parameters:  x (Variable) – Input.
 weight (Variable) – Weight. [parameter]
 binary_weight (Variable) – Binarized weight. [parameter]
 bias (Variable) – Bias. [optional][parameter]
 base_axis (int) – Dimensions up to base_axis are treated as sample dimensions. [default=``1``]
 quantize_zero_to (float) – Input value at zero is quantized to this value. [default=``1.0``]
Returns: Output.
Return type: Variable
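A pure-Python sketch of the forward formula above for a single sample (a flat input list and a weight matrix given as a list of rows), ignoring base_axis and the separate binary_weight buffer. Treating quantize_zero_to as the value assigned to sign(0) is our reading of that parameter's description; the helper names are hypothetical.

```python
def binarize(w, quantize_zero_to=1.0):
    # sign(w), with w == 0 mapped to quantize_zero_to (our reading of that parameter).
    if w > 0:
        return 1.0
    if w < 0:
        return -1.0
    return quantize_zero_to

def binary_connect_affine(x, weight, bias=None, quantize_zero_to=1.0):
    # y_j = sum_i sign(w_{j,i}) * x_i (+ bias_j); weight is a list of rows w_j.
    y = []
    for j, row in enumerate(weight):
        acc = sum(binarize(w, quantize_zero_to) * xi for w, xi in zip(row, x))
        y.append(acc + (bias[j] if bias is not None else 0.0))
    return y

x = [0.5, -1.0, 2.0]
weight = [[0.3, -0.7, 0.0],
          [-0.1, -0.2, 0.9]]
print(binary_connect_affine(x, weight))  # [3.5, 2.5]
```

Because every binarized weight is plus or minus one, each product only flips the sign of x_i, which is why a dedicated implementation can replace the multiplications with additions and subtractions.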

nnabla.functions.binary_connect_convolution(x, weight, binary_weight, bias=None, base_axis=1, pad=None, stride=None, dilation=None, group=1, quantize_zero_to=1.0, n_outputs=1, outputs=None)
This function provides a BinaryConnect convolution layer. It computes in the forward pass
\[y_{n, a, b} = \sum_{m} \sum_{i} \sum_{j} sign(w_{n, m, i, j}) x_{m, a + i, b + j},\]
i.e., the weights \(w_{n, m, i, j}\) are binarized to \(sign(w_{n, m, i, j})\) and, hence, each weight is in \(\{-1,\,1\}\). Thanks to this weight binarization, the inner product computations no longer require any multiplications, as they turn into additions and subtractions.
This function should be used together with batch_normalization().
Reference
Note
1) If you would like to share the binary weights with other layers, please use the standard, floating-point weights (weight) and not the binary weights (binary_weight).
2) The weights and the binary weights become synchronized only after a call to forward(), and not after a call to backward(). If you wish to store the parameters of the network, remember to call forward() once before doing so; otherwise the weights and the binary weights will not be in sync.
3) CPU and GPU implementations now use floating-point values for binary_weight, since this function is for simulation purposes.
Parameters:  x (Variable) – Input.
 weight (Variable) – Weight. [parameter]
 binary_weight (Variable) – Binarized weight. [parameter]
 bias (Variable) – Bias. [optional][parameter]
 base_axis (int) – Dimensions up to base_axis are treated as sample dimensions. [default=``1``]
 pad (tuple of int) – Padding sizes for dimensions. [default=``(0,) * (len(x.shape) - (base_axis+1))``]
 stride (tuple of int) – Stride sizes for dimensions. [default=``(1,) * (len(x.shape) - (base_axis+1))``]
 dilation (tuple of int) – Dilation sizes for dimensions. [default=``(1,) * (len(x.shape) - (base_axis+1))``]
 group (int) – Number of groups of channels. This makes the connection across channels sparser, by grouping connections along the mapping direction. [default=``1``]
 quantize_zero_to (float) – Input value at zero is quantized to this value. [default=``1.0``]
Returns: Output.
Return type: Variable

nnabla.functions.binary_weight_affine(x, weight, binary_weight, alpha, bias=None, base_axis=1, quantize_zero_to=1.0, n_outputs=1, outputs=None)
This function provides a Binary Weight Network affine layer. It computes in the forward pass
\[y_j = \frac{1}{\|\mathbf{w}_j\|_{\ell_1}} \sum_{i} sign(w_{j,i}) x_i\]
i.e., the weights \(w_{j,i}\) are binarized to \(sign(w_{j,i})\) and, hence, each weight is in \(\{-1,\,1\}\). Thanks to this weight binarization, the inner product computations turn into additions and subtractions, which are followed by multiplication with the scaling factor \(\alpha_j = \frac{1}{\|\mathbf{w}_j\|_{\ell_1}}\).
Reference
Note
1) If you would like to share the binary weights with other layers, please use the standard, floating-point weights (weight) and not the binary weights (binary_weight).
2) The weights and the binary weights become synchronized only after a call to forward(), and not after a call to backward(). If you wish to store the parameters of the network, remember to call forward() once before doing so; otherwise the weights and the binary weights will not be in sync.
3) CPU and GPU implementations now use floating-point values for binary_weight, since this function is for simulation purposes.
Parameters:  x (Variable) – Input.
 weight (Variable) – Weight. [parameter]
 binary_weight (Variable) – Binarized weight. [parameter]
 alpha (Variable) – Alpha. [parameter]
 bias (Variable) – Bias. [optional][parameter]
 base_axis (int) – Dimensions up to base_axis are treated as sample dimensions. [default=``1``]
 quantize_zero_to (float) – Input value at zero is quantized to this value. [default=``1.0``]
Returns: Output.
Return type: Variable

nnabla.functions.binary_weight_convolution(x, weight, binary_weight, alpha, bias=None, base_axis=1, pad=None, stride=None, dilation=None, group=1, quantize_zero_to=1.0, n_outputs=1, outputs=None)
This function provides a Binary Weight Network convolution layer. It computes in the forward pass
\[y_{n, a, b} = \frac{1}{\|\mathbf{w}_n\|_{\ell_1}} \sum_{m} \sum_{i} \sum_{j} sign(w_{n, m, i, j}) x_{m, a + i, b + j}.\]
i.e., the weights \(w_{n, m, i, j}\) are binarized to \(sign(w_{n, m, i, j})\) and, hence, each weight is in \(\{-1,\,1\}\). Thanks to this weight binarization, the inner product computations turn into additions and subtractions, which are followed by multiplication with the scaling factor \(\alpha_n = \frac{1}{\|\mathbf{w}_n\|_{\ell_1}}\).
Reference
Note
1) If you would like to share the binary weights with other standard layers, please use the standard, floating-point weights (weight) and not the binary weights (binary_weight).
2) The weights and the binary weights become synchronized only after a call to forward(), and not after a call to backward(). If you wish to store the parameters of the network, remember to call forward() once before doing so; otherwise the weights and the binary weights will not be in sync.
3) CPU and GPU implementations now use floating-point values for binary_weight, since this function is for simulation purposes.
Parameters:  x (Variable) – Input.
 weight (Variable) – Weight. [parameter]
 binary_weight (Variable) – Binarized weight. [parameter]
 alpha (Variable) – Alpha. [parameter]
 bias (Variable) – Bias. [optional][parameter]
 base_axis (int) – Dimensions up to base_axis are treated as sample dimensions. [default=``1``]
 pad (tuple of int) – Padding sizes for dimensions. [default=``(0,) * (len(x.shape) - (base_axis+1))``]
 stride (tuple of int) – Stride sizes for dimensions. [default=``(1,) * (len(x.shape) - (base_axis+1))``]
 dilation (tuple of int) – Dilation sizes for dimensions. [default=``(1,) * (len(x.shape) - (base_axis+1))``]
 group (int) – Number of groups of channels. This makes the connection across channels sparser, by grouping connections along the mapping direction. [default=``1``]
 quantize_zero_to (float) – Input value at zero is quantized to this value. [default=``1.0``]
Returns: Output.
Return type: Variable

nnabla.functions.fixed_point_quantize(x, sign=True, n=8, delta=0.0625, quantize=True, ste_fine_grained=True, outputs=None)
Fixed Point Quantize
Parameters:  x (Variable) – An input variable.
 sign (bool) – Indicates whether signed or unsigned numbers are used. Default is true.
 n (int) – Bit width used. Note that the sign consumes one bit; \(n-1\) bits are used for the number representation in the signed case.
 delta (float) – Step size.
 quantize (bool) – If true, quantize the input; otherwise, do not.
 ste_fine_grained (bool) – If true, the STE (straight-through estimator) gradient is not always 1.
Returns: N-D array.
Return type: Variable
See also
nnabla.function_bases.fixed_point_quantize.
In the forward pass,
\[\begin{split}\begin{equation} q_i= \left\{ \begin{array}{ll} max & if \ \ \ x_i > max \\ sign(x_i) \times floor(|x_i| \delta^{-1} + 2^{-1}) \times \delta & if \ \ min \le x_i \le max \\ min & if \ \ x_i < min \\ \end{array} \right., \end{equation}\end{split}\]
where \(\delta\) is the step size, \((min, max) :=(- (2^{n-1} - 1)\delta, (2^{n-1} - 1)\delta)\) if \(sign\) is true, \((min, max) := (0, (2^n - 1) \delta)\) otherwise, and \(n\) is the total bit-width used.
In the backward pass, when ste_fine_grained is false,
\[\begin{equation} \frac{\partial q_i}{\partial x_i} = 1. \end{equation}\]
In the backward pass, when ste_fine_grained is true,
\[\begin{split}\begin{equation} \frac{\partial q_i}{\partial x_i}= \left\{ \begin{array}{ll} 0 & if \ \ \ x_i > max \\ 1 & if \ \ min \le x_i \le max \\ 0 & if \ \ x_i < min \\ \end{array} \right.. \end{equation}\end{split}\]
Note
Quantized values are stored as floating-point numbers, since this function is for simulation purposes.
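A scalar pure-Python transcription of the forward and backward formulas above (the helper names are hypothetical; it mirrors the equations rather than nnabla's source). Rounding is done as floor(|x| / delta + 1/2), i.e. round-half-up on the magnitude, before restoring the sign.

```python
import math

def _fixed_point_range(sign, n, delta):
    # (min, max) as defined above.
    if sign:
        max_v = (2 ** (n - 1) - 1) * delta
        return -max_v, max_v
    return 0.0, (2 ** n - 1) * delta

def fixed_point_quantize(x, sign=True, n=8, delta=0.0625):
    # Forward pass for one scalar: clip outside [min, max], else round |x| to a multiple of delta.
    min_v, max_v = _fixed_point_range(sign, n, delta)
    if x > max_v:
        return max_v
    if x < min_v:
        return min_v
    s = (x > 0) - (x < 0)  # sign(x) as -1, 0 or 1
    return s * math.floor(abs(x) / delta + 0.5) * delta

def fixed_point_quantize_grad(x, sign=True, n=8, delta=0.0625, ste_fine_grained=True):
    # dq/dx: always 1 for the plain STE; 0 outside [min, max] for the fine-grained STE.
    if not ste_fine_grained:
        return 1.0
    min_v, max_v = _fixed_point_range(sign, n, delta)
    return 1.0 if min_v <= x <= max_v else 0.0

xs = [0.1, -0.1, 0.2, 10.0, -10.0]
print([fixed_point_quantize(v) for v in xs])       # [0.125, -0.125, 0.1875, 7.9375, -7.9375]
print([fixed_point_quantize_grad(v) for v in xs])  # [1.0, 1.0, 1.0, 0.0, 0.0]
```

With the defaults (n=8, delta=0.0625, signed), max is \((2^7 - 1) \times 0.0625 = 7.9375\), which is why 10.0 and -10.0 saturate.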

nnabla.functions.pow2_quantize(x, sign=True, with_zero=True, n=8, m=1, quantize=True, ste_fine_grained=True, outputs=None)
Pow2 Quantize
Parameters:  x (Variable) – An input variable.
 sign (bool) – Indicates whether signed or unsigned numbers are used. Default is true.
 with_zero (bool) – Indicates whether zero is used as a quantized value. Default is true. Note that zero consumes one bit.
 n (int) – Bit width used. Note that the sign consumes one bit; \(n-1\) bits are used for the number representation in the signed case. Default is 8.
 m (int) – \(2^m\) is the upper bound of the dynamic range and \(-2^m\) is the lower bound, \(m \in \mathcal{Z}\). Default is 1.
 quantize (bool) – If true, quantize the input; otherwise, do not.
 ste_fine_grained (bool) – If true, the STE (straight-through estimator) gradient is not always 1.
Returns: N-D array.
Return type: Variable
See also
nnabla.function_bases.pow2_quantize.
In the forward pass of the signed case,
\[\begin{split}q_i= \left\{ \begin{array}{ll} max_{+} & if \ \ \overline{q_i} > max_{+} \\ \overline{q_i} & if \ \ min_{+} \le \overline{q_i} \le max_{+} \\ min_{+} & if \ \ 0 \le \overline{q_i} < min_{+} \\ min_{-} & if \ \ min_{-} < \overline{q_i} < 0 \\ \overline{q_i} & if \ \ max_{-} \le \overline{q_i} \le min_{-}\\ max_{-} & if \ \ \overline{q_i} < max_{-} \\ \end{array} \right.,\end{split}\]
where
\[\begin{split}&& max_{+} = 2^{m}, min_{+} = 2^{m - (2^{n-1} - 1)},\\ && max_{-} = -2^{m}, min_{-} = -2^{m - (2^{n-1} - 1)},\\ && \overline{q_i} = sign(x_i) \times 2^{round(\log_2 |x_i|)}.\end{split}\]
This quantization uses the geometric mean between two power-of-two numbers as the quantization threshold.
In the forward pass of the unsigned case,
\[\begin{split}q_i= \left\{ \begin{array}{ll} max & if \ \ \overline{q_i} > max \\ \overline{q_i} & if \ \ min \le \overline{q_i} \le max \\ min & if \ \ 0 < \overline{q_i} < min \\ \end{array} \right.,\end{split}\]
where
\[\begin{split}&& max = 2^{m}, min = 2^{m - (2^{n} - 1)},\\ && \overline{q_i} = 2^{int(\log_2 |x_i|)}.\end{split}\]
When with_zero is true, a pruning threshold is used to round an input to 0 or \(min\). The pruning threshold is defined in this function as follows,
\[pruning\ threshold = min \times 2^{-\frac{1}{2}}.\]
If the absolute value of the input is less than this value, the input is rounded to 0; otherwise, it is rounded to \(min\).
In the backward pass, when ste_fine_grained is false,
\[\frac{\partial q_i}{\partial x_i} = 1.\]
In the backward pass, when ste_fine_grained is true,
\[\begin{split}\frac{\partial q_i}{\partial x_i}= \left\{ \begin{array}{ll} 0 & if \ \ \overline{q_i} > max_{+} \\ 1 & if \ \ otherwise \\ 0 & if \ \ \overline{q_i} < max_{-} \\ \end{array} \right..\end{split}\]

nnabla.functions.prune(x, rate=0.9, n_outputs=1, outputs=None)
Prunes the input according to the following equation,
\[\begin{split}q_i = \left \{ \begin{array}{ll} 0 & abs(x_i) < threshold \\ x_i & otherwise \end{array} \right.\end{split}\]
where \(threshold\) is determined by threshold = np.sort(np.abs(x))[int((x.size - 1) * rate)].
Parameters:
Returns: N-D array with the same shape as x.
Return type: Variable
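A pure-Python restatement of the threshold rule above for a flat list (hypothetical helper, not nnabla's code). Note that an entry exactly equal to the threshold survives, because only values strictly below it are zeroed.

```python
def prune(x, rate=0.9):
    # Same threshold rule as the NumPy expression above, for a flat Python list.
    magnitudes = sorted(abs(v) for v in x)
    threshold = magnitudes[int((len(x) - 1) * rate)]
    # Entries strictly below the threshold become 0; the rest pass through unchanged.
    return [0.0 if abs(v) < threshold else v for v in x]

x = [0.1, -0.5, 0.3, -0.05, 0.8, 0.2]
print(prune(x, rate=0.5))  # [0.0, -0.5, 0.3, 0.0, 0.8, 0.2]
print(prune(x))            # [0.0, -0.5, 0.0, 0.0, 0.8, 0.0]
```

With rate=0.5 the threshold is the sorted magnitude at index int(5 * 0.5) = 2, i.e. 0.2; with the default rate=0.9 it is the magnitude at index 4, i.e. 0.5.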
Unsupported, Special Use

nnabla.functions.vat_noise(x, w, base_axis=1, eps=1.0, n_outputs=1, outputs=None)
Noise for virtual adversarial training.
This is a special layer for GUI network design, used to obtain the noise for virtual adversarial training.
In the backward pass, the weight parameter is replaced with the gradient.
Forward
\[y_i = \frac{\epsilon x_i}{\sqrt{\sum_k x_k^2 + c}}\]
Backward
\[\delta x_i = 0\]\[w_i = \epsilon \delta y_i\]
Note
This layer is a special layer for GUI network design.
References
Parameters:  x (Variable) – N-D array of noise input. The noise is standard Gaussian noise initially; in subsequent steps, the fed-back gradient variable is used.
 w (Variable) – N-D array for keeping gradient values.
 base_axis (int) – Dimensions up to base_axis are treated as sample dimensions. [default=``1``]
 eps (float) – Noise norm (l2) factor. [default=``1.0``]
Returns: N-D array.
Return type: Variable

nnabla.functions.unlink(x, n_outputs=1, outputs=None)
This function behaves as an identity function in the forward pass, and deletes the gradient in the backward pass.
This is a special layer for GUI network design; adding it to a graph makes the backward operation produce a zero gradient for its input.
Forward
\[y_i = x_i\]
Backward
\[\delta x_i = 0\]
Note
This layer is a special layer for GUI network design.
Parameters: x (Variable) – N-D array.
Returns: N-D array.
Return type: Variable

nnabla.functions.sink(*x, **kw)
Creates a dummy variable used to call the forward or backward function of multiple variables at one place.
This takes any number of input variables with any shape and creates a single 0-shape output. The forward pass does nothing. The backward pass sets the input grads to ones if one_input_grad is set to true.
Note
sink can only be called at the very end of the graph, and the grad of the input variables is cleared when y.backward(clear_buffer=True) is called.
Parameters:
Returns: Dummy variable.
Return type: Variable
Image Object Detection

nnabla.functions.nms_detection2d(x, thresh=None, nms=None, nms_per_class=None, n_outputs=1, outputs=None)
Non-Maximum Suppression (NMS) for 2D object detector output. The input is a 3-dimensional tensor with shape (B, N, 5 + C), where B denotes the batch size, N denotes the number of detection box candidates, and C denotes the number of object detection classes. The 5 + C values consist of the box coordinates x, y, w, h in normalized coordinates (the size of each of x and y is 1.0), the objectness (learned to predict the IoU value to the ground truth box), and the class probabilities of the C classes.
It outputs a tensor with the same dimensions as the input, where all values are copied from the input to the output, except that the class probabilities are multiplied by the objectness and possibly suppressed to 0 by NMS. During NMS, all pairs of bounding boxes are compared. For each pair, the bounding box with the lower detection score (described below) is suppressed if the overlap ratio (the IoU) is greater than the value of nms.
There are two suppression modes for NMS.
1. Suppress by class probability (nms_per_class is True): For each bounding box, the detection score is calculated as objectness * probability[class_id] for each class. The suppression is done for each class independently.
2. Suppress by objectness (nms_per_class is False): The suppression is done for each bounding box using objectness as the detection score. All class probabilities become 0 for every suppressed box.
References
Parameters:
Returns: A 3-dim array with the same dimensions as the input.
Return type: Variable
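The pairwise rule of mode 2 (suppress by objectness) can be sketched in pure Python as follows. This is a minimal, hypothetical sketch, not nnabla's implementation: it assumes (x, y) is the box centre (the text above only says the coordinates are normalized), takes the description literally (every pair is compared, including pairs whose higher-scoring box is itself suppressed), and uses 0.45 as an arbitrary illustrative nms threshold.

```python
def iou(a, b):
    # Boxes are (x, y, w, h); (x, y) is assumed to be the box centre.
    ax0, ay0 = a[0] - a[2] / 2, a[1] - a[3] / 2
    ax1, ay1 = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx0, by0 = b[0] - b[2] / 2, b[1] - b[3] / 2
    bx1, by1 = b[0] + b[2] / 2, b[1] + b[3] / 2
    inter = max(0.0, min(ax1, bx1) - max(ax0, bx0)) * max(0.0, min(ay1, by1) - max(ay0, by0))
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def suppress_by_objectness(boxes, objectness, nms):
    # Literal reading of the pairwise rule: for every pair whose IoU exceeds nms,
    # mark the box with the lower objectness as suppressed.
    suppressed = [False] * len(boxes)
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if iou(boxes[i], boxes[j]) > nms:
                suppressed[i if objectness[i] < objectness[j] else j] = True
    return suppressed

boxes = [(0.5, 0.5, 0.2, 0.2), (0.52, 0.5, 0.2, 0.2), (0.1, 0.1, 0.1, 0.1)]
objectness = [0.9, 0.8, 0.7]
print(suppress_by_objectness(boxes, objectness, nms=0.45))  # [False, True, False]
```

The first two boxes overlap with an IoU of about 0.82, so the one with lower objectness is suppressed; the third box does not overlap either of them and survives.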
Validation

nnabla.functions.top_n_error(x, target, axis=None, n=1, n_outputs=1, outputs=None)
Top N error along the dimension specified by the axis; each element of the output is
\[\begin{split}y_i = \left \{ \begin{array}{l} 1 \ (x_i \ is \ not \ within \ Nth \ place) \\ 0 \ (x_i \ is \ within \ Nth \ place) \end{array} \right.\end{split}\]
Parameters:  x (Variable) – Probabilities N-D array. \(D_1 \times ... \times D_i \times ... \times D_N\)
 target (Variable) – N-D array of labels. \(D_1 \times ... \times 1 \times ... \times D_N\)
 axis (int) – Axis on which the top N error is calculated. [default=``len(x.shape) - 1``]
 n (int) – top N [default=``1``]
Returns: Elementwise error N-D array. (\(D_1 \times ... \times 1 \times ... \times D_N\))
Return type: Variable
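A pure-Python sketch of the top-N rule above for one sample (a list of scores along the axis and one integer label); the function name is hypothetical. How ties are ranked is not specified above, so counting only strictly larger scores against the label is an assumption of this sketch.

```python
def top_n_error(scores, label, n=1):
    # Returns 1 if the score of the true label is not within the top n, else 0.
    # Ties: only strictly larger scores count against the label (an assumption).
    rank = sum(1 for s in scores if s > scores[label])
    return 0 if rank < n else 1

scores = [0.1, 0.6, 0.3]
print(top_n_error(scores, label=2, n=1))  # 1: the label's score ranks 2nd
print(top_n_error(scores, label=2, n=2))  # 0: within the top 2
```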
Parametric Functions
In NNabla, trainable models are created by composing functions that have optimizable parameters.
These functions are called parametric functions.
Parametric functions are provided by nnabla.parametric_functions.
 See also:
 Python API Tutorial.
Parameter Management API
The parameters registered by the functions in List of Parametric Functions can be managed using the APIs listed in this section.

nnabla.parameter.