Yet Another Neural Network Toolbox¶
Welcome to the Yann Toolbox. It is a toolbox for building and learning convolutional neural networks, built on top of theano. This toolbox is an homage to Prof. Yann LeCun, one of the earliest pioneers of CNNs. To set up the toolbox, refer to the Installation Guide. Once set up, you may start with the Quick Start guide or try your hand at the Tutorials and the guide to Getting Started. User discussion groups are set up on gitter and on google groups.
If you are here for the theano-tensorflow migration tool, click [here](http://www.tf-lenet.readthedocs.io).
Warning
Yann is currently in its early phases and is presently undergoing massive development. Expect a lot of changes. Unit tests are only starting to be written, so the coverage and travis build passes are not to be completely trusted. The toolbox will be formalized in the future, but at this moment the authorship, coverage and maintenance of the toolbox is under extremely limited manpower.
Note
While there are more formal and wholesome toolboxes that are similar and have a much larger userbase, such as Lasagne, Keras, Blocks and Caffe, this toolbox is designed differently: it is much simpler and more versatile. Yann is designed as a supplement to an upcoming beginner's book on Convolutional Neural Networks and as the toolbox of choice for an introductory course on deep learning for computer vision.
For this reason, Yann is specifically designed to be intuitive and easy to use for beginners. That does not compromise any of its core purpose - to be able to build CNNs in a plug-and-play fashion. It is still a good choice of toolbox for running pre-trained models and for building complicated, non-vanilla CNN architectures that are not easy to build with the other toolboxes. It is also a good choice for researchers and industrial scientists who want to quickly prototype networks and test them before developing production-scale models.
Getting Started¶
The following will help you get quickly acquainted with Yann.
Installation Guide¶
Yann is built on top of Theano. Theano and all its pre-requisites are mandatory. Once theano and its pre-requisites are set up, you may set up and run this toolbox. Theano setup is documented in the theano documentation. Yann is built with theano 0.8 but should be forward compatible unless theano makes a drastic release.
Quick fire Installation¶
Before going through the full-fledged installation procedure, you can run through the entire installation in one command that installs the basics required to run the toolbox. To install the toolbox quickly, do the following:
pip install git+git://github.com/ragavvenkatesan/yann.git
If this shows errors, install numpy first; skdata has an issue that requires numpy to be installed first. If you use anaconda, install numpy and scipy using conda install instead of pip install. This will set up the toolbox for all intents and purposes.
Verify that the installed theano is indeed version 0.9 or greater by doing the following in a python shell:
import theano
theano.__version__
If the version was not 0.9, you can install 0.9 by doing the following:
pip install --upgrade --no-deps git+git://github.com/Theano/Theano.git
For a full-fledged installation procedure, don’t do the above but run through the following set of instructions. If you want to install all other supporting features like datasets, visualizers and others, do the following:
pip install -r requirements_full.txt
pip install git+git://github.com/ragavvenkatesan/yann.git
Full installation¶
Dependencies¶
Python + pip / conda¶
Yann needs Python 2.7. Please install it for your OS. Some modules that are required don't come with default python, but don't worry: python comes with a package installer called pip, which you can use to install additional packages.
For a headache free installation, the anaconda distribution of python is very strongly recommended because it comes with a lot of goodies pre-packaged.
C compiler¶
You need a C compiler - not because yann needs C, but because theano and probably numpy require one. Make sure that your OS has one. Apple OS X / macOS users: if you are using CUDA and cuDNN, prefer command line tools 7.x+; 8 doesn't work with cuDNN at the time of writing this documentation. You can download older versions of xcode and command line tools here.
numpy/scipy¶
Numpy 1.6 and Scipy 0.11 are needed for yann. Make sure these work well with a BLAS system. Prefer Intel MKL for BLAS, which is also available from anaconda. MKL is free for students and researchers and available for a small price for others.
If you use pip, use

pip install numpy
pip install scipy

to install these. If you use anaconda, use

conda install mkl
conda install numpy
conda install scipy

to set these up. If not, the yann installer will pip install numpy scipy anyway as part of its requirements.
Theano¶
Once all the pre-requisites are setup, install theano version 0.8 or higher.
The following .theanorc configuration can be used as a sample, but you may choose other options. As an example, one can use the following:
[global]
floatX = float32
device = cuda0
optimizer_including = cudnn
mode = FAST_RUN
allow_gc = False

[nvcc]
fastmath = True

[cuda]
root = /usr/local/cuda/

[blas]
ldflags = -lmkl

[lib]
cnmem = 0.5
If you use the libgpuarray backend, use device=cuda0 (or whichever device you want to run on). If you are using the older CUDA backend, use device=gpu0. Refer to the theano documentation for more on this.
Optional Dependencies¶
These are some optional dependencies that yann doesn't use directly but that are used by yann's dependencies such as theano. I highly recommend installing these before installing theano.
Cuda¶
This is an optional dependency. If you need the capability of an Nvidia GPU, you will need a suitable CUDA toolkit and drivers. If you do not have this dependency installed, you won't be able to run the code on Nvidia GPUs. Some components of the code depend on cuDNN for speeding things up, so cuDNN is highly recommended although optional. Nvidia's cuDNN library is free as long as you register as a developer. If you didn't install CUDA, you can still run the toolbox, but it will be much slower, running on a CPU.
Libgpuarray¶
libgpuarray is now fully supported. The cuda backend is strongly recommended for macOS, but for the Pascal architecture of GPUs, libgpuarray seems to perform much better. This is also optional but highly recommended.
Additional Dependencies¶
Yann also needs the following as additional dependencies that opens up additional features.
Networkx¶
For those who are networking geeks: a neural network is a directed acyclic graph. So yann internally gives every network the ability to create a networkx-style graph and do things with it if you need. Networkx is a tremendously popular tool for network-related tasks and we are still exploring and testing its capabilities. This might only ever be used for visualizing networks, but some researcher somewhere might use it once networks get sophisticated; we never know. This is an optional dependency; not having it doesn't affect the toolbox except for the purposes it is needed for. You can install networkx as follows:

pip install networkx
skdata¶
Used as a port for datasets. This is needed if you are using some common benchmark datasets. Although this is an additional dependency, skdata is the core of the datasets module and most datasets in this toolbox are ported through skdata unless you have matlab. Work is ongoing on integrating with fuel and other ports.
Install by using the following command:
pip install skdata
progressbar¶
Yann uses progressbar for aesthetic printing. You can install it easily using

pip install progressbar

If you don't have progressbar, yann will simply ignore it and print progress on the terminal.
Dependencies for visualization¶
Theano needs pydot and graphviz for visualization. We use theano’s visualization for printing theano functions as shown here.
These visualizations are highly useful during debugging. If you want the capability of producing these for your networks, install the dependencies using the following commands:

apt-get install graphviz
pip install graphviz
pip install pydot pydot-ng

Not needed now, but might be needed in the future: yann will switch from openCV to matplotlib (or browser matplotlib) for visualization. Install it by

pip install matplotlib
cPickle, gzip and h5py¶
cPickle and gzip most often come with the python installation; if not, please install them. Yann uses these for saving down models and such.
For datasets, at the moment, yann uses cPickle. In the future, yann will migrate to hdf5 for datasets; we don't use h5py at the moment. Install h5py by running either

conda install h5py

or

pip install h5py
Yann Toolbox Setup¶
Finally to install the toolbox run,
pip install git+git://github.com/ragavvenkatesan/yann.git
If you have already setup the toolbox and want to just update to the bleeding-edge use,
pip install --upgrade git+git://github.com/ragavvenkatesan/yann.git
If you want to build it yourself, you may clone from git and then install using setuptools. Ensure that you have setuptools installed first:

pip install setuptools

Once that is done, clone the repository:
git clone http://github.com/ragavvenkatesan/yann
Once cloned, enter the directory and run installer.
cd yann
python setup.py install
You can run a bunch of tests (still a work in progress) by running the following:
python setup.py test
Tutorials¶
If you are here for the first time you might want to consider doing the Quick Start instead of the tutorials. The tutorials are meant for those who have some practice or experience with the toolbox and its structure. If you just want to see the code or run the examples for testing or similar purposes, you can follow this tutorial/API. I recommend going through the tutorials just in case, though.
Logistic Regression.¶
The tutorial for logistic regression is basically the Quick Start guide; please follow the tutorial there. Full working code is presented in the following.
Notes
This code contains one method that explains how to build a logistic regression classifier for the MNIST dataset using the yann toolbox.
For a more interactive tutorial, refer to the notebook at yann/pantry/tutorials/notebooks/Logistic Regression.ipynb
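Since the tutorial itself lives in the Quick Start, here is a refresher of the underlying model - plain-numpy softmax regression with one gradient-descent step. This is independent of yann; none of the names below are toolbox API:

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# A toy "MNIST-like" minibatch: 4 images of 784 pixels, 10 classes.
rng = np.random.RandomState(0)
x = rng.rand(4, 784)
y = np.array([3, 1, 7, 0])

w = np.zeros((784, 10))
b = np.zeros(10)

# Forward pass: class probabilities.
p = softmax(x.dot(w) + b)

# Negative log-likelihood of the correct classes.
nll = -np.log(p[np.arange(4), y]).mean()

# Backward pass: gradient of the NLL w.r.t. the logits.
grad = p.copy()
grad[np.arange(4), y] -= 1.0
grad /= 4.0

# One step of plain gradient descent.
lr = 0.01
w -= lr * x.T.dot(grad)
b -= lr * grad.sum(axis=0)
```

With zero weights, every class gets probability 0.1, so the starting NLL is ln(10); repeated steps drive it down. Yann's classifier and objective layers wrap exactly this kind of computation in theano.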
Multi-layer Neural Network.¶
By virtue of being here, it is assumed that you have gone through the Quick Start. Let us take this one step further and create a neural network with two hidden layers. We begin as usual by importing the network class and creating the input layer.
from yann.network import network
net = network()
dataset_params = { "dataset": "_datasets/_dataset_xxxxxx", "id": 'mnist', "n_classes" : 10 }
net.add_layer(type = "input", id ="input", dataset_init_args = dataset_params)
Instead of connecting this to a classifier as we saw in the Quick Start, let us add a couple of fully connected hidden layers. Hidden layers can be created using layer type = dot_product.
net.add_layer (type = "dot_product",
origin ="input",
id = "dot_product_1",
num_neurons = 800,
regularize = True,
activation ='relu')
net.add_layer (type = "dot_product",
origin ="dot_product_1",
id = "dot_product_2",
num_neurons = 800,
regularize = True,
activation ='relu')
Notice the parameters passed. num_neurons is the number of nodes in the layer. Notice also how we modularized the layers by using the id parameter. origin represents which layer will be the input to the new layer. By default yann assumes all layers are added serially and chooses the last added layer to be the input. Using origin, one can create various types of architectures. In fact, any directed acyclic graph (DAG) that could be hand-drawn could be implemented. Let us now add a classifier and an objective layer to this.
net.add_layer ( type = "classifier",
id = "softmax",
origin = "dot_product_2",
num_classes = 10,
activation = 'softmax',
)
net.add_layer ( type = "objective",
id = "nll",
origin = "softmax",
)
Again notice that we have supplied a lot more arguments than before. Refer to the API for more details.
Let us create our own optimizer module this time instead of using the yann default. For any module in yann, initialization can be done using the add_module method. The add_module method typically takes an input type, which in this case is optimizer, and a set of initialization parameters, which in our case is params = optimizer_params. Any module's params, in this case optimizer_params, is a dictionary of relevant options. A typical optimizer setup is:
optimizer_params = {
"momentum_type" : 'polyak',
"momentum_params" : (0.9, 0.95, 30),
"regularization" : (0.0001, 0.0002),
"optimizer_type" : 'rmsprop',
"id" : 'polyak-rms'
}
net.add_module ( type = 'optimizer', params = optimizer_params )
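For intuition on what these options control, here is a plain-numpy sketch of the textbook RMSProp update combined with heavy-ball (Polyak) momentum. This is an independent illustration of the update rules, not yann's internal implementation; the function and coefficient names are made up for the example:

```python
import numpy as np

def rmsprop_polyak_step(w, grad, cache, velocity,
                        lr=0.01, rho=0.9, momentum=0.9, eps=1e-6):
    """One textbook RMSProp step with Polyak (heavy-ball) momentum."""
    # Running average of squared gradients (the RMSProp accumulator).
    cache = rho * cache + (1.0 - rho) * grad ** 2
    # Scale the raw gradient by the root of the accumulator.
    step = lr * grad / (np.sqrt(cache) + eps)
    # Heavy-ball momentum: retain a fraction of the previous velocity.
    velocity = momentum * velocity - step
    return w + velocity, cache, velocity

# Minimize f(w) = 0.5 * ||w||^2, whose gradient is simply w.
w = np.array([1.0, -2.0])
cache = np.zeros_like(w)
velocity = np.zeros_like(w)
for _ in range(100):
    w, cache, velocity = rmsprop_polyak_step(w, w, cache, velocity)
```

The rho and momentum coefficients here play roughly the roles that the momentum and optimizer options in optimizer_params configure above.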
We have now successfully added Polyak momentum with RmsProp back propagation, with L1 and L2 regularization co-efficients that will be applied to the layers for which we passed the argument regularize = True. For more optimizer parameter options, refer to the optimizer documentation. This optimizer will therefore solve the following error:

    e = σ(x, y) + 0.0001 Σ_i |w_i| + 0.0002 Σ_i w_i²

where e is the error, σ is the sigmoid layer and w_i is the ith layer of the network. Once we are done, we can cook, train and test as usual:
learning_rates = (0.05, 0.01, 0.001)
net.cook( optimizer = 'polyak-rms',
objective_layer = 'nll',
datastream = 'mnist',
classifier = 'softmax',
)
net.train( epochs = (20, 20),
validate_after_epochs = 2,
training_accuracy = True,
learning_rates = learning_rates,
show_progress = True,
early_terminate = True)
The learning_rates supplied here is a tuple. The first value indicates an annealing coefficient, the second is the initial learning rate of the first era, and the third value is the learning rate of the second era. Accordingly, epochs takes a tuple with the number of epochs for each era.
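Assuming a simple multiplicative per-epoch decay (a guess at the annealing rule - yann's exact schedule may differ), the two-era schedule above can be sketched as:

```python
# learning_rates = (0.05, 0.01, 0.001): first an annealing coefficient,
# then one initial learning rate per era. epochs = (20, 20) gives two eras.
def era_schedule(learning_rates, epochs):
    anneal = learning_rates[0]
    rates = []
    for era, n_epochs in enumerate(epochs):
        lr = learning_rates[era + 1]
        for _ in range(n_epochs):
            rates.append(lr)
            lr *= (1.0 - anneal)   # assumed per-epoch decay, for illustration
    return rates

schedule = era_schedule((0.05, 0.01, 0.001), (20, 20))
```

Each era starts fresh at its own rate and anneals independently, which is the behavior the tuple is meant to express.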
This time, let us not let it run the full forty epochs; let us cancel in the middle after some epochs by hitting ^c. Once it stops, let us immediately test, demonstrating that the net retains the parameters as updated as possible. Once done, let us run net.test().
Some new arguments are introduced here, and they are for the most part easy to understand in context. epochs is a tuple giving the number of epochs of training and the number of epochs of fine tuning after that; there could be several of these stages of finer tuning. Yann uses the term 'era' to represent each set of epochs running with one learning rate. show_progress will print a progress bar for each epoch. validate_after_epochs will perform validation after that many epochs, on a different validation dataset. The full code for this tutorial with additional commentary can be found in the file pantry.tutorials.mlp.py. If you have the toolbox cloned or downloaded, or just the tutorials downloaded, run the code as,
from pantry.tutorials.mlp import mlp
mlp(dataset = 'some dataset created')
or simply,
python pantry/tutorials/mlp.py
from the toolbox root or a path added to the toolbox. The __init__ program has all the required tools to create or load an already created dataset. Optionally, you can provide the location of the dataset as a command line argument.
Autoencoder Network.¶
By virtue of being here, it is assumed that you have gone through the Quick Start.
Todo
Code is done, but text needs to be written in.
The full code for this tutorial with additional commentary can be found in the file pantry.tutorials.autoencoder.py. If you have the toolbox cloned or downloaded, or just the tutorials downloaded, run the code from there.
Todo
- Need validation and testing that's better than just measuring rmse. Can't find something great.
Notes
- This code contains two methods.
- A shallow autoencoder with just one layer.
- A Convolutional-Deconvolutional autoencoder that uses a deconv layer.
Both these methods are setup for MNIST dataset.
-
pantry.tutorials.autoencoder.
convolutional_autoencoder
(dataset=None, verbose=1)[source]¶ This function is a demo example of a deep convolutional autoencoder. This is an example code. You should study this code rather than merely run it. This is also an example for using the deconvolutional layer or the transposed fractional stride convolutional layers.
Parameters: - dataset – Supply a dataset.
- verbose – Similar to the rest of the dataset.
-
pantry.tutorials.autoencoder.
shallow_autoencoder
(dataset=None, verbose=1)[source]¶ This function is a demo example of a sparse shallow autoencoder. This is an example code. You should study this code rather than merely run it.
Parameters: - dataset – Supply a dataset.
- verbose – Similar to the rest of the dataset.
Convolutional Neural Network.¶
By virtue of being here, it is assumed that you have gone through the Quick Start.
Building a convolutional neural network is just as simple as building an MLNN. The convolutional-pooling layer (convpool layer) can be added using the following statement:
net.add_layer ( type = "conv_pool",
origin = "input",
id = "conv_pool_1",
num_neurons = 40,
filter_size = (5,5),
pool_size = (2,2),
activation = ('maxout', 'maxout', 2),
batch_norm = True,
regularize = True,
verbose = verbose
)
Here the layer has 40 filters, each 5X5, followed by batch normalization and then a 2X2 maxpooling, all with stride 1. The activation used is maxout, with a maxout ratio of 2. A simpler relu layer could be added thus:
net.add_layer ( type = "conv_pool",
origin = "input",
id = "conv_pool_1",
num_neurons = 40,
filter_size = (5,5),
pool_size = (2,2),
activation = 'relu',
verbose = verbose
)
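For reference, a maxout-by-2 activation simply takes an element-wise maximum over pairs of feature maps, halving the channel count. A plain-numpy sketch, independent of yann (whether consecutive maps are paired is an implementation detail assumed here):

```python
import numpy as np

def maxout_by_2(x):
    """Pairwise maximum over the channel axis: (batch, channels, h, w)
    becomes (batch, channels // 2, h, w)."""
    batch, channels, h, w = x.shape
    assert channels % 2 == 0
    return x.reshape(batch, channels // 2, 2, h, w).max(axis=2)

x = np.arange(2 * 4 * 3 * 3, dtype=float).reshape(2, 4, 3, 3)
y = maxout_by_2(x)
```

So a conv_pool layer with 40 filters and maxout-by-2 produces 20 output feature maps.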
Refer to the APIs for more details on the convpool layer.
It is often useful to visualize the filters learnt in a CNN, so we introduce the visualizer module here along with the CNN tutorial. The visualizer can be set up using the add_module method of the net object.
net.add_module ( type = 'visualizer',
params = visualizer_params,
verbose = verbose
)
where the visualizer_params
is a dictionary of the following format.
visualizer_params = {
"root" : 'lenet5',
"frequency" : 1,
"sample_size": 144,
"rgb_filters": True,
"debug_functions" : False,
"debug_layers": False,
"id" : 'main'
}
root is the location where the visualizations are saved, frequency is the number of epochs between saved visualizations, and sample_size is the number of images saved each time. rgb_filters makes the filters save in color. Along with the activities of each layer for the exact same images as the data itself, the filters of the neural network are also saved down.
For more options of parameters on visualizer refer to the visualizer documentation .
The full code for this tutorial with additional commentary can be found in the file pantry.tutorials.lenet.py. This tutorial runs a CNN on the MNIST dataset. If you have the toolbox cloned or downloaded, or just the tutorials downloaded, run the code from there.
Notes
- This code contains three methods.
- A modern reincarnation of LeNet5 for MNIST.
- The same Lenet with batchnorms
- 2.a. Batchnorm before activations. 2.b. Batchnorm after activations.
All these methods are setup for MNIST dataset.
Todo
Add detailed comments.
-
pantry.tutorials.lenet.
lenet5
(dataset=None, verbose=1)[source]¶ This function is a demo example of lenet5 from the infamous paper by Yann LeCun. This is an example code. You should study this code rather than merely run it.
Warning
This is not the exact implementation but a modern re-incarnation.
Parameters: - dataset – Supply a dataset.
- verbose – Similar to the rest of the dataset.
-
pantry.tutorials.lenet.
lenet_maxout_batchnorm_after_activation
(dataset=None, verbose=1)[source]¶ This is a version with nesterov momentum and rmsprop instead of the typical sgd. This also has maxout activations for convolutional layers, dropouts on the last convolutional layer and the other dropout layers, and this also applies batch norm to all the layers. The difference, though, is that we use the batch_norm layer to apply batch norm after the activation of the previous layer. So we just spice things up and add a bit of steroids to lenet5(). This also introduces visualizer module usage.

Parameters: - dataset – Supply a dataset.
- verbose – Similar to the rest of the dataset.
-
pantry.tutorials.lenet.
lenet_maxout_batchnorm_before_activation
(dataset=None, verbose=1)[source]¶ This is a version with nesterov momentum and rmsprop instead of the typical sgd. This also has maxout activations for convolutional layers, dropouts on the last convolutional layer and the other dropout layers, and this also applies batch norm to all the layers. The batch norm is applied by using the batch_norm = True parameter in all layers. This batch norm is applied before activation, as used in the original version of the paper. So we just spice things up and add a bit of steroids to lenet5(). This also introduces visualizer module usage.

Parameters: - dataset – Supply a dataset.
- verbose – Similar to the rest of the dataset.
Generative Adversarial Networks.¶
By virtue of being here, it is assumed that you have gone through the Quick Start.
Todo
Code is done, but text needs to be written in. This code/tutorial will also explain how the network class is setup because to implement a GAN, we need to inherit the network class out and re-write some of the methods.
The full code for this tutorial with additional commentary can be found in the file pantry.tutorials.gan.py. If you have the toolbox cloned or downloaded, or just the tutorials downloaded, run the code from there.
Referenced from
Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. “Generative adversarial nets.” In Advances in Neural Information
Processing Systems, pp. 2672-2680. 2014.
Notes
This file contains several GAN implementations:
- Shallow GAN setup for MNIST
- Shallow Wasserstein GAN setup for MNIST
- Deep GAN (Ian Goodfellow’s original implementation) setup for MNIST
- DCGAN (Chintala et al.) setup for CIFAR 10
- LS - DCGAN setup for CIFAR 10
Todos:
- Convert the DCGANs for CELEBA.
- WGAN doesn’t work properly because of clipping.
- Check that DCGANs strides are properly setup.
-
pantry.tutorials.gan.
deep_deconvolutional_gan
(dataset, regularize=True, batch_norm=True, dropout_rate=0.5, verbose=1)[source]¶ This function is a demo example of a generative adversarial network. This is an example code. You should study this code rather than merely run it. This method uses a few deconvolutional layers. This method is setup to produce images of size 32X32.
Parameters: - dataset – Supply a dataset.
- regularize –
True
(default) supplied to layer arguments - batch_norm –
True
(default) supplied to layer arguments - dropout_rate –
None
(default) supplied to layer arguments - verbose – Similar to the rest of the dataset.
Returns: A Network object.
Return type: net
Notes
This method is setup for Cifar 10.
-
pantry.tutorials.gan.
deep_deconvolutional_lsgan
(dataset, regularize=True, batch_norm=True, dropout_rate=0.5, verbose=1)[source]¶ This function is a demo example of a generative adversarial network. This is an example code. You should study this code rather than merely run it. This method uses a few deconvolutional layers as was used in the DCGAN paper. This method is setup to produce images of size 32X32.
Parameters: - dataset – Supply a dataset.
- regularize –
True
(default) supplied to layer arguments - batch_norm –
True
(default) supplied to layer arguments - dropout_rate –
None
(default) supplied to layer arguments - verbose – Similar to the rest of the dataset.
Returns: A Network object.
Return type: net
Notes
This method is setup for SVHN / CIFAR10. This is an implementation of the least squares GAN with a = 0, b = 1 and c = 1 (equation 9) [1]: Least Squares Generative Adversarial Networks, Xudong Mao, Qing Li, Haoran Xie, Raymond Y.K. Lau, Zhen Wang.
-
pantry.tutorials.gan.
deep_gan_mnist
(dataset, verbose=1)[source]¶ This function is a demo example of a generative adversarial network. This is an example code. You should study this code rather than merely run it.
Parameters: - dataset – Supply a dataset.
- verbose – Similar to the rest of the dataset.
Returns: A Network object.
Return type: net
Notes
This network mimics Ian Goodfellow's original code and implementation for MNIST, adapted from his source code: https://github.com/goodfeli/adversarial/blob/master/mnist.yaml . It might not be a perfect replication, but I tried as best as I could.
This method is setup for MNIST
-
pantry.tutorials.gan.
shallow_gan_mnist
(dataset=None, verbose=1)[source]¶ This function is a demo example of a generative adversarial network. This is an example code. You should study this code rather than merely run it.
Parameters: - dataset – Supply a dataset.
- verbose – Similar to the rest of the dataset.
Notes
This method is setup for MNIST.
-
pantry.tutorials.gan.
shallow_wgan_mnist
(dataset=None, verbose=1)[source]¶ This function is a demo example of a Wasserstein generative adversarial network. This is an example code. You should study this code rather than merely run it.
Parameters: - dataset – Supply a dataset.
- verbose – Similar to the rest of the dataset.
Notes
This method is setup for MNIST. Everything in this code is the same as the shallow GAN class except for the loss functions.
Todo
This is not verified. There is some trouble in weight clipping.
Batch Normalization.¶
Batch normalization has become an important operation for faster and more stable learning of neural networks. In batch norm we do the following:

    BN(x) = γ (x - μ_B) / √(σ_B² + ε) + β

Here x is the input (and BN(x) the output) of this operation, μ_B and σ_B² are the mean and the variance of the minibatch supplied, and γ and β are learnt using back propagation. This will also store a running mean and a running variance, which are used during inference time.
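A minimal numpy sketch of the training-time transform (γ as ones and β as zeros before learning; ε is a small stabilizing constant):

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Normalize a minibatch by its own mean and variance, then
    scale by gamma and shift by beta (both learnt via backprop)."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.RandomState(0).rand(500, 10)   # a minibatch of 500 samples
out = batch_norm_train(x, gamma=np.ones(10), beta=np.zeros(10))
```

With untrained γ and β, each output feature is approximately zero-mean and unit-variance over the minibatch.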
By default, batch normalization can be performed on convolution and dot product layers using the argument batch_norm = True supplied to the yann.network.add_layer method. This applies the batch normalization before the activation and after the core layer operation.
While this is the technique described in the original batch normalization paper [1], some modern networks such as the Residual network [2],[3] use a re-ordered version of layer operations that requires the batch norm to be applied post-activation. This is particularly used with ReLU or Maxout networks [4][5]. Therefore we also provide a layer type batch_norm, which creates a layer that simply batch-normalizes the input supplied. These layers can be used to create post-activation batch normalization.
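The difference between the two orderings can be seen in a small sketch, with plain-python stand-ins for the layer operations (this is illustrative, not yann code):

```python
import numpy as np

def bn(x):
    # Stand-in for batch normalization: center and scale.
    return (x - x.mean()) / (x.std() + 1e-5)

def relu(x):
    return np.maximum(x, 0.0)

x = np.random.RandomState(1).randn(100)   # stand-in for a layer's raw output

# batch_norm = True on a layer: normalize, then activate (pre-activation BN).
pre_activation = relu(bn(x))

# A separate type = "batch_norm" layer after an activation (post-activation BN).
post_activation = bn(relu(x))
```

Pre-activation BN feeds the activation a centered input (so its output is clipped at zero), while post-activation BN re-centers the activation's output and can go negative again.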
This tutorial demonstrates both techniques using the same network architecture as the Convolutional Neural Network tutorial. The code for these can be found in the following module methods in pantry.tutorials.
References
[1] | Ioffe, Sergey, and Christian Szegedy. “Batch normalization: Accelerating deep network training by reducing internal covariate shift.” arXiv preprint arXiv:1502.03167 (2015). |
[2] | He, Kaiming, et al. “Identity mappings in deep residual networks.” European Conference on Computer Vision. Springer International Publishing, 2016. |
[3] | He, Kaiming, et al. “Deep residual learning for image recognition.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. |
[4] | Nair, Vinod, and Geoffrey E. Hinton. “Rectified linear units improve restricted boltzmann machines.” Proceedings of the 27th International Conference on Machine Learning (ICML-10). 2010. |
[5] | Goodfellow, Ian J., et al. “Maxout networks.” arXiv preprint arXiv:1302.4389 (2013). |
Notes
- This code contains three methods.
- A modern reincarnation of LeNet5 for MNIST.
- The same Lenet with batchnorms
- 2.a. Batchnorm before activations. 2.b. Batchnorm after activations.
All these methods are setup for MNIST dataset.
Todo
Add detailed comments.
-
pantry.tutorials.lenet.
lenet5
(dataset=None, verbose=1)[source]¶ This function is a demo example of lenet5 from the infamous paper by Yann LeCun. This is an example code. You should study this code rather than merely run it.
Warning
This is not the exact implementation but a modern re-incarnation.
Parameters: - dataset – Supply a dataset.
- verbose – Similar to the rest of the dataset.
-
pantry.tutorials.lenet.
lenet_maxout_batchnorm_after_activation
(dataset=None, verbose=1)[source]¶ This is a version with nesterov momentum and rmsprop instead of the typical sgd. This also has maxout activations for convolutional layers, dropouts on the last convolutional layer and the other dropout layers, and this also applies batch norm to all the layers. The difference, though, is that we use the batch_norm layer to apply batch norm after the activation of the previous layer. So we just spice things up and add a bit of steroids to lenet5(). This also introduces visualizer module usage.

Parameters: - dataset – Supply a dataset.
- verbose – Similar to the rest of the dataset.
-
pantry.tutorials.lenet.
lenet_maxout_batchnorm_before_activation
(dataset=None, verbose=1)[source]¶ This is a version with nesterov momentum and rmsprop instead of the typical sgd. This also has maxout activations for convolutional layers, dropouts on the last convolutional layer and the other dropout layers, and this also applies batch norm to all the layers. The batch norm is applied by using the batch_norm = True parameter in all layers. This batch norm is applied before activation, as used in the original version of the paper. So we just spice things up and add a bit of steroids to lenet5(). This also introduces visualizer module usage.

Parameters: - dataset – Supply a dataset.
- verbose – Similar to the rest of the dataset.
Cooking a matlab dataset for Yann.¶
By virtue of being here, it is assumed that you have gone through the Quick Start.
This tutorial will help you convert a dataset from a matlab workspace to yann. To begin, let us acquire Google's Street View House Numbers dataset in Matlab [1]. Download the three .mat files from the url: test_32x32.mat, train_32x32.mat and extra_32x32.mat. Once downloaded, we need to divide this mat dump of data into training, testing and validation minibatches appropriately, as used by yann. This can be accomplished by the steps outlined in the code yann\pantry\matlab\make_svhn.m. This will create data with 500 samples per mini batch, with 56 training batches, 42 testing batches and 28 validation batches.
Once the mat files are set up appropriately, they are ready for yann to load and convert into yann data. For data that is not from SVHN, you can open one of the 'batch' files in matlab to understand how the data is spread. Typically, the x variable is vectorized images, in this case 500X3072 (500 images per batch, 32*32*3 pixels per image). y is an integer vector of labels, going from 0-10 in this case.
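As a quick sanity check on such a batch, the vectorized rows can be reshaped back into image tensors. The (height, width, channels) ordering below is an assumption about how the .mat file was written, so verify it against your own data:

```python
import numpy as np

# Stand-in for one loaded 'batch' file: 500 vectorized 32x32x3 images.
x = np.arange(500 * 3072, dtype=float).reshape(500, 3072)
y = np.zeros(500, dtype=int)

# Recover image tensors from the vectorized rows; the (height, width,
# channels) ordering here is assumed, not guaranteed by the .mat layout.
images = x.reshape(500, 32, 32, 3)
```

If the recovered images look scrambled, the pixel ordering in the vector differs and the reshape axes need to be permuted.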
References
[1] | Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, Andrew Y. Ng Reading Digits in Natural Images with Unsupervised Feature Learning NIPS Workshop on Deep Learning and Unsupervised Feature Learning 2011. |
To convert the data into yann, we can use the setup_dataset module in the yann.utils.dataset.py file. Simply call the initializer as,
dataset = setup_dataset(dataset_init_args = data_params,
save_directory = save_directory,
preprocess_init_args = preprocess_params,
verbose = 3 )
where, data_params
contains information about the dataset thusly,
data_params = {
"source" : 'matlab',
# "name" : 'yann_svhn', # some name.
"location" : location, # some location to load from.
"height" : 32,
"width" : 32,
"channels" : 3,
"batches2test" : 42,
"batches2train" : 56,
"batches2validate" : 28,
"mini_batch_size" : 500 }
and the preprocess_params
contains information on how to process the images thusly,
preprocess_params = {
"normalize" : True,
"ZCA" : False,
"grayscale" : False,
"zero_mean" : False,
}
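To give some intuition for the normalize flag, here is a minimal sketch of one common interpretation of normalization: rescaling raw 8-bit pixel intensities into the [0, 1] range. This is illustrative only, not yann's exact preprocessing; refer to yann.utils.dataset for what the toolbox actually does:

```python
# Illustrative only: one common meaning of "normalize" is to rescale raw
# 8-bit pixel intensities (0-255) into the [0, 1] range.
raw_pixels = [0, 64, 128, 255]                        # made-up pixel values
normalized = [round(p / 255.0, 3) for p in raw_pixels]
print(normalized)   # [0.0, 0.251, 0.502, 1.0]
```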
save_directory is simply a location to save the yann dataset. Customarily, it is
save_directory = '_datasets'
The full code for this tutorial with additional commentary can be found in the file
pantry.tutorials.mat2yann.py. If you have the toolbox cloned or downloaded, or just the
tutorials downloaded, run the code using,

pantry.tutorials.mat2yann.cook_svhn_normalized(location, verbose=1, **kwargs)[source]¶
This method demonstrates how to cook a dataset for yann from Matlab. Refer to the
pantry/matlab/setup_svhn.m file first to setup the dataset and make it ready for use with yann.
Parameters:
- location – provide the location where the dataset is created and stored. Refer to the
  prepare_svhn.m file to understand how to prepare a dataset.
- save_directory – which directory to save the cooked dataset onto.
- dataset_parms – default is the dictionary. Refer to setup_dataset.
- preprocess_params – default is the dictionary. Refer to setup_dataset.
Notes
By default, this will create a dataset that is not mean-subtracted.
class yann.utils.dataset.setup_dataset(dataset_init_args, save_directory='_datasets', verbose=1, **kwargs)[source]¶
The setup_dataset class is used to create and assemble datasets that are friendly to the Yann toolbox.
Todo
- The images option for the source.
- skdata pascal isn’t working.
- The imagenet and coco datasets need to be set up.
Parameters:
- dataset_init_args – is a dictionary of the form:

data_init_args = {
    "source" : <where to get the dataset from>
               'pkl' : A theano tutorial style 'pkl' file.
               'skdata' : Download and setup from skdata.
               'matlab' : Data is created and is being used from Matlab.
               'images-only' : Data is created from a directory of images. This
                               will be an unsupervised dataset with no labels.
    "name" : necessary only for skdata; supports
             'mnist', 'mnist_noise1', 'mnist_noise2', 'mnist_noise3',
             'mnist_noise4', 'mnist_noise5', 'mnist_noise6', 'mnist_bg_images',
             'mnist_bg_rand', 'mnist_rotated', 'mnist_rotated_bg', 'cifar10',
             'caltech101', 'caltech256'.
             Refer to the original paper by Hugo Larochelle [2] for these
             dataset details.
    "location" : necessary for 'pkl', 'matlab' and 'images-only'
    "mini_batch_size" : 500, # some batch size
    "mini_batches_per_batch" : (100, 20, 20), # training, testing, validation
    "batches2train" : 1, # number of files that will be created.
    "batches2test" : 1,
    "batches2validate" : 1,
    "height" : 28, # After pre-processing
    "width" : 28,
    "channels" : 1, # color (3) or grayscale (1)
    ... }
- preprocess_init_args – provide preprocessing arguments. This is a dictionary:

args = {
    "normalize" : <bool> True for normalize across batches
    "GCN" : True for global contrast normalization
    "ZCA" : True, kind of like a PCA representation (not fully tested)
    "grayscale" : Convert the image to grayscale }

- save_directory – <string> a location where the dataset is going to be saved.

[2] Larochelle H, Erhan D, Courville A, Bergstra J, Bengio Y. An empirical evaluation of deep architectures on problems with many factors of variation. In Proceedings of the 24th International Conference on Machine Learning, 2007 (pp. 473-480). ACM.

Notes
The Yann toolbox takes datasets in a .pkl format. The dataset requires a directory structure such as the following:

location/_dataset_XXXXX
|_ data_params.pkl
|_ train
   |_ batch_0.pkl
   |_ batch_1.pkl
   .
   .
|_ valid
   |_ batch_0.pkl
   |_ batch_1.pkl
   .
   .
|_ test
   |_ batch_0.pkl
   |_ batch_1.pkl
   .
   .

The location id (XXXXX) is generated by this class file. The five digits that are produced are the unique id of the dataset. The file data_params.pkl contains one variable, dataset_args, used by the datastream.
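The directory layout above can be mimicked with standard-library code to make it concrete. This is only a sketch: the dataset id '12345' and the batch contents are made up, not produced by yann:

```python
import os
import pickle
import tempfile

# Build a skeleton of the on-disk layout yann expects (illustrative only;
# the dataset id '12345' and the pickled contents here are made up).
root = tempfile.mkdtemp()
dataset_dir = os.path.join(root, '_dataset_12345')

for split in ('train', 'valid', 'test'):
    split_dir = os.path.join(dataset_dir, split)
    os.makedirs(split_dir)
    for batch in range(2):   # batch_0.pkl, batch_1.pkl
        with open(os.path.join(split_dir, 'batch_%d.pkl' % batch), 'wb') as f:
            pickle.dump({'x': [], 'y': []}, f)      # placeholder contents

with open(os.path.join(dataset_dir, 'data_params.pkl'), 'wb') as f:
    pickle.dump({'mini_batch_size': 500}, f)        # placeholder dataset_args

print(sorted(os.listdir(dataset_dir)))
# ['data_params.pkl', 'test', 'train', 'valid']
```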
Todo
- Do tutorials for the following:
- Loading pre-trained VGG-19 net
- AlexNet
- GoogleNet
- ResNet
Structure of the Yann network¶
The core of the yann toolbox and its operations are built around the yann.network.network class, which is present in the file yann/network.py. The above figure shows the organization of the yann.network.network class. The add_xxxx() methods add either a layer or a module, as the nomenclature states. The network class can hold various layers and modules, in various connections and architectures, that are added using the add_ methods.
Verbose¶
Throughout the toolbox, various methods take an argument called verbose as input. By default, verbose is always 2. verbose = 1 implies a silent run, and therefore the code doesn’t print anything unless absolutely needed. verbose = 2 prints a standard amount of information, and verbose = 3, which is friendly when used for debugging, prints annoyingly too much information.
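The convention can be sketched with a toy helper. This is not yann's actual logging code, just an illustration of the levels described above:

```python
# Toy illustration of the verbose convention (not yann's actual code):
# a message is printed only when the run's verbose level reaches the
# level the message is tagged with.
def log(message, verbose=2, level=2):
    printed = verbose >= level
    if printed:
        print(message)
    return printed

log("standard progress info", verbose=2)     # printed
log("debugging detail", verbose=2, level=3)  # suppressed
log("debugging detail", verbose=3, level=3)  # printed
```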
Initializing a network class¶
A network object can quite simply be initialized by calling
from yann.network import network
net = network()
While prepping the network for learning, we may need only certain modules and layers. The process of preparing the network by selecting and building the training, testing and validation parts of the network is called cooking.
The above figure shows a cooked network. The objects that are shaded gray are the uncooked parts of the network. Once cooked, the network is ready for training and testing, all by using other methods within the network class. The network class also has several properties, such as layers, which is a dictionary of the layers that are added to it, and params, which is a dictionary of all the parameters. All layers and modules contain a property called id through which they are referred.
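The id bookkeeping described above can be sketched with a toy class. This is not yann's implementation; it only illustrates how layers can live in a dictionary keyed by id, with ids auto-numbered when the user does not supply one:

```python
# Toy sketch (not yann's code): layers are stored in a dictionary keyed
# by an id, which is auto-numbered when the user does not supply one.
class ToyNetwork(object):
    def __init__(self):
        self.layers = {}                    # like net.layers in yann

    def add_layer(self, layer, id=None):
        if id is None:
            id = str(len(self.layers))      # auto-generated id: '0', '1', ...
        self.layers[id] = layer
        return id

net = ToyNetwork()
print(net.add_layer("input"))               # '0'
print(net.add_layer("classifier", id="fc")) # 'fc'
print(sorted(net.layers))                   # ['0', 'fc']
```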
Quick Start¶
The easiest way to get going with Yann is to follow this quick start guide. If you are not satisfied and want a more detailed introduction to the toolbox, you may refer to the Tutorials and the Structure of the Yann network. This tutorial was also presented in CSE591 at ASU and the video of the presentation is available. A more detailed Jupyter Notebook version of this tutorial is available here.
To install in a quick fashion without many dependencies, run the following command:
pip install git+git://github.com/ragavvenkatesan/yann.git
If there was an error with installing skdata, you might want to install numpy and scipy independently first and then run the above command. Note that this installer does not enable a lot of options of the toolbox, for which you need to go through the complete install described at the Installation Guide page.
Verify that the installation of theano is indeed version 0.9 or greater by doing the following in a python shell:
import theano
theano.__version__
If the version was not 0.9, you can install 0.9 by doing the following:
pip install --upgrade --no-deps git+git://github.com/Theano/Theano.git
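If you prefer to check the version string programmatically, a minimal standard-library sketch is below. The helper name is hypothetical, shown only for illustration:

```python
# Compare the first two numeric components of a version string against
# the 0.9 minimum. Hypothetical helper, shown only for illustration.
def meets_minimum(version, minimum=(0, 9)):
    parts = []
    for piece in version.split('.')[:2]:
        digits = ''.join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts) >= minimum

print(meets_minimum('0.9.0'))   # True
print(meets_minimum('0.8.2'))   # False
print(meets_minimum('1.0.1'))   # True
```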
The start and the end of the Yann toolbox is the network module. The yann.network.network object is where all the magic happens. Start by importing network and creating a network object in a python shell.
from yann.network import network
net = network()
Voila! We have thus created a new network. The network doesn’t have any layers or modules in it yet. This can be verified by probing the net.layers property of the net object.
net.layers
This will produce an output which is essentially an empty dictionary {}
. Let’s add some layers!
The toolbox comes with a port to skdata, through which the MNIST dataset of handwritten characters can be built.
To cook a mnist dataset for yann run the following code:
from yann.special.datasets import cook_mnist
cook_mnist()
Running this code will print a statement to the following effect: >>Dataset xxxxx is created.
The five digits marked xxxxx in the statement are the codeword for the dataset. The actual dataset is now located at _datasets/_dataset_xxxxx/, relative to the directory from where this code was called. The MNIST dataset is created and stored there in a format that is configured for yann to work with. Refer to the Tutorials on how to convert your own dataset for yann.
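If you want to pick the codeword up programmatically from the printed statement, a small sketch follows. The message string here is a made-up example; in practice the five digits differ per run:

```python
import re

# Made-up example of the creation message; the five digits differ per run.
message = "Dataset 12345 is created."
codeword = re.search(r"Dataset (\d{5}) is created", message).group(1)
dataset_location = "_datasets/_dataset_" + codeword
print(dataset_location)   # _datasets/_dataset_12345
```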
The first layer that we need to add to our network is an input layer. Every input layer requires a dataset to be associated with it. Let us create this layer.
dataset_params = { "dataset": "_datasets/_dataset_xxxxx", "n_classes" : 10 }
net.add_layer(type = "input", dataset_init_args = dataset_params)
This piece of code creates and adds a new datastream
module to the net
and wires up the
newly added input
layer with this datastream
. Confirm this by checking net.datastream
.
Let us now build a classifier layer. The default classifier that yann is set up with is the logistic regression classifier. Refer to the Toolbox Documentation or Tutorials for other types of layers. Let us create this classifier layer for now.
net.add_layer(type = "classifier" , num_classes = 10)
net.add_layer(type = "objective")
The objective layer creates the loss function from the classifier that can be used as a learning metric. It also provides a scope for other modules such as the optimizer module. Refer to Structure of the Yann network and Toolbox Documentation for more details on modules. Now that our network is created and constructed, we can see that the net object has its layers populated.
net.layers
>>{'1': <yann.network.layers.classifier_layer object at 0x7eff9a7d0050>, '0':
<yann.network.layers.input_layer object at 0x7effa410d6d0>, '2':
<yann.network.layers.objective_layer object at 0x7eff9a71b210>}
The keys of the dictionary such as '1', '0' and '2' are the ids of the layers. We could have created a layer with a custom id by supplying an id argument to the add_layer method. To get a better idea of how the network looks, you can use the pretty_print method in yann.
net.pretty_print()
Now our network is finally ready to be trained. Before training, we need to build an optimizer and other tools, but for now let us use the default ones. Once all of this is done, yann requires that the network be ‘cooked’. For more details on cooking, refer to Structure of the Yann network. For now, let us imagine that cooking a network will finalize the wiring and architecture, cache and prepare the first batch of data, prepare the modules, and in general prepare the network for training using back propagation.
net.cook()
Cooking would take a few seconds and might print what it is doing along the way. Once cooked, we may notice, for instance, that the network has an optimizer module.
net.optimizer
>>{'main': <yann.network.modules.optimizer object at 0x7eff9a7c1b10>}
To train the model that we have just cooked, we can use the train
function that becomes
available to us once the network is cooked.
net.train()
This will print a progress for each epoch and will show validation accuracy after each epoch on a validation set that is independent from the training set. By default the training might run for 40 epochs: 20 on a higher learning rate and 20 more on a fine tuning learning rate.
Every layer also has a layer.output object. The output can be probed by using the layer_activity method, as long as the layer is directly or indirectly associated with a datastream module through an input layer and the network was cooked.
Let us observe the activity of the input layer as a trial. Once trained, we can observe this output. The layer activity will just be a numpy array of numbers, so let us print its shape instead.
net.layer_activity(id = '0').shape
net.layers['0'].output_shape
The second line of code will verify the output we produced in the first line. An interesting layer
output is the output of the objective
layer, which will give us the current
negative log likelihood of the network, the one that we are trying to minimize.
net.layer_activity(id = '2')
>>array(0.3926551938056946, dtype=float32)
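To make the scalar above concrete: the negative log likelihood averaged over a mini-batch is just the mean of -log p, where p is the probability the classifier assigns to each sample's correct class. The probabilities below are made up purely for illustration:

```python
import math

# Made-up softmax probabilities assigned to the correct class of each sample.
probs_of_correct_class = [0.7, 0.6, 0.75]

# Mean negative log likelihood over this toy "mini-batch".
nll = -sum(math.log(p) for p in probs_of_correct_class) / len(probs_of_correct_class)
print(round(nll, 4))   # 0.3851
```

Minimizing this quantity during training pushes the probabilities of the correct classes toward 1.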
Once we are done training, we can run the network feedforward on the testing set to produce a generalization performance result.
net.test()
Congratulations, you now know how to use the yann toolbox successfully. A full-fledged version of the logistic regression code that we implemented here can be found here. That piece of code also has in-line commentary that briefly discusses other options that could be supplied to some of the function calls we made here, explaining the processes better.
Hope you liked this quick start guide to the Yann toolbox and have fun!