ChainerCV

ChainerCV is a deep learning based computer vision library built on top of Chainer.

Installation Guide

Pip

You can install ChainerCV using pip.

pip install -U numpy
pip install chainercv

Anaconda

Build instructions using Anaconda are as follows.

# For Python 3, download Miniconda3 instead:
# wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh -O miniconda.sh

bash miniconda.sh -b -p $HOME/miniconda
export PATH="$HOME/miniconda/bin:$PATH"
conda config --set always_yes yes --set changeps1 no
conda update -q conda

# Download ChainerCV and go to the root directory of ChainerCV
git clone https://github.com/chainer/chainercv
cd chainercv
conda env create -f environment.yml
source activate chainercv

# Install ChainerCV
pip install -e .

# Try our demos at examples/* !

ChainerCV Tutorial

Object Detection Tutorial

This tutorial will walk you through the features related to object detection that ChainerCV supports. We assume that readers have a basic understanding of the Chainer framework (e.g. understand chainer.Link). For users new to Chainer, please read Introduction to Chainer first.

In ChainerCV, we define object detection as the task of, given an image, localizing objects with bounding boxes and assigning them to categories. ChainerCV supports the task by providing the following features:

  • Visualization
  • BboxDataset
  • Detection Link
  • DetectionEvaluator
  • Training script for various detection models

Here is a short example that runs inference and visualizes the output. Please download the image from the link below and save it as sample.jpg. https://cloud.githubusercontent.com/assets/2062128/26187667/9cb236da-3bd5-11e7-8bcf-7dbd4302e2dc.jpg

# In the rest of the tutorial, we assume that `plt`
# is imported before every code snippet.
import matplotlib.pyplot as plt

from chainercv.datasets import voc_bbox_label_names
from chainercv.links import SSD300
from chainercv.utils import read_image
from chainercv.visualizations import vis_bbox

# Read an RGB image and return it in CHW format.
img = read_image('sample.jpg')
model = SSD300(pretrained_model='voc0712')
bboxes, labels, scores = model.predict([img])
vis_bbox(img, bboxes[0], labels[0], scores[0],
         label_names=voc_bbox_label_names)
plt.show()
[Image: detection_tutorial_link_simple.png]

Bounding boxes in ChainerCV

Bounding boxes in an image are represented as a two-dimensional array of shape \((R, 4)\), where \(R\) is the number of bounding boxes and the second axis corresponds to the coordinates of bounding boxes. The coordinates are ordered in the array as (y_min, x_min, y_max, x_max), where (y_min, x_min) and (y_max, x_max) are the (y, x) coordinates of the top-left and bottom-right vertices. Notice that ChainerCV orders coordinates in yx order, which is the opposite of the convention used by other libraries such as OpenCV. This convention is adopted because it is more consistent with the memory order of an image that follows row-column order. Also, the dtype of the bounding box array is numpy.float32.

Here is an example with simple toy data.

import numpy as np

from chainercv.visualizations import vis_bbox

img = np.zeros((3, 224, 224), dtype=np.float32)
# We refer to a variable/array of bounding boxes as `bbox` throughout the library.
bbox = np.array([[10, 10, 20, 40], [150, 150, 200, 200]], dtype=np.float32)

vis_bbox(img, bbox)
plt.show()
[Image: detection_tutorial_simple_bbox.png]

In this example, two bounding boxes are displayed on top of a black image. vis_bbox() is a utility function that visualizes bounding boxes and an image together.

Bounding Box Dataset

ChainerCV supports dataset loaders, which can be used to easily index examples with list-like interfaces. Dataset classes whose names end with BboxDataset contain annotations of where objects are located in an image and which categories they are assigned to. These datasets can be indexed to return a tuple of an image, bounding boxes and labels. The labels are stored in an np.int32 array of shape \((R,)\). Each element corresponds to the label of the object in the corresponding bounding box.

A mapping between an integer label and a category differs between datasets. This mapping can be obtained from objects whose names end with label_names, such as voc_bbox_label_names. These mappings become helpful when bounding boxes need to be visualized with label names. In the next example, the interface of BboxDataset and the functionality of vis_bbox() to visualize label names are illustrated.

from chainercv.datasets import VOCBboxDataset
from chainercv.datasets import voc_bbox_label_names
from chainercv.visualizations import vis_bbox

dataset = VOCBboxDataset(year='2012')
img, bbox, label = dataset[0]
print(bbox.shape)  # (2, 4)
print(label.shape)  # (2,)
vis_bbox(img, bbox, label, label_names=voc_bbox_label_names)
plt.show()
[Image: detection_tutorial_bbox_dataset_vis.png]

Note that the example downloads the VOC 2012 dataset at runtime the first time it is used on the machine.

Detection Evaluator

ChainerCV provides functionalities that make evaluating detection links easy. They are provided at two levels: evaluator extensions and evaluation functions.

Evaluator extensions such as DetectionVOCEvaluator inherit from Evaluator and have a similar interface. They are initialized by taking an iterator and a network that carries out prediction with the method predict(). When this class is called (i.e. __call__() of DetectionVOCEvaluator), several actions are taken. First, it iterates over a dataset based on the iterator. Second, the network makes predictions using the images collected from the dataset. Finally, an evaluation function is called with the ground truth annotations and the prediction results.

In contrast to evaluators that hide details, evaluation functions such as eval_detection_voc() are provided for those who need a finer level of control. These functions take the ground truth annotations and prediction results as arguments and return measured performance.
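For a quick sanity check of the function-level interface, here is a minimal sketch with hand-made toy arrays (not real predictions): one image with a single ground truth box and a single prediction that matches it exactly.

import numpy as np

from chainercv.evaluations import eval_detection_voc

# Toy data: one image, one ground truth box, one exactly matching prediction.
gt_bboxes = [np.array([[10, 10, 50, 50]], dtype=np.float32)]
gt_labels = [np.array([0], dtype=np.int32)]
pred_bboxes = [np.array([[10, 10, 50, 50]], dtype=np.float32)]
pred_labels = [np.array([0], dtype=np.int32)]
pred_scores = [np.array([0.9], dtype=np.float32)]

result = eval_detection_voc(
    pred_bboxes, pred_labels, pred_scores, gt_bboxes, gt_labels)
print(result['map'])  # 1.0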

Here is a simple example that uses a detection evaluator.

from chainer.iterators import SerialIterator
from chainercv.datasets import VOCBboxDataset
from chainercv.datasets import voc_bbox_label_names
from chainercv.extensions import DetectionVOCEvaluator
from chainercv.links import SSD300

dataset = VOCBboxDataset(year='2007', split='test')
# Only use a subset of the dataset so that evaluation finishes quickly.
dataset = dataset[:6]
it = SerialIterator(dataset, 2, repeat=False, shuffle=False)
model = SSD300(pretrained_model='voc0712')
evaluator = DetectionVOCEvaluator(it, model,
                                  label_names=voc_bbox_label_names)
# result is a dictionary of evaluation scores. Print it and inspect it.
result = evaluator()
print(result)


Sliceable Dataset

This tutorial will walk you through the features related to sliceable datasets. We assume that readers have a basic understanding of Chainer datasets (e.g. understand chainer.dataset.DatasetMixin).

In ChainerCV, we introduce the sliceable feature to datasets. Sliceable datasets support slice(), which returns a view of the dataset.

This example shows the basic usage.

# VOCBboxDataset supports the sliceable feature
from chainercv.datasets import VOCBboxDataset
dataset = VOCBboxDataset()

# keys returns the names of data
print(dataset.keys)  # ('img', 'bbox', 'label')
# we can get an example by []
img, bbox, label = dataset[0]

# get a view of the first 100 examples
view = dataset.slice[:100]
print(len(view))  # 100

# get a view of image and label
view = dataset.slice[:, ('img', 'label')]
# the view is also sliceable, so we can call keys
print(view.keys)  # ('img', 'label')
# we can get an example by []
img, label = view[0]

Motivation

slice() returns a view of the dataset without loading data, whereas DatasetMixin.__getitem__() calls get_example() for all required examples. Using views, users can write efficient code.

This example counts the number of images that contain dogs. With the sliceable feature, we can access the label information without loading images from disk. Therefore, the first case runs faster.

import time

from chainercv.datasets import VOCBboxDataset
from chainercv.datasets import voc_bbox_label_names

dataset = VOCBboxDataset()
dog_lb = voc_bbox_label_names.index('dog')

# with slice
t = time.time()
count = 0
# get a view of label
view = dataset.slice[:, 'label']
for i in range(len(view)):
    # we can focus on label
    label = view[i]
    if dog_lb in label:
        count += 1
print('w/ slice: {} secs'.format(time.time() - t))
print('{} images contain dogs'.format(count))
print()

# without slice
t = time.time()
count = 0
for i in range(len(dataset)):
    # img and bbox are loaded but not needed
    img, bbox, label = dataset[i]
    if dog_lb in label:
        count += 1
print('w/o slice: {} secs'.format(time.time() - t))
print('{} images contain dogs'.format(count))
print()

Usage: slice along the axis of examples

slice() takes indices of examples as its first argument.

from chainercv.datasets import VOCBboxDataset
dataset = VOCBboxDataset()

# the view of the first 100 examples
view = dataset.slice[:100]

# the view of the last 100 examples
view = dataset.slice[-100:]

# the view of the 3rd, 5th, and 7th examples
view = dataset.slice[3:8:2]

# the view of the 3rd, 1st, and 4th examples
view = dataset.slice[[3, 1, 4]]

Also, it can take a list of booleans as its first argument. Note that the length of the list should be the same as len(dataset).

from chainercv.datasets import VOCBboxDataset
dataset = VOCBboxDataset()

# make booleans
bboxes = dataset.slice[:, 'bbox']
booleans = [len(bbox) >= 3 for bbox in bboxes]

# a collection of samples that contain at least three bounding boxes
view = dataset.slice[booleans]

Usage: slice along the axis of data

slice() takes names or indices of data as its second argument. keys returns all available names.

from chainercv.datasets import VOCBboxDataset
dataset = VOCBboxDataset()

# the view of image
# note that : of the first argument means all examples
view = dataset.slice[:, 'img']
print(view.keys)  # 'img'
img = view[0]

# the view of image and label
view = dataset.slice[:, ('img', 'label')]
print(view.keys)  # ('img', 'label')
img, label = view[0]

# the view of image (returns a tuple)
view = dataset.slice[:, ('img',)]
print(view.keys)  # ('img',)
img, = view[0]

# use an index instead of a name
view = dataset.slice[:, 1]
print(view.keys)  # 'bbox'
bbox = view[0]

# mixture of names and indices
view = dataset.slice[:, (1, 'label')]
print(view.keys)  # ('bbox', 'label')
bbox, label = view[0]

# use booleans
# note that the number of booleans should be the same as len(dataset.keys)
view = dataset.slice[:, (True, True, False)]
print(view.keys)  # ('img', 'bbox')
img, bbox = view[0]

Usage: slice along both axes

from chainercv.datasets import VOCBboxDataset
dataset = VOCBboxDataset()

# the view of the labels of the first 100 examples
view = dataset.slice[:100, 'label']

Concatenate and transform

ChainerCV provides ConcatenatedDataset and TransformDataset. The difference from chainer.datasets.ConcatenatedDataset and chainer.datasets.TransformDataset is that they take sliceable dataset(s) and return a sliceable dataset.

from chainercv.chainer_experimental.datasets.sliceable import ConcatenatedDataset
from chainercv.chainer_experimental.datasets.sliceable import TransformDataset
from chainercv.datasets import VOCBboxDataset
from chainercv.datasets import voc_bbox_label_names

dataset_07 = VOCBboxDataset(year='2007')
print('07:', dataset_07.keys, len(dataset_07))  # 07: ('img', 'bbox', 'label') 2501

dataset_12 = VOCBboxDataset(year='2012')
print('12:', dataset_12.keys, len(dataset_12))  # 12: ('img', 'bbox', 'label') 5717

# concatenate
dataset_0712 = ConcatenatedDataset(dataset_07, dataset_12)
print('0712:', dataset_0712.keys, len(dataset_0712))  # 0712: ('img', 'bbox', 'label') 8218

# transform
def transform(in_data):
    img, bbox, label = in_data

    dog_lb = voc_bbox_label_names.index('dog')
    bbox_dog = bbox[label == dog_lb]

    return img, bbox_dog

# we need to specify the names of data that the transform function returns
dataset_0712_dog = TransformDataset(dataset_0712, ('img', 'bbox_dog'), transform)
print('0712_dog:', dataset_0712_dog.keys, len(dataset_0712_dog))  # 0712_dog: ('img', 'bbox_dog') 8218

Make your own dataset

ChainerCV provides GetterDataset to construct a new sliceable dataset.

This example implements a sliceable bounding box dataset.

import numpy as np

from chainercv.chainer_experimental.datasets.sliceable import GetterDataset
from chainercv.utils import generate_random_bbox

class SampleBboxDataset(GetterDataset):
    def __init__(self):
        super(SampleBboxDataset, self).__init__()

        # register getter method for image
        self.add_getter('img', self.get_image)
        # register getter method for bbox and label
        self.add_getter(('bbox', 'label'), self.get_annotation)

    def __len__(self):
        return 20

    def get_image(self, i):
        print('get_image({})'.format(i))
        # generate dummy image
        img = np.random.uniform(0, 255, size=(3, 224, 224)).astype(np.float32)
        return img

    def get_annotation(self, i):
        print('get_annotation({})'.format(i))
        # generate dummy annotations
        bbox = generate_random_bbox(10, (224, 224), 10, 224)
        label = np.random.randint(0, 9, size=10).astype(np.int32)
        return bbox, label

dataset = SampleBboxDataset()
img, bbox, label = dataset[0]  # get_image(0) and get_annotation(0)

view = dataset.slice[:, 'label']
label = view[1]  # get_annotation(1)

If you have arrays of data, you can use TupleDataset.

import numpy as np

from chainercv.chainer_experimental.datasets.sliceable import TupleDataset
from chainercv.utils import generate_random_bbox

n = 20
imgs = np.random.uniform(0, 255, size=(n, 3, 224, 224)).astype(np.float32)
bboxes = [generate_random_bbox(10, (224, 224), 10, 224) for _ in range(n)]
labels = np.random.randint(0, 9, size=(n, 10)).astype(np.int32)

dataset = TupleDataset(('img', imgs), ('bbox', bboxes), ('label', labels))

print(dataset.keys)  # ('img', 'bbox', 'label')
view = dataset.slice[:, 'label']
label = view[1]

ChainerCV Reference Manual

Chainer Experimental

This module contains work-in-progress modules for Chainer. After they are merged into Chainer, these modules will be removed from ChainerCV.

Datasets

Sliceable

This module supports the sliceable feature. Please note that this module will be removed after Chainer implements the sliceable feature.

ConcatenatedDataset
GetterDataset
TupleDataset
TransformDataset

Training

Extensions
make_shift

Datasets

General datasets

DirectoryParsingLabelDataset
directory_parsing_label_names
MixUpSoftLabelDataset
SiameseDataset

ADE20K

ADE20KSemanticSegmentationDataset
ADE20KTestImageDataset

CamVid

CamVidDataset

Cityscapes

CityscapesSemanticSegmentationDataset
CityscapesTestImageDataset

CUB

CUBLabelDataset
CUBKeypointDataset

MS COCO

COCOBboxDataset
COCOInstanceSegmentationDataset
COCOSemanticSegmentationDataset

OnlineProducts

OnlineProductsDataset

PASCAL VOC

VOCBboxDataset
VOCInstanceSegmentationDataset
VOCSemanticSegmentationDataset

Semantic Boundaries Dataset

SBDInstanceSegmentationDataset

Evaluations

Detection COCO

eval_detection_coco

Detection VOC

eval_detection_voc
calc_detection_voc_ap
calc_detection_voc_prec_rec

Instance Segmentation COCO

eval_instance_segmentation_coco

Instance Segmentation VOC

eval_instance_segmentation_voc
calc_instance_segmentation_voc_prec_rec

Semantic Segmentation IoU

eval_semantic_segmentation
calc_semantic_segmentation_confusion
calc_semantic_segmentation_iou

Experimental

Extensions

Evaluator

DetectionCOCOEvaluator
DetectionVOCEvaluator
InstanceSegmentationCOCOEvaluator
InstanceSegmentationVOCEvaluator
SemanticSegmentationEvaluator

Visualization Report

DetectionVisReport

Functions

Spatial Pooling

ps_roi_average_align_2d
ps_roi_average_pooling_2d
ps_roi_max_align_2d
ps_roi_max_pooling_2d

Transforms

Image

center_crop
flip
pca_lighting
random_crop
random_expand
random_flip
random_rotate
random_sized_crop
resize
resize_contain
rotate
scale
ten_crop

Bounding Box

crop_bbox
flip_bbox
resize_bbox
rotate_bbox
translate_bbox

Point

flip_point
resize_point
translate_point

Visualizations

vis_bbox

vis_image

vis_instance_segmentation

vis_point

vis_semantic_segmentation

Utils

Bounding Box Utilities

bbox_iou
non_maximum_suppression

Download Utilities

cached_download
download_model
extractall

Image Utilities

read_image
read_label
tile_images
write_image

Iterator Utilities

apply_to_iterator
ProgressHook
unzip

Mask Utilities

mask_iou
mask_to_bbox
scale_mask

Testing Utilities

assert_is_bbox
assert_is_bbox_dataset
assert_is_image
assert_is_instance_segmentation_dataset
assert_is_label_dataset
assert_is_point
assert_is_point_dataset
assert_is_semantic_segmentation_dataset
generate_random_bbox

Naming Conventions

Here are the notations used.

  • \(B\) is the size of a batch.
  • \(H\) is the height of an image.
  • \(W\) is the width of an image.
  • \(C\) is the number of channels.
  • \(R\) is the total number of instances in an image.
  • \(L\) is the number of classes.

Data objects

Images

  • imgs: \((B, C, H, W)\) or \([(C, H, W)]\)
  • img: \((C, H, W)\)

Note

image is used in the name of a function or a class (e.g., chainercv.utils.write_image()).

Bounding boxes

  • bboxes: \((B, R, 4)\) or \([(R, 4)]\)
  • bbox: \((R, 4)\)
  • bb: \((4,)\)
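
As a small illustration of this convention with toy arrays (the values below are made up):

import numpy as np

# A batch of two images with different numbers of boxes.
bbox_0 = np.array([[10, 10, 20, 40]], dtype=np.float32)                 # bbox: (1, 4)
bbox_1 = np.array([[0, 0, 5, 5], [30, 30, 90, 90]], dtype=np.float32)   # bbox: (2, 4)
bboxes = [bbox_0, bbox_1]   # bboxes: [(R, 4)], a list of per-image arrays
bb = bbox_1[0]              # bb: (4,), a single bounding box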

Labels

name      classification   detection and instance segmentation   semantic segmentation
labels    \((B,)\)         \((B, R)\) or \([(R,)]\)               \((B, H, W)\)
label     \(()\)           \((R,)\)                               \((H, W)\)
l or lb   --               \(()\)                                 --

Scores and probabilities

score represents an unbounded confidence value. On the other hand, probability is bounded in [0, 1] and sums to 1.

name              classification   detection and instance segmentation   semantic segmentation
scores or probs   \((B, L)\)       \((B, R, L)\) or \([(R, L)]\)          \((B, L, H, W)\)
score or prob     \((L,)\)         \((R, L)\)                             \((L, H, W)\)
sc or pb          --               \((L,)\)                               --
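
As a toy illustration of the distinction (the arrays are made up; softmax is just one common way to turn scores into probabilities):

import numpy as np

score = np.array([2.0, 0.5, -1.0], dtype=np.float32)  # unbounded confidence values, L = 3
prob = np.exp(score) / np.exp(score).sum()            # bounded in [0, 1] and sums to 1
print(prob.sum())  # 1.0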

Note

Even objects that satisfy the definition of probability can be named score.

Instance segmentations

  • masks: \((B, R, H, W)\) or \([(R, H, W)]\)
  • mask: \((R, H, W)\)
  • msk: \((H, W)\)

Attributing an additional meaning to a basic data object

RoIs

  • rois: \((R', 4)\), which consists of bounding boxes for multiple images. Assuming that there are \(B\) images, each containing \(R_i\) bounding boxes, \(R' = \sum_i R_i\).
  • roi_indices: An array of shape \((R',)\) that contains the batch indices of the images to which the bounding boxes correspond.
  • roi: \((R, 4)\). RoIs for a single image.
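
The following sketch with toy arrays shows how rois and roi_indices relate to per-image bounding boxes; the concatenation step just illustrates the definition above and is not ChainerCV API.

import numpy as np

# Toy per-image bounding boxes: R_0 = 1, R_1 = 2, so R' = 3.
bboxes = [
    np.array([[10, 10, 20, 20]], dtype=np.float32),                 # image 0
    np.array([[0, 0, 5, 5], [30, 30, 90, 90]], dtype=np.float32),   # image 1
]
rois = np.concatenate(bboxes, axis=0)  # (R', 4)
roi_indices = np.concatenate(
    [np.full(len(bbox), i, dtype=np.int32) for i, bbox in enumerate(bboxes)])
print(rois.shape)    # (3, 4)
print(roi_indices)   # [0 1 1]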

Attributes associated to RoIs

RoIs may have additional attributes, such as class scores and masks. These attributes are named by adding the prefix roi_ (e.g., a scores-like object is named roi_scores).

  • roi_xs: \((R',) + x_{shape}\)
  • roi_x: \((R,) + x_{shape}\)

In the case of scores with shape \((L,)\), roi_xs would have shape \((R', L)\).

Note

When the batch size is one, roi_nouns = roi_noun = noun, and these names can be used interchangeably.

Class-wise vs class-independent

cls_nouns is a multi-class version of nouns. For instance, cls_locs is \((B, R, L, 4)\) and locs is \((B, R, 4)\).
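
For illustration, here is a toy sketch of this shape relationship (the arrays and the class-selection step are hypothetical, not ChainerCV API):

import numpy as np

B, R, L = 2, 3, 21
cls_locs = np.zeros((B, R, L, 4), dtype=np.float32)  # class-wise locations
labels = np.zeros((B, R), dtype=np.int32)            # predicted class per RoI

# Select the location of the predicted class for each RoI.
locs = cls_locs[np.arange(B)[:, None], np.arange(R), labels]
print(locs.shape)  # (2, 3, 4)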

Note

cls_probs and probs can be used interchangeably when there is no confusion.

Arbitrary input

x is a variable whose shape can be inferred from the context. It can be used only when there is no confusion about its shape. This is usually the case when naming an input to a neural network.

License

Source Code

The source code of ChainerCV is licensed under the MIT License.

Pretrained Models

Pretrained models provided by ChainerCV benefit from the following resources. See those resources for the terms of use of a model with weights pretrained by any of them.

  • ResNet50/101/152 (imagenet)
  • SEResNet50/101/152 (imagenet)
  • SEResNeXt50/101 (imagenet)
  • VGG16 (imagenet)
  • FasterRCNNVGG16 (imagenet)
  • FasterRCNNVGG16 (voc07/voc0712)
  • SSD300/SSD512 (imagenet)
  • SSD300/SSD512 (voc0712)
  • YOLOv2 (voc0712)
  • YOLOv3 (voc0712)
  • PSPNetResNet101 (cityscapes)
  • SegNetBasic (camvid)
  • FCISResNet101 (sbd)
