uarray

Warning

uarray is a developer tool; it is not meant to be used directly by end-users.

Warning

This document is meant to elicit discussion from the broader community and to help steer the direction of uarray. Examples provided here may not be immediately stable.

Note

This page describes the overall philosophy behind uarray. For usage instructions, see the uarray API documentation page. If you are interested in augmentation for NEP-22, please see the unumpy page.

uarray is a backend system for Python that allows you to separately define an API, along with backends that contain separate implementations of that API.

unumpy builds on top of uarray. It is an effort to specify the core NumPy API, and provide backends for the API.

What’s new in uarray?

uarray is, to our knowledge, the first backend-system for Python that’s generic enough to cater to the use-cases of many libraries, while at the same time, being library independent.

unumpy is the first approach to leverage uarray in order to build a generic backend system for (what we hope will be) the core NumPy API. It will be possible to create a backend object and use that to perform operations. In addition, it will be possible to change the used backend via a context manager.

Benefits for end-users

End-users can easily take their code written for one backend and use it on another backend with a simple switch (using a Python context manager). This can have any number of effects, depending on the functionality of the library. For example:

  • For Matplotlib, changing styles of plots or producing different windows or image formats.

  • For Tensorly, providing a different computation backend that can be distributed or target the GPU or sparse arrays.

  • For unumpy, it can do a similar thing: let users take code they already wrote for numpy and easily switch it to a different backend.

Benefits for library authors

To library authors, the benefits come in two forms: First, it allows them to build their libraries to be implementation independent. In code that builds itself on top of unumpy, it would be very easy to target the GPU, use sparse arrays or do any kind of distributed computing.

Second, it separates the interface from the implementation and makes it easy to switch implementations.

Relation to the NumPy duck-array ecosystem

uarray is a backend/dispatch mechanism with a focus on array computing and the needs of the wider array community. By allowing a clean way to register an implementation for any Python object (functions, classes, class methods, properties, dtypes, …), it also provides an important building block for NEP-22. It is meant to address the shortcomings of NEP-18 and NEP-13, while still holding nothing in uarray itself that’s specific to array computing or the NumPy API.

Where to from here?

Choose the documentation page relevant to you:

End-user quickstart

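Ideally, the only thing an end-user should have to do is set the backend and its options. Given a backend, you (as the end-user) can decide to do one of two things: set the backend temporarily, or set it permanently. Both are described in the sections that follow.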

Note

API authors may want to wrap these methods and provide their own methods.

Also of note is the BackendNotImplementedError, which is raised when none of the selected backends have an implementation for a multimethod.

Setting the backend temporarily

To set the backend temporarily, use the set_backend context manager.

import uarray as ua

with ua.set_backend(mybackend):
    # Use multimethods (or code dependent on them) here.
    ...

Setting the backend permanently

To set the backend permanently, use the set_global_backend method. We recommend that the global backend not depend on any other backend, as it is not guaranteed that another backend will be available.

You can also register backends other than the global backend for permanent use, but the global backend will be tried first outside of a set_backend context. This can be done via register_backend.

import uarray as ua

ua.set_global_backend(mybackend)

# Use relevant multimethods here.

Documentation for backend providers

Backend providers can provide a backend for a defined API within the uarray ecosystem. To find out how to define your own API with uarray, see Documentation for API authors. To find out how your backend will be provided, see End-user quickstart.

Backend providers need to be aware of three protocols: __ua_domain__, __ua_function__ and __ua_convert__. The first two are mandatory and the last is optional.

__ua_domain__

__ua_domain__ is a string containing the domain of the backend. This is, by convention, the name of the module (or one of its dependencies or parents) that contains the multimethods. For example, scipy and numpy.fft could both be in the numpy domain or one of its subdomains.

Additionally, __ua_domain__ can be a sequence of domains, such as a tuple or list of strings. This allows a single backend to implement functions from more than one domain.

__ua_function__

This is the most important protocol, one that defines the implementation of a multimethod. It has the signature (method, args, kwargs). Note that it is called in this form, so if your backend is an object instead of a module, you should add self as the first parameter. method is the multimethod being called, and it is guaranteed that it is in the same domain as the backend. args and kwargs are the arguments to the function, possibly after conversion (explained below).

Returning NotImplemented signals that the backend does not support this operation.
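As a rough illustration of the call form described above, the sketch below writes a backend as a class instance, so __ua_function__ takes self first. The ListBackend class, its _implementations table and the stand-in total function are hypothetical names chosen for this example; a real multimethod would come from generate_multimethod, and uarray itself is not imported here.

```python
def total(values):
    """Stand-in for a multimethod; only its identity is used for lookup."""
    raise NotImplementedError


class ListBackend:
    # A sequence of domains lets one backend serve several of them.
    __ua_domain__ = ("ua_examples", "ua_examples.extra")

    def __init__(self):
        # Map each supported multimethod to this backend's implementation.
        self._implementations = {total: sum}

    def __ua_function__(self, method, args, kwargs):
        impl = self._implementations.get(method)
        if impl is None:
            # Tell the dispatcher that this backend cannot handle the call.
            return NotImplemented
        return impl(*args, **kwargs)


backend = ListBackend()
result = backend.__ua_function__(total, ([1, 2, 3],), {})
unsupported = backend.__ua_function__(print, ("hello",), {})
```

Keeping the supported multimethods in a lookup table is only one possible design; its advantage is that anything not in the table falls through to NotImplemented without extra code.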

__ua_convert__

All dispatchable arguments are passed through __ua_convert__ before being passed into __ua_function__. This protocol has the signature (dispatchables, coerce), where dispatchables is an iterable of Dispatchable and coerce specifies whether or not to “force” the conversion. Each Dispatchable carries the object to be converted along with the type it was marked as. By convention, operations larger than O(log n) (where n is the size of the object in memory) should only be done if coerce is True. In addition, some arguments are wrapped as non-coercible via the coercible attribute; if these must be coerced, then one should return NotImplemented.

A convenience wrapper for converting a single object, wrap_single_convertor, is provided.

Returning NotImplemented signals that the backend does not support the conversion of the given object.
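The conventions above might look roughly like the following sketch. It runs without uarray: Dispatchable here is a namedtuple stand-in with the value, type and coercible attributes used elsewhere in this document, and wrap_single is a toy analogue of wrap_single_convertor written for illustration. The (value, dispatch_type, coerce) signature of convert_single is an assumption of this example.

```python
from collections import namedtuple

# Stand-in for uarray.Dispatchable, with the attributes used in this document.
Dispatchable = namedtuple("Dispatchable", ["value", "type", "coercible"])


def convert_single(value, dispatch_type, coerce):
    """Convert one object to this toy backend's native form, a tuple."""
    if dispatch_type is not tuple:
        return value  # Not a type this backend converts; pass it through.
    if isinstance(value, tuple):
        return value  # Already native: no work needed.
    if not coerce:
        return NotImplemented  # Copying a list is O(n), so it needs coerce.
    return tuple(value)


def wrap_single(convert):
    """Toy analogue of wrap_single_convertor: apply convert to every item."""
    def __ua_convert__(dispatchables, coerce):
        converted = []
        for d in dispatchables:
            # Non-coercible arguments are never coerced, whatever the setting.
            c = convert(d.value, d.type, coerce and d.coercible)
            if c is NotImplemented:
                return NotImplemented
            converted.append(c)
        return converted
    return __ua_convert__


ua_convert = wrap_single(convert_single)
coercible_list = [Dispatchable([1, 2], tuple, True)]
pinned_list = [Dispatchable([1, 2], tuple, False)]
```

With these definitions, ua_convert(coercible_list, True) produces a tuple, while the same call with coerce set to False, or any call on pinned_list, returns NotImplemented.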

skip_backend

If a backend consumes multimethods from a domain and provides multimethods for that same domain, it may wish to have the ability to use multimethods while excluding itself from the list of tried backends in order to avoid infinite recursion. This allows the backend to implement its functions in terms of functions provided by other backends. This is the purpose of the skip_backend context manager.

The process that takes place when the backend is tried

First of all, the backend’s __ua_convert__ method is tried. If this returns NotImplemented, then the backend is skipped; otherwise, its __ua_function__ protocol is tried. If a value other than NotImplemented is returned, it is assumed to be the final return value. Any exceptions raised are propagated up the call stack, except a BackendNotImplementedError, which signals a skip of the backend. If all backends are exhausted, or if a backend with its only flag set to True is encountered, a BackendNotImplementedError is raised.
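The process just described can be modelled with a small toy loop. This is a simplified sketch for intuition only, not uarray's actual implementation: try_backends, Refuses and Doubles are hypothetical names, every positional argument is treated as dispatchable, and the exception class is a local stand-in for uarray.BackendNotImplementedError.

```python
class BackendNotImplementedError(NotImplementedError):
    """Stand-in for uarray.BackendNotImplementedError."""


def try_backends(backends, method, args, kwargs, coerce=False):
    """Toy trial loop; backends is a list of (backend, only) pairs."""
    for backend, only in backends:
        result = NotImplemented
        convert = getattr(backend, "__ua_convert__", None)
        # Simplification: every positional argument counts as dispatchable.
        converted = args if convert is None else convert(args, coerce)
        if converted is not NotImplemented:
            try:
                result = backend.__ua_function__(
                    method, tuple(converted), kwargs
                )
            except BackendNotImplementedError:
                result = NotImplemented  # Signals a skip, not a failure.
        if result is not NotImplemented:
            return result  # The first real answer is the final return value.
        if only:
            break  # An "only" backend forbids falling through to the rest.
    raise BackendNotImplementedError(f"no backend implements {method!r}")


class Refuses:
    def __ua_function__(self, method, args, kwargs):
        return NotImplemented


class Doubles:
    def __ua_convert__(self, dispatchables, coerce):
        return [float(d) for d in dispatchables]

    def __ua_function__(self, method, args, kwargs):
        return tuple(2 * a for a in args)


chain = [(Refuses(), False), (Doubles(), False)]
answer = try_backends(chain, "double", (1, 2), {})
```

Here the first backend declines, so the call falls through to the second. Marking the first backend as only (True in its pair) would instead end the search and raise the error.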

Examples

Examples for library authors can be found in the source of unumpy.numpy_backend and other *_backend.py files in this directory.

Documentation for API authors

Multimethods are the most important part of uarray. They are created via the generate_multimethod function. Multimethods define the API of a project, and backends have to be written against this API. You should see Documentation for backend providers for how to define a backend against the multimethods you write, or End-user quickstart for how to switch backends for a given API.

A multimethod has the following parts:

  • Domain

  • Argument extractor

  • Argument replacer

  • Default implementation

We will go through each of these in detail now.

Domain

See the glossary for domain.

Argument extractor

An argument extractor extracts arguments marked as a given type from the list of given arguments. Note that the objects extracted don’t necessarily have to be in the list of arguments, they can be arbitrarily nested within the arguments. For example, extracting each argument from a list is a possibility. Note that the order is important, as it will come into play later. This function should return an iterable of Dispatchable.

This function has the same signature as the multimethod itself, and the documentation, name and so on are copied from the argument extractor via functools.wraps.
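A minimal sketch of an extractor whose dispatchables are nested inside an argument is given below. The concatenate signature and the ArrayMark marker class are hypothetical, and Dispatchable is a namedtuple stand-in for uarray.Dispatchable so that the example runs on its own.

```python
from collections import namedtuple

# Stand-in for uarray.Dispatchable so the sketch runs without uarray.
Dispatchable = namedtuple("Dispatchable", ["value", "type"])


class ArrayMark:
    """Hypothetical marker type for array-like arguments."""


def concatenate(arrays, axis=0):
    # Same signature as the multimethod. Each element of the list is
    # extracted, and the order here is the order a replacer will see.
    return tuple(Dispatchable(a, ArrayMark) for a in arrays)


extracted = concatenate([[1, 2], [3]], axis=0)
```

The axis argument is accepted but not extracted, since it is not something a backend would dispatch on in this example.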

Argument replacer

The argument replacer takes in the arguments and dispatchable arguments, and its job is to replace the arguments previously extracted by the argument extractor by other arguments provided in the list. Therefore, the signature of this function is (args, kwargs, dispatchable_args), and it returns an args/kwargs pair. We realise this is a hard problem in general, so we have provided a few simplifications, such as that the default-valued keyword arguments will be removed from the list.

We recommend following the pattern in here for optimal operation: pass the args/kwargs into a function with a similar signature and then return the modified args/kwargs.
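A sketch of that recommended pattern, for a hypothetical multimethod with the signature concatenate(arrays, axis=0), might look like this. The function names are illustrative only; the point is that the inner function lets Python resolve positional and keyword arguments, so the replacer does not have to.

```python
def concatenate_replacer(args, kwargs, dispatchables):
    # The inner function mirrors the multimethod's signature, so Python
    # itself sorts out positional versus keyword arguments.
    def replacer(arrays, *args, **kwargs):
        # Dispatchables arrive in the order the extractor produced them.
        return (list(dispatchables),) + args, kwargs

    return replacer(*args, **kwargs)


positional = concatenate_replacer(([[1], [2]], 1), {}, ("a", "b"))
keyword = concatenate_replacer(
    (), {"arrays": [[1], [2]], "axis": 1}, ("a", "b")
)
```

Both calls replace the original arrays with the two converted values "a" and "b", whether arrays was passed positionally or by keyword.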

Default implementation

This is a default implementation for the multimethod, ideally with the same signature as the original multimethod. It can also be used to provide one multimethod in terms of others, even if the default implementation for the downstream multimethods is not defined.

Examples

Examples of writing multimethods are found in this file. It also teaches some advanced techniques, such as overriding instance methods, including __call__. The same philosophy may be used to override properties, static methods, and class methods.

Glossary

Multimethod

A method, possibly with a default/reference implementation, that can have other implementations provided by different backends.

If a multimethod does not have an implementation, a BackendNotImplementedError is raised.

Backend

A backend is an entity that can provide implementations for different functions. It can also (optionally) receive some options from the user about how to process the implementations. A backend can be set permanently or temporarily.

Domain

A domain defines the hierarchical grouping of multimethods. The domain string is, by convention, the name of the module that provides the multimethods.

Sub-domains are denoted with a separating period ("."). For example, a multimethod in "numpy.fft" is also considered to be in the domain "numpy". When calling a multimethod, the backends for the most specific sub-domain are always tried first, followed by the next domain up the hierarchy.
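The lookup order this implies can be sketched in a few lines. The domains_to_try helper is a hypothetical name for illustration and is not part of uarray.

```python
def domains_to_try(domain):
    """Yield a domain, then each parent, from most to least specific."""
    parts = domain.split(".")
    for end in range(len(parts), 0, -1):
        yield ".".join(parts[:end])


order = list(domains_to_try("numpy.fft"))
```

For "numpy.fft", backends registered for "numpy.fft" would be consulted before those registered for "numpy".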

Dispatching

Dispatching is the process of forwarding a function call to an implementation in a backend.

Conversion

A backend might have different object types compared to the reference implementation, or it might require some other conversions of objects. Conversion is the process of converting any given object into a library’s native form.

Coercion

Coercions are conversions that may take a long time, usually those involving copying or moving of data. As a rule of thumb, conversions longer than O(log n) (where n is the size of the object in memory) should be made into coercions.
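One way to picture the difference between a cheap conversion and a coercion is with built-in Python types. This is an analogy only, not uarray code: creating a view costs the same regardless of data size, while a copy touches every byte.

```python
buffer = bytearray(b"abcd")

# Cheap conversion: a memoryview shares the buffer's memory, so creating
# it does not depend on how much data there is.
view = memoryview(buffer)

# Coercion: bytes() copies every byte, which is O(n) in the data size.
copy = bytes(buffer)

# Mutating the original shows which object shares memory with it.
buffer[0] = ord("z")
view_sees_change = bytes(view[:1]) == b"z"
copy_sees_change = copy[:1] == b"z"
```

The view reflects the change to the original buffer and the copy does not, which is the practical consequence of avoiding the copy.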

Marking

Marking is the process of telling the backend what convertor to use for a given argument.

uarray

uarray is built around a backend protocol and overridable multimethods. It is necessary to define multimethods for backends to be able to override them. See the documentation of generate_multimethod on how to write multimethods.

Let’s start with the simplest:

__ua_domain__ defines the backend domain. The domain is a period-separated string consisting of the modules you extend plus the submodule. For example, if a submodule module2.submodule extends module1 (i.e., it exposes dispatchables marked as types available in module1), then the domain string should be "module1.module2.submodule".

For the purpose of this demonstration, we’ll be creating an object and setting its attributes directly. However, note that you can use a module or your own type as a backend as well.

>>> class Backend: pass
>>> be = Backend()
>>> be.__ua_domain__ = "ua_examples"

It might be useful at this point to sidetrack to the documentation of generate_multimethod to find out how to generate a multimethod overridable by uarray. Needless to say, writing a backend and creating multimethods are mostly orthogonal activities, and knowing one doesn’t necessarily require knowledge of the other, although it is certainly helpful. We expect core API designers/specifiers to write the multimethods, and implementors to override them. But, as is often the case, similar people write both.

Without further ado, here’s an example multimethod:

>>> import uarray as ua
>>> from uarray import Dispatchable
>>> def override_me(a, b):
...   return Dispatchable(a, int),
>>> def override_replacer(args, kwargs, dispatchables):
...     return (dispatchables[0], args[1]), {}
>>> overridden_me = ua.generate_multimethod(
...     override_me, override_replacer, "ua_examples"
... )

Next comes the part about overriding the multimethod. This requires the __ua_function__ protocol, and the __ua_convert__ protocol. The __ua_function__ protocol has the signature (method, args, kwargs), where method is the passed multimethod and args/kwargs specify the arguments, possibly after conversion of the dispatchables.

>>> def __ua_function__(method, args, kwargs):
...     return method.__name__, args, kwargs
>>> be.__ua_function__ = __ua_function__

The other protocol of interest is the __ua_convert__ protocol. It has the signature (dispatchables, coerce). When coerce is False, conversion between the formats should ideally be an O(1) operation; that is, no memory copying should be involved, only views of the existing data.

>>> def __ua_convert__(dispatchables, coerce):
...     for d in dispatchables:
...         if d.type is int:
...             if coerce and d.coercible:
...                 yield str(d.value)
...             else:
...                 yield d.value
>>> be.__ua_convert__ = __ua_convert__

Now that we have defined the backend, the next thing to do is to call the multimethod.

>>> with ua.set_backend(be):
...      overridden_me(1, "2")
('override_me', (1, '2'), {})

Note that the marked type has no effect on the actual type of the passed object. We can also coerce the type of the input.

>>> with ua.set_backend(be, coerce=True):
...     overridden_me(1, "2")
...     overridden_me(1.0, "2")
('override_me', ('1', '2'), {})
('override_me', ('1.0', '2'), {})

Another feature is that if you remove __ua_convert__, the arguments are not converted at all and it’s up to the backend to handle that.

>>> del be.__ua_convert__
>>> with ua.set_backend(be):
...     overridden_me(1, "2")
('override_me', (1, '2'), {})

You also have the option to return NotImplemented, in which case processing moves on to the next back-end, which in this case, doesn’t exist. The same applies to __ua_convert__.

>>> be.__ua_function__ = lambda *a, **kw: NotImplemented
>>> with ua.set_backend(be):
...     overridden_me(1, "2")
Traceback (most recent call last):
    ...
uarray.BackendNotImplementedError: ...

The last possibility is if we don’t have __ua_convert__, in which case the job is left up to __ua_function__, but putting things back into arrays after conversion will not be possible.

Functions

all_of_type(arg_type)

Marks all unmarked arguments as a given type.

create_multimethod(*args, **kwargs)

Creates a decorator for generating multimethods.

generate_multimethod(argument_extractor, …)

Generates a multimethod.

mark_as(dispatch_type)

Creates a utility function to mark something as a specific type.

set_backend(backend[, coerce, only])

A context manager that sets the preferred backend.

set_global_backend(backend[, coerce, only, …])

This utility method replaces the default backend for permanent use.

register_backend(backend)

This utility method registers a backend for permanent use.

clear_backends(domain[, registered, globals])

This utility method clears registered backends.

skip_backend(backend)

A context manager that allows one to skip a given backend from processing entirely.

wrap_single_convertor(convert_single)

Wraps a __ua_convert__ defined for a single element to all elements.

get_state()

Returns an opaque object containing the current state of all the backends.

set_state(state)

A context manager that sets the state of the backends to one returned by get_state.

reset_state()

Returns a context manager that resets all state once exited.

determine_backend(value, dispatch_type, *, …)

Sets the backend to the first active backend that supports value.

determine_backend_multi(dispatchables, *, domain)

Sets a backend supporting all dispatchables.

Classes

Dispatchable(value, dispatch_type[, coercible])

A utility class which marks an argument with a specific dispatch type.

Exceptions

BackendNotImplementedError

An exception that is thrown when no compatible backend is found for a method.

Design Philosophies

The following section discusses the design philosophies of uarray, and the reasoning behind some of these philosophies.

Modularity

uarray (and its sister modules, unumpy and others to come) was designed from the ground up to be modular. This is part of why uarray itself holds the core backend and dispatch machinery, and unumpy holds the actual multimethods. Also, unumpy can be developed completely separately from uarray, although the ideal place to have it would be NumPy itself.

However, the benefit of having it separate is that it could span multiple NumPy versions, even before NEP-18 (or even NEP-13) was available. Another benefit is that it can have a faster release cycle to help it achieve this.

Separate Imports

Code wishing to use the backend machinery for NumPy (as an example) will use the statement import unumpy as np instead of the usual import numpy as np. This is deliberate: it makes dispatching, and the overhead associated with it, opt-in rather than forced. However, a package is free to define its main methods as the dispatchable versions, thereby allowing dispatch on the default implementation.

Extensibility and Choice

If some effort is put into the dispatch machinery, it’s possible to dispatch over arbitrary objects, including arrays, dtypes, and so on. A method defines the type of each dispatchable argument, and backends are only passed types they know how to dispatch over when deciding whether or not to use that backend. For example, if a backend doesn’t know how to dispatch over dtypes, it won’t be asked to decide on that basis.

Methods can have a default implementation in terms of other methods, but they’re still overridable.

This means that only one framework is needed to, for example, dispatch over ufuncs, arrays, dtypes and all other primitive objects in NumPy, while keeping the core uarray code independent of NumPy and even unumpy.

Backends can span modules, so SciPy could jump in and define its own methods on NumPy objects and make them overridable within the NumPy backend.

User Choice

The users of unumpy or uarray can choose which backend they want to prefer with a simple context manager. They also have the ability to force a backend, and to skip a backend. This is useful for array-like objects that provide other array-like objects by composing them. For example, Dask could perform all its blockwise function calls with the following pseudocode (obviously, this is simplified):

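def blockwise_call(blockwise_function, args, kwargs, input_arrays):
    # extract_inner_arrays, combine_arrays and DaskBackend are placeholders.
    in_arrays = extract_inner_arrays(input_arrays)
    out_arrays = []
    for input_arrays_single in in_arrays:
        # Swap one block's inner arrays into the original arguments.
        block_args, block_kwargs = blockwise_function.replace_args_kwargs(
            args, kwargs, input_arrays_single)
        # Skip the Dask backend so the call reaches the inner arrays' backend.
        with ua.skip_backend(DaskBackend):
            out_arrays_single = blockwise_function(*block_args, **block_kwargs)
        out_arrays.append(out_arrays_single)

    return combine_arrays(out_arrays)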

A user would simply do the following:

with ua.set_backend(DaskBackend):
    # Write all your code here
    # It will prefer the Dask backend
    ...

There is no default backend; to unumpy, NumPy is just another backend. One can register backends, which will all be tried in indeterminate order when no backend is selected.

Addressing past flaws

The progress on NumPy’s side for defining an override mechanism has been slow, with NEP-13 being first introduced in 2013. With the wealth of dispatchable objects (including arrays, ufuncs, and dtypes) and the advent of libraries like Dask, CuPy, Xarray, PyData/Sparse, and XND, it has become clear that the need for alternative array-like implementations is growing. There are even other libraries, like PyTorch and TensorFlow, that would be possible to express in NumPy API-like terms. Another example includes the Keras API, for which an overridable ukeras could be created, similar to unumpy.

uarray is intended to have fast development to fill the need posed by these communities, while keeping itself as general as possible, and quickly reach maturity, after which backward compatibility will be guaranteed.

Performance considerations will come only after such a state has been reached.

GSoC 2020 project ideas

Introduction

This is the Google Summer of Code 2020 (GSoC’20) ideas page for uarray, unumpy and udiff. The uarray library is a backend mechanism geared towards array computing, but intended for general use. unumpy is an incomplete stub of the NumPy API that can be dispatched by uarray. udiff is a general-purpose automatic differentiation library built on top of unumpy and uarray.

This page lists a number of ideas for Google Summer of Code projects for uarray, plus gives some pointers for potential GSoC students on how to get started with contributing and putting together their application.

Guidelines & requirements

uarray plans to participate in GSoC’20 under the umbrella of the Python Software Foundation.

We expect students to be at least comfortable with Python (intermediate level). Some projects may also require C++ or C skills. Knowing how to use Git is also important; this can be learned before the official start of GSoC if needed, though.

If you have an idea of what you would like to work on (see below for ideas) and are considering participating:

  1. Read the PSF page carefully; it contains important advice on the process.

  2. Read advice on writing a proposal (written with the Mailman project in mind, but generally applicable)

  3. Make an enhancement/bugfix/documentation fix – it does not have to be big, and it does not need to be related to your proposal. Doing so before applying for the GSoC is a hard requirement for uarray. It helps everyone get some idea of how things would work during GSoC.

  4. Start writing your proposal early, post a draft to the issue tracker and iterate based on the feedback you receive. This will both improve the quality of your proposal and help you find a suitable mentor.

Contact

If you have a question after checking all guideline pages above, you can open an issue in the issue tracker, but feel free to chat with us on Gitter if you need clarification regarding any of the projects. Keep in mind that you might not get a response right away, but we will endeavour to respond as early as possible.

uarray project ideas

uarray: Add querying for state

Adding querying for the uarray._BackendState object will allow users of uarray to see what’s inside the opaque object. Some parts can be re-used from the pickling machinery.

It can also help downstream users to access the parameters of the currently set backend, which is a planned feature of uarray. Here is a list of goals for this project:

  • Allow downstream projects to query the list of backends.

  • Allow downstream projects to query the list of parameters for a backend.

This would enable, for example, the following use-cases:

  • Allow a downstream library to detect a backend and run specialised code for it.

  • Allow a downstream library to fail-fast on a known-unsupported backend.

This project has a straightforward design and needs some implementation work, and will require interacting with the mentors to implement and polish. The accepted student will get an outline of the desired API, along with some failing tests and doctests. The student will make a pull request to implement the desired functionality so that the tests pass.

  • Required knowledge: Python C-API and C++

  • Difficulty level: medium

  • Potential mentors: Peter Bell and Hameer Abbasi

uarray: Allow subdomains

This idea would allow a backend to encompass functions from more than one domain.

The primary goal of this project would be:

  • Develop a system that allows a backend to select, via some kind of matching mechanism, which domains it supports, while maintaining backward compatibility.

This would allow a backend targeting NumPy to also target, for example, the numpy.random submodule. Since the domain for functions in numpy.random will be just that: numpy.random, it won’t match backends defined with the numpy domain, since it’s an exact string match.

The second objective here would be to allow backends to target submodules of projects rather than the whole project. For example, targeting just numpy.random or numpy.fft without targeting all of NumPy.

For more detail see this issue.

This project has a somewhat complicated design and needs some involved implementation work, and will require interacting with the mentors to flesh out and work through.

  • Required knowledge: Python C-API and C++

  • Difficulty level: hard

  • Potential mentors: Peter Bell and Hameer Abbasi

unumpy: Expand overall coverage

This project is split into two parts:

  • Adding further coverage of the NumPy API.

  • Adding more backends to unumpy.

We realise this is a large (possibly open-ended) undertaking, and so there will need to be a minimum amount of work done in order to pass (~150 function stubs and, if time allows, a JAX backend). You may see the existing methods and figure out how they are written using a combination of the documentation for writing multimethods and the already existing multimethods in this file. For writing backends, you can see the documentation for backends in combination with the already existing backends in this directory.

  • Required knowledge: Python (intermediate level)

  • Difficulty level: easy

  • Potential mentors: Prasun Anand and Hameer Abbasi

udiff: Completion and Packaging

This requires completion and packaging of the udiff library. Potential goals include:

  1. Publishing an initial version to PyPI. Here’s a guide on how to do that.

  2. Adding matrix/tensor calculus support.

    • For this, you can see the matrix cookbook. Don’t be intimidated! There will only be five or so equations you have to pull out of the matrix cookbook and implement, most prominently, the equation for matrix multiplication.

    • Here is how derivatives are registered.

    • The second task here will be to add the “separation” between the data dimensions and the differentiation dimensions. For example, the input could be a vector, or an array of scalars, and this might need to be taken into account when doing the differentiation. That will require some work in this file, and possibly this one as well.

  3. Adding tests.

    • This will require calculating a few derivatives by hand and making sure they match up with what udiff computes.

    • We will use the PyTest framework.

  4. Adding documentation on use, which will be fairly minimal. We will learn to set up Sphinx, and add some documentation.

  5. Publishing a final version to PyPI.
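To give a feel for the testing goal above, here is a rough sketch of a PyTest-style check that compares a hand-derived derivative with a computed one. The udiff API is not shown in this document, so a central finite difference stands in for the value udiff would produce; the function names are hypothetical.

```python
import math


def f(x):
    return x ** 2 * math.sin(x)


def df_by_hand(x):
    # Product rule: d/dx [x^2 sin x] = 2x sin x + x^2 cos x.
    return 2 * x * math.sin(x) + x ** 2 * math.cos(x)


def numeric_derivative(func, x, h=1e-6):
    # Central difference; stands in for the value udiff would compute.
    return (func(x + h) - func(x - h)) / (2 * h)


def test_product_rule():
    for x in (-1.5, 0.0, 0.7, 2.0):
        assert math.isclose(
            numeric_derivative(f, x),
            df_by_hand(x),
            rel_tol=1e-6,
            abs_tol=1e-8,
        )


test_product_rule()
```

In an actual test suite, PyTest would discover test_product_rule on its own and the final explicit call would be dropped.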

This project has a fairly minimal design and needs some involved implementation work. It will allow the accepted student to get an idea of what it’s like to actually publish, test and document a small Python package.

  • Required knowledge: Python (intermediate level) and calculus

  • Difficulty level: medium

  • Potential mentors: Prasun Anand and Hameer Abbasi
