uarray¶
Warning
uarray is a developer tool; it is not meant to be used directly by end-users.
Warning
This document is meant to elicit discussion from the broader community and to help drive the direction that uarray takes. Examples provided here may not be immediately stable.
Note
This page describes the overall philosophy behind uarray. For usage instructions, see the uarray API documentation page. If you are interested in augmentation for NEP-22, please see the unumpy page.
uarray is a backend system for Python that allows you to separately define an API, along with backends that contain separate implementations of that API.
unumpy builds on top of uarray. It is an effort to specify the core NumPy API, and provide backends for the API.
What’s new in uarray?¶
uarray is, to our knowledge, the first backend system for Python that is generic enough to cater to the use cases of many libraries while remaining library-independent.
unumpy is the first approach to leverage uarray in order to build a generic backend system for (what we hope will be) the core NumPy API. It will be possible to create a backend object and use that to perform operations. In addition, it will be possible to change the backend in use via a context manager.
Benefits for end-users¶
End-users can easily take their code written for one backend and use it on another backend with a simple switch (using a Python context manager). This can have any number of effects, depending on the functionality of the library. For example:
For Matplotlib, changing styles of plots or producing different windows or image formats.
For Tensorly, providing a different computation backend that can be distributed or target the GPU or sparse arrays.
For unumpy, it can do a similar thing: let users take code they already wrote for NumPy and easily switch it to a different backend.
Benefits for library authors¶
To library authors, the benefits come in two forms. First, it allows them to build their libraries to be implementation-independent. In code that builds on top of unumpy, it would be very easy to target the GPU, use sparse arrays or do any kind of distributed computing.
Second, it provides a way to separate the interface from the implementation, and to switch implementations easily.
Relation to the NumPy duck-array ecosystem¶
uarray is a backend/dispatch mechanism with a focus on array computing and the needs of the wider array community. By allowing a clean way to register an implementation for any Python object (functions, classes, class methods, properties, dtypes, …), it also provides an important building block for NEP-22. It is meant to address the shortcomings of NEP-18 and NEP-13, while still holding nothing in uarray itself that is specific to array computing or the NumPy API.
Where to from here?¶
Choose the documentation page relevant to you:
End-user quickstart¶
Ideally, the only thing an end-user should have to do is set the backend and its options. Given a backend, you (as the end-user) can decide to do one of two things:
Set the backend permanently (use the set_global_backend function).
Set the backend temporarily (use the set_backend context manager).
Note
API authors may want to wrap these functions and provide their own.
Also of note is the BackendNotImplementedError, which is raised when none of the selected backends has an implementation for a multimethod.
Setting the backend temporarily¶
To set the backend temporarily, use the set_backend context manager.
import uarray as ua

# mybackend is a placeholder for any object implementing the backend protocols.
with ua.set_backend(mybackend):
    # Use multimethods (or code dependent on them) here.
    ...
Setting the backend permanently¶
To set the backend permanently, use the set_global_backend function. We recommend that the global backend not depend on any other backend, as it is not guaranteed that another backend will be available.
You can also register backends other than the global backend for permanent use via register_backend, but the global backend will be tried first outside of a set_backend context.
import uarray as ua
ua.set_global_backend(mybackend)
# Use relevant multimethods here.
Documentation for backend providers¶
Backend providers can provide a backend for a defined API within the uarray ecosystem. To find out how to define your own API with uarray, see Documentation for API authors. To find out how your backend will be used, see End-user quickstart.
Backend providers need to be aware of three protocols: __ua_domain__, __ua_function__ and __ua_convert__. The first two are mandatory and the last is optional.
__ua_domain__¶
__ua_domain__ is a string containing the domain of the backend. This is, by convention, the name of the module (or one of its dependencies or parents) that contains the multimethods. For example, scipy and numpy.fft could both be in the numpy domain or one of its subdomains.
Additionally, __ua_domain__ can be a sequence of domains, such as a tuple or list of strings. This allows a single backend to implement functions from more than one domain.
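As a minimal sketch of the two forms described above, the snippet below defines two backends as plain classes. The class names and the `backend_domains` helper are hypothetical and are not part of uarray; the helper only shows how a string and a sequence of strings can be treated uniformly.

```python
class FFTBackend:
    """Hypothetical backend serving a single domain."""

    __ua_domain__ = "numpy.fft"


class MultiDomainBackend:
    """Hypothetical backend serving several domains at once."""

    __ua_domain__ = ("numpy", "scipy")


def backend_domains(backend):
    """Normalise ``__ua_domain__`` to a tuple of domain strings.

    Illustrative helper only; it is not a uarray function.
    """
    domain = backend.__ua_domain__
    return (domain,) if isinstance(domain, str) else tuple(domain)


print(backend_domains(FFTBackend))          # ('numpy.fft',)
print(backend_domains(MultiDomainBackend))  # ('numpy', 'scipy')
```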
__ua_function__¶
This is the most important protocol, one that defines the implementation of a multimethod. It has the signature (method, args, kwargs). Note that it is called in this form, so if your backend is an object instead of a module, you should add self as the first parameter. method is the multimethod being called, and it is guaranteed that it is in the same domain as the backend. args and kwargs are the arguments to the function, possibly after conversion (explained below).
Returning NotImplemented signals that the backend does not support this operation.
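The following is a small sketch of an object backend whose `__ua_function__` looks up an implementation by the multimethod's name and returns `NotImplemented` otherwise. The `DictBackend` class, its lookup table, and the `add`/`multiply` placeholders are assumptions made for illustration; plain functions stand in for real multimethods, and the protocol method is called directly rather than through uarray.

```python
class DictBackend:
    """Hypothetical object backend; note the extra ``self`` parameter."""

    __ua_domain__ = "ua_examples"

    def __init__(self):
        # Maps multimethod names to this backend's implementations.
        self._implementations = {"add": lambda a, b: a + b}

    def __ua_function__(self, method, args, kwargs):
        impl = self._implementations.get(method.__name__)
        if impl is None:
            # Signal that this backend does not support the operation.
            return NotImplemented
        return impl(*args, **kwargs)


# Plain functions stand in for multimethods; only ``__name__`` is used.
def add(a, b):
    """Placeholder for a multimethod named ``add``."""


def multiply(a, b):
    """Placeholder for a multimethod named ``multiply``."""


backend = DictBackend()
print(backend.__ua_function__(add, (1, 2), {}))       # 3
print(backend.__ua_function__(multiply, (1, 2), {}))  # NotImplemented
```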
__ua_convert__¶
All dispatchable arguments are passed through __ua_convert__ before being passed into __ua_function__. This protocol has the signature (dispatchables, coerce), where dispatchables is an iterable of Dispatchable and coerce is whether or not to coerce forcefully. Each Dispatchable carries a type, which is the mark of the object to be converted, and coerce specifies whether or not to “force” the conversion. By convention, operations larger than O(log n) (where n is the size of the object in memory) should only be done if coerce is True. In addition, some arguments are marked as non-coercible via the coercible attribute; if these would have to be coerced, one should return NotImplemented.
A convenience wrapper for converting a single object, wrap_single_convertor, is provided.
Returning NotImplemented signals that the backend does not support the conversion of the given object.
skip_backend¶
If a backend consumes multimethods from a domain and provides multimethods for that same domain, it may wish to use those multimethods while excluding itself from the list of tried backends, in order to avoid infinite recursion. This allows the backend to implement its functions in terms of functions provided by other backends. This is the purpose of the skip_backend context manager.
The process that takes place when the backend is tried¶
First of all, the backend’s __ua_convert__ method is tried. If this returns NotImplemented, then the backend is skipped; otherwise, its __ua_function__ protocol is tried. If a value other than NotImplemented is returned, it is assumed to be the final return value. Any exceptions raised are propagated up the call stack, except a BackendNotImplementedError, which signals a skip of the backend. If all backends are exhausted, or if a backend with its only flag set to True is encountered, a BackendNotImplementedError is raised.
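The process above can be modelled in a few lines of plain Python. This is a simplified sketch and not uarray's actual implementation: `try_backends`, the local `BackendNotImplementedError` stand-in, and the `(backend, only)` pairs are assumptions made for illustration, and the step that substitutes converted dispatchables back into the arguments is omitted.

```python
class BackendNotImplementedError(NotImplementedError):
    """Stand-in for ``uarray.BackendNotImplementedError``."""


def try_backends(backends, method, args, kwargs, dispatchables=(), coerce=False):
    """Try ``(backend, only)`` pairs in order, as described above."""
    for backend, only in backends:
        result = NotImplemented
        convert = getattr(backend, "__ua_convert__", None)
        # Without __ua_convert__, arguments are passed on unconverted.
        converted = dispatchables if convert is None else convert(dispatchables, coerce)
        if converted is not NotImplemented:
            try:
                result = backend.__ua_function__(method, args, kwargs)
            except BackendNotImplementedError:
                result = NotImplemented  # Treated as "skip this backend".
        if result is not NotImplemented:
            return result  # First real value is the final return value.
        if only:
            break  # An ``only`` backend ends the search.
    raise BackendNotImplementedError(f"no backend implements {method.__name__}")


class Unsupported:
    def __ua_function__(self, method, args, kwargs):
        return NotImplemented


class Doubler:
    def __ua_function__(self, method, args, kwargs):
        return args[0] * 2


def twice(x):
    """Placeholder standing in for a multimethod."""


backends = [(Unsupported(), False), (Doubler(), False)]
print(try_backends(backends, twice, (21,), {}))  # 42
```

If `Unsupported()` were paired with `only=True`, the loop would stop there and raise the error without ever reaching `Doubler`.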
Examples¶
Examples for library authors can be found in the source of unumpy.numpy_backend and other *_backend.py files in this directory.
Documentation for API authors¶
Multimethods are the most important part of uarray. They are created via the generate_multimethod function. Multimethods define the API of a project, and backends have to be written against this API. See Documentation for backend providers for how to define a backend against the multimethods you write, or End-user quickstart for how to switch backends for a given API.
A multimethod has the following parts:
Domain
Argument extractor
Argument replacer
Default implementation
We will go through each of these in detail now.
Domain¶
See the glossary for domain.
Argument extractor¶
An argument extractor extracts arguments marked as a given type from the list of given arguments. Note that the objects extracted don’t necessarily have to be in the list of arguments; they can be arbitrarily nested within the arguments. For example, extracting each argument from a list is a possibility. Note that the order is important, as it will come into play later. This function should return an iterable of Dispatchable.
This function has the same signature as the multimethod itself, and the documentation, name and so on are copied from the argument extractor via functools.wraps.
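As an illustration of extracting nested arguments, the sketch below writes an extractor for a hypothetical `concatenate(arrays, axis=0, out=None)` multimethod. `Marked` is a stand-in for `uarray.Dispatchable` and `ArrayLike` is a made-up mark type; neither name comes from uarray.

```python
from typing import Any, NamedTuple


class Marked(NamedTuple):
    """Stand-in for ``uarray.Dispatchable``: a value plus its mark."""

    value: Any
    type: Any


class ArrayLike:
    """Hypothetical mark type for array arguments."""


def concatenate_extractor(arrays, axis=0, out=None):
    """Extractor for a hypothetical ``concatenate(arrays, axis=0, out=None)``.

    Every element nested inside ``arrays`` comes first and ``out`` comes
    last; a matching replacer has to rely on exactly this order.
    """
    dispatchables = [Marked(a, ArrayLike) for a in arrays]
    dispatchables.append(Marked(out, ArrayLike))
    return tuple(dispatchables)


marked = concatenate_extractor([[1], [2, 3]], axis=0)
print([m.value for m in marked])  # [[1], [2, 3], None]
```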
Argument replacer¶
The argument replacer takes in the arguments and dispatchable arguments, and its job is to replace the arguments previously extracted by the argument extractor with the other arguments provided in the list. Therefore, the signature of this function is (args, kwargs, dispatchable_args), and it returns an args/kwargs pair. We realise this is a hard problem in general, so we have provided a few simplifications, such as that the default-valued keyword arguments will be removed from the list.
We recommend following the pattern in here for optimal operation: passing the args/kwargs into a function with a similar signature and then returning the modified args/kwargs.
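A minimal sketch of the recommended pattern, for a hypothetical `concatenate(arrays, axis=0, out=None)` multimethod: the replacer binds `args`/`kwargs` through an inner function with the same signature, then returns the rebuilt pair. It assumes the dispatchables arrive as every element of `arrays` followed by `out`; the function names are illustrative only.

```python
def concatenate_replacer(args, kwargs, dispatchables):
    """Replacer for a hypothetical ``concatenate(arrays, axis=0, out=None)``."""

    def replace(arrays, axis=0, out=None):
        # Binding through a signature-compatible function handles
        # positional and keyword call styles uniformly.
        *new_arrays, new_out = dispatchables
        return (list(new_arrays),), {"axis": axis, "out": new_out}

    return replace(*args, **kwargs)


new_args, new_kwargs = concatenate_replacer(
    ([[1], [2]],), {"axis": 1}, ["A", "B", None]
)
print(new_args, new_kwargs)  # (['A', 'B'],) {'axis': 1, 'out': None}
```

The non-dispatchable `axis` argument passes through unchanged, while the nested arrays and `out` are swapped for their replacements.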
Default implementation¶
This is a default implementation for the multimethod, ideally with the same signature as the original multimethod. It can also be used to provide one multimethod in terms of others, even if the default implementation for the downstream multimethods is not defined.
Glossary¶
Multimethod¶
A method, possibly with a default/reference implementation, that can have other implementations provided by different backends.
If a multimethod does not have an implementation, a BackendNotImplementedError is raised.
Backend¶
A backend is an entity that can provide implementations for different functions. It can also (optionally) receive some options from the user about how to process the implementations. A backend can be set permanently or temporarily.
Domain¶
A domain defines the hierarchical grouping of multimethods. The domain string is, by convention, the name of the module that provides the multimethods.
Sub-domains are denoted with a separating ".". For example, a multimethod in "numpy.fft" is also considered to be in the domain "numpy". When calling a multimethod, the backends for the most specific sub-domain are always tried first, followed by the next domain up the hierarchy.
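The lookup order described above can be sketched as a small helper that walks from the most specific sub-domain up to the top-level domain. `domains_to_try` is a hypothetical name used for illustration and is not a uarray function.

```python
def domains_to_try(domain):
    """Return ``domain`` followed by each parent, most specific first."""
    parts = domain.split(".")
    return [".".join(parts[:i]) for i in range(len(parts), 0, -1)]


print(domains_to_try("numpy.fft"))  # ['numpy.fft', 'numpy']
```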
Dispatching¶
Dispatching is the process of forwarding a function call to an implementation in a backend.
Conversion¶
A backend might have different object types compared to the reference implementation, or it might require some other conversions of objects. Conversion is the process of converting any given object into a library’s native form.
Coercion¶
Coercions are conversions that may take a long time, usually those involving copying or moving of data. As a rule of thumb, conversions longer than O(log n) (where n is the size of the object in memory) should be made into coercions.
Marking¶
Marking is the process of telling the backend what convertor to use for a given argument.
uarray¶
uarray is built around a back-end protocol and overridable multimethods. It is necessary to define multimethods for back-ends to be able to override them. See the documentation of generate_multimethod on how to write multimethods.
Let’s start with the simplest:
__ua_domain__ defines the back-end domain. The domain is a period-separated string consisting of the modules you extend plus the submodule. For example, if a submodule module2.submodule extends module1 (i.e., it exposes dispatchables marked as types available in module1), then the domain string should be "module1.module2.submodule".
For the purpose of this demonstration, we’ll be creating an object and setting its attributes directly. However, note that you can use a module or your own type as a backend as well.
>>> class Backend: pass
>>> be = Backend()
>>> be.__ua_domain__ = "ua_examples"
It might be useful at this point to sidetrack to the documentation of generate_multimethod to find out how to generate a multimethod overridable by uarray. Needless to say, writing a backend and creating multimethods are mostly orthogonal activities, and knowing one doesn’t necessarily require knowledge of the other, although it is certainly helpful. We expect core API designers/specifiers to write the multimethods, and implementors to override them. But, as is often the case, the same people write both.
Without further ado, here’s an example multimethod:
>>> import uarray as ua
>>> from uarray import Dispatchable
>>> def override_me(a, b):
...     return Dispatchable(a, int),
>>> def override_replacer(args, kwargs, dispatchables):
...     return (dispatchables[0], args[1]), {}
>>> overridden_me = ua.generate_multimethod(
...     override_me, override_replacer, "ua_examples"
... )
Next comes the part about overriding the multimethod. This requires the __ua_function__ protocol, and the __ua_convert__ protocol. The __ua_function__ protocol has the signature (method, args, kwargs), where method is the passed multimethod and args/kwargs specify the arguments, with any dispatchables already converted.
>>> def __ua_function__(method, args, kwargs):
...     return method.__name__, args, kwargs
>>> be.__ua_function__ = __ua_function__
The other protocol of interest is the __ua_convert__ protocol. It has the signature (dispatchables, coerce). When coerce is False, conversion between the formats should ideally be an O(1) operation, which means that no memory copying should be involved, only views of the existing data.
>>> def __ua_convert__(dispatchables, coerce):
...     for d in dispatchables:
...         if d.type is int:
...             if coerce and d.coercible:
...                 yield str(d.value)
...             else:
...                 yield d.value
>>> be.__ua_convert__ = __ua_convert__
Now that we have defined the backend, the next thing to do is to call the multimethod.
>>> with ua.set_backend(be):
...     overridden_me(1, "2")
('override_me', (1, '2'), {})
Note that the marked type has no effect on the actual type of the passed object. We can also coerce the type of the input.
>>> with ua.set_backend(be, coerce=True):
...     overridden_me(1, "2")
...     overridden_me(1.0, "2")
('override_me', ('1', '2'), {})
('override_me', ('1.0', '2'), {})
Another feature is that if you remove __ua_convert__, the arguments are not converted at all and it’s up to the backend to handle that.
>>> del be.__ua_convert__
>>> with ua.set_backend(be):
...     overridden_me(1, "2")
('override_me', (1, '2'), {})
You also have the option to return NotImplemented, in which case processing moves on to the next back-end, which, in this case, doesn’t exist. The same applies to __ua_convert__.
>>> be.__ua_function__ = lambda *a, **kw: NotImplemented
>>> with ua.set_backend(be):
...     overridden_me(1, "2")
Traceback (most recent call last):
    ...
uarray.BackendNotImplementedError: ...
The last possibility is if we don’t have __ua_convert__, in which case the job is left up to __ua_function__, but putting things back into arrays after conversion will not be possible.
Functions
Marks all unmarked arguments as a given type.
Creates a decorator for generating multimethods.
Generates a multimethod.
Creates a utility function to mark something as a specific type.
A context manager that sets the preferred backend.
This utility method replaces the default backend for permanent use.
This utility method registers a backend for permanent use.
This utility method clears registered backends.
A context manager that allows one to skip a given backend from processing entirely.
Wraps a
Returns an opaque object containing the current state of all the backends.
A context manager that sets the state of the backends to one returned by
Returns a context manager that resets all state once exited.
Set the backend to the first active backend that supports
Set a backend supporting all
Classes
A utility class which marks an argument with a specific dispatch type.
Exceptions
An exception that is thrown when no compatible backend is found for a method.
Design Philosophies¶
The following section discusses the design philosophies of uarray, and the reasoning behind some of these philosophies.
Modularity¶
uarray (and its sister modules unumpy and others to come) was designed from the ground up to be modular. This is part of why uarray itself holds the core backend and dispatch machinery, and unumpy holds the actual multimethods. Also, unumpy can be developed completely separately from uarray, although the ideal place to have it would be NumPy itself.
However, the benefit of having it separate is that it could span multiple NumPy versions, even before NEP-18 (or even NEP-13) was available. Another benefit is that it can have a faster release cycle to help it achieve this.
Separate Imports¶
Code wishing to use the backend machinery for NumPy (as an example) will use the statement import unumpy as np instead of the usual import numpy as np. This is deliberate: it makes dispatching opt-in, so that users are not forced to use it or to pay the overhead associated with it. However, a package is free to define its main methods as the dispatchable versions, thereby allowing dispatch on the default implementation.
Extensibility and Choice¶
If some effort is put into the dispatch machinery, it’s possible to dispatch over arbitrary objects — including arrays, dtypes, and so on. A method defines the type of each dispatchable argument, and backends are only passed types they know how to dispatch over when deciding whether or not to use that backend. For example, if a backend doesn’t know how to dispatch over dtypes, it won’t be asked to decide based on that front.
Methods can have a default implementation in terms of other methods, but they’re still overridable.
This means that only one framework is needed to, for example, dispatch over ufuncs, arrays, dtypes and all other primitive objects in NumPy, while keeping the core uarray code independent of NumPy and even unumpy.
Backends can span modules, so SciPy could jump in and define its own methods on NumPy objects and make them overridable within the NumPy backend.
User Choice¶
The users of unumpy or uarray can choose which backend they prefer with a simple context manager. They also have the ability to force a backend, and to skip a backend. This is useful for array-like objects that provide other array-like objects by composing them. For example, Dask could perform all its blockwise function calls with the following pseudocode (obviously, this is simplified):
in_arrays = extract_inner_arrays(input_arrays)
out_arrays = []
for input_arrays_single in in_arrays:
    args, kwargs = blockwise_function.replace_args_kwargs(
        args, kwargs, input_arrays_single)
    with ua.skip_backend(DaskBackend):
        out_arrays_single = blockwise_function(*args, **kwargs)
    out_arrays.append(out_arrays_single)
return combine_arrays(out_arrays)
A user would simply do the following:
with ua.set_backend(DaskBackend):
    # Write all your code here
    # It will prefer the Dask backend
    ...
There is no default backend; to unumpy, NumPy is just another backend. One can register backends, which will all be tried in indeterminate order when no backend is selected.
Addressing past flaws¶
Progress on NumPy’s side in defining an override mechanism has been slow, with NEP-13 first introduced in 2013. With the wealth of dispatchable objects (including arrays, ufuncs, and dtypes) and the advent of libraries like Dask, CuPy, Xarray, PyData/Sparse, and XND, it has become clear that the need for alternative array-like implementations is growing. There are even other libraries, like PyTorch and TensorFlow, that could be expressed in NumPy API-like terms. Another example is the Keras API, for which an overridable ukeras could be created, similar to unumpy.
uarray is intended to have fast development to fill the need posed by these communities, while keeping itself as general as possible, and to quickly reach maturity, after which backward compatibility will be guaranteed.
Performance considerations will come only after such a state has been reached.
GSoC 2020 project ideas¶
Introduction¶
This is the Google Summer of Code 2020 (GSoC’20) ideas page for uarray, unumpy and udiff. The uarray library is a backend mechanism geared towards array computing, but intended for general use. unumpy is an incomplete stub of the NumPy API that can be dispatched by uarray. udiff is a general-purpose automatic differentiation library built on top of unumpy and uarray.
This page lists a number of ideas for Google Summer of Code projects for uarray, and gives some pointers for potential GSoC students on how to get started with contributing and putting together their application.
Guidelines & requirements¶
uarray plans to participate in GSoC’20 under the umbrella of the Python Software Foundation.
We expect from students that they’re at least comfortable with Python (intermediate level). Some projects may also require C++ or C skills. Knowing how to use Git is also important; this can be learned before the official start of GSoC if needed though.
If you have an idea of what you would like to work on (see below for ideas) and are considering participating:
Read the PSF page carefully; it contains important advice on the process.
Read advice on writing a proposal (written with the Mailman project in mind, but generally applicable).
Make an enhancement/bugfix/documentation fix; it does not have to be big, and it does not need to be related to your proposal. Doing so before applying for GSoC is a hard requirement for uarray. It helps everyone get some idea of how things would work during GSoC.
Start writing your proposal early, post a draft to the issue tracker and iterate based on the feedback you receive. This will both improve the quality of your proposal and help you find a suitable mentor.
Contact¶
If you have a question after checking all guideline pages above, you can open an issue in the issue tracker, but feel free to chat with us on Gitter if you need clarification regarding any of the projects. Keep in mind that you might not get a response right away, but we will endeavour to respond as early as possible.
uarray project ideas¶
uarray: Add querying for state¶
Adding querying for the uarray._BackendState object will allow users of uarray to see what’s inside the opaque object. Some parts can be re-used from the pickling machinery.
It can also help downstream users to access the parameters of the currently set backend, which is a planned feature of uarray. Here is a list of goals for this project:
Allow downstream projects to query the list of backends.
Allow downstream projects to query the list of parameters for a backend.
This would enable, for example, the following use-cases:
Allow a downstream library to detect a backend and run specialised code for it.
Allow a downstream library to fail-fast on a known-unsupported backend.
This project has a straightforward design and needs some implementation work, and will require interacting with the mentors to implement and polish. The accepted student will get an outline of the desired API, along with some failing tests and doctests. The student will make a pull request to implement the desired functionality so that the tests pass.
Required knowledge: Python C-API and C++
Difficulty level: medium
Potential mentors: Peter Bell and Hameer Abbasi
uarray: Allow subdomains¶
This idea would allow a backend to encompass functions from more than one domain.
The primary goal of this project would be:
Develop a system that allows, via some kind of matching mechanism, to select which domains it supports, while maintaining backward compatibility.
This would allow a backend targeting NumPy to also target, for example, the numpy.random submodule. Since the domain for functions in numpy.random will be just that, numpy.random, it won’t match backends defined with the numpy domain, since it’s an exact string match.
The second objective here would be to allow backends to target submodules of projects rather than the whole project. For example, targeting just numpy.random or numpy.fft without targeting all of NumPy.
For more detail see this issue.
This project has a somewhat complicated design and needs some involved implementation work, and will require interacting with the mentors to flesh out and work through.
Required knowledge: Python C-API and C++
Difficulty level: hard
Potential mentors: Peter Bell and Hameer Abbasi
unumpy: Expand overall coverage¶
This project is split into two parts:
Adding further coverage of the NumPy API.
Adding more backends to unumpy.
We realise this is a large (possibly open-ended) undertaking, and so there will need to be a minimum amount of work done in order to pass (~150 function stubs and, if time allows, a JAX backend). You may see the existing methods and figure out how they are written using a combination of the documentation for writing multimethods and the already existing multimethods in this file. For writing backends, you can see the documentation for backends in combination with the already existing backends in this directory.
Required knowledge: Python (intermediate level)
Difficulty level: easy
Potential mentors: Prasun Anand and Hameer Abbasi
udiff: Completion and Packaging¶
This requires completion and packaging of the udiff library. Potential goals include:
Publishing an initial version to PyPI. Here’s a guide on how to do that.
Adding matrix/tensor calculus support.
For this, you can see the matrix cookbook. Don’t be intimidated! There will only be five or so equations you have to pull out of the matrix cookbook and implement, most prominently, the equation for matrix multiplication.
Here is how derivatives are registered.
The second task here will be to add the “separation” between the data dimensions and the differentiation dimensions. For example, the input could be a vector, or an array of scalars, and this might need to be taken into account when doing the differentiation. That will require some work in this file, and possibly this one as well.
Adding tests.
This will require calculating a few derivatives by hand and making sure they match up with what udiff computes. We will use the PyTest framework.
Adding documentation on use, which will be fairly minimal. We will learn to set up Sphinx, and add some documentation.
Publishing a final version to PyPI.
This project has a minimal design and needs some involved implementation work. It will allow the accepted student to get an idea of what it’s like to actually publish, test and document a small Python package.
Required knowledge: Python (intermediate level) and calculus
Difficulty level: medium
Potential mentors: Prasun Anand and Hameer Abbasi