# xarray-simlab: xarray extension for computer model simulationsÂ¶

**xarray-simlab** is a Python library that provides both a generic
framework for building computational models in a modular fashion and a
xarray extension for setting and running simulations using the
`xarray.Dataset`

structure. It is designed for interactive
and exploratory modeling.

## Documentation indexÂ¶

**Getting Started**

### About xarray-simlabÂ¶

xarray-simlab provides a framework for easily building custom computational models from a set of modular components (i.e., Python classes), called processes.

The framework handles issues that scientists who are developing models should not care too much about, like the model interface and the overall workflow management. Both are automatically determined from the succint, declarative-like interfaces of the model processes.

Notably via its xarray extension, xarray-simlab has already deep integration with the SciPy / PyData stack. Next versions will hopefully handle other technical issues like command line integration, interactive visualization and/or running many simulations in parallel, e.g., in the context of sensitivity analyses or inversion procedures.

#### MotivationÂ¶

xarray-simlab is being developped with the idea of reducing the gap between the environments used for building and running computational models and the ones used for processing and analyzing simulation results. If the latter environments become more powerful and interactive, progress has still to be done for the former ones.

xarray-simlab also encourages building new models from re-usable sets of components in order to avoid reinventing the wheel. In many cases we want to customize existing models (e.g., adding a new feature or slightly modifying the behavior) instead of building new models from scratch. This modular framework allows to do that with minimal effort. By implementing models using a large number of small components that can be easily plugged in/out, we eliminate the need of hard-coding changes that we want to apply to a model, which often leads to over-complicated code and interface.

The design of this tool is thus mainly focused on both fast model development and easy, interactive model exploration. Ultimately, this would optimize the iterative back-and-forth process between ideas that we have on how to model a particular phenomenon and insights that we get from the exploration of model behavior.

#### Sources of inspirationÂ¶

xarray-simlab leverages the great number of packages that are part of the Python scientific ecosystem. More specifically, the packages below have been great sources of inspiration for this project.

- xarray: xarray-simlab actually provides an xarray extension for setting and running models.
- attrs: a package that allows writing Python classes without boilerplate. xarray-simlab uses and extends attrs for writing processes as succinct Python classes.
- luigi: the concept of Luigi is to use Python classes as re-usable units that help building complex workflows. xarray-simlabâs concept is similar, but here it is specific to computational (numerical) modeling.
- django (not really a scientific package): the way that model
processes are designed in xarray-simlab has been initially inspired
from Djangoâs ORM (i.e., the
`django.db.models`

part). - param: another source of inspiration for the interface of processes (more specifically the variables that it defines).
- climlab: another python package for process-oriented modeling, which uses the same approach although having a slightly different design/API, and which is applied to climate modeling.
- landlab: like climlab it provides a framework for building model components but it is here applied to landscape evolution modeling. It already has a great list of components ready to use.
- dask: represents fine-grained processing tasks as Directed Acyclic Graphs (DAGs). xarray-simlab models are DAGs too, where the nodes are interdepent processes. In this project we actually borrow some code from dask for resolving process dependencies and for model visualization.

### Frequently Asked QuestionsÂ¶

#### Does xarray-simlab provide built-in models?Â¶

No, xarray-simlab provides only the framework for creating, customizing and running computational models. It is intended to be a general-purpose tool. Domain specific models should be implemented in 3rd party packages. For example, xarray-topo provides xarray-simlab models and model components for simulating landscape evolution.

#### Can xarray-simlab be used with existing model implementations?Â¶

Yes, it should be easy to wrap existing model implementations using xarray-simlab. Even monolithic codes may leverage the xarray interface. However, as the framework works best at a fine grained level (i.e., with models built from many âsmallâ components) it might be worth to refactor those monolithic implementations.

#### Does xarray-simlab allow fast model execution?Â¶

Yes, although it depends on how the model is implemented.

xarray-simlab is written in pure-Python and so is the outer (time) loop in simulations. The execution of Python code is slow compared to other languages, but for the outer loop only it wouldnât represent the main bottleneck of the overall model execution, especially when using an implicit time scheme. For inner (e.g., spatial) loops in each model processes, it might be better to have a numpy vectorized implementation, use tools like Cython or Numba or call wrapped code that is written in, e.g., C/C++ or Fortran (see for example f2py for wrapping Fortran code or pybind11 for wrapping C++11 code).

As with any other framework, xarray-simlab introduces an overhead compared to a simple, straightforward (but non-flexible) implementation of a model. The preliminary benchmarks that we have run show only a very small (almost free) overhead, though. This overhead is mainly introduced by the thin object-oriented layer that model components (i.e., Python classes) together form.

#### Does xarray-simlab support running model(s) in parallel?Â¶

There is currently no support for model execution in parallel but it is a top priority for the next releases!

Three levels of parallelism are possible:

- âinter-modelâ parallelism, i.e., execution of multiple model runs in parallel,
- âinter-processâ parallelism, i.e., execution of multiple processes of a model in parallel,
- âintra-processâ parallelism, i.e., parallel execution of some code written in one or more processes.

Note that the notion of process used above is different from multiprocessing: a process here corresponds to a component of a model (see Modeling Framework section).

The first level âinter-modelâ is an embarrassingly parallel problem. Next versions of xarray-simlab will allow to very easily run simulations in parallel (e.g., for sensitivity analyses).

It shouldnât be hard to add support for the second level âinter-processâ given that processes in a model together form a directed acyclic graph. However, those processes usually perform most of their computation on shared data, which may significantly reduce the gain of parallel execution when using multiple OS processes or in distributed environments. Using multiple threads is limited by the CPythonâs GIL, unless it is released by the code executed in model processes.

The third level âintra-processâ is more domain specific. Users are free to develop xarray-simlab compatible models with custom code (in processes) that is executed either sequentially or in parallel.

#### Is it possible to use xarray-simlab without xarray?Â¶

Although it sounds a bit odd given the name of this package, in principle it is possible. The implementation of the modeling framework is indeed completely decoupled from the xarray interface.

However, the xarray extension provided in this package aims to be the primary, full-featured interface for setting and running simulations from within Python.

The modeling framework itself doesnât have any built-in interface apart from a few helper functions for running specific stages of a simulation. Any other interface has to be built from scratch, but in many cases it wouldnât require a lot of effort. In the future, we plan to also provide an experimental interface for real-time, interactive simulation based on tools like ipywidgets, bokeh and/or holoviews.

#### Will xarray-simlab support Python 2.7.x?Â¶

No, unless there are very good reasons to do so. The main packages of the Python scientific ecosystem support Python 3.4 or later, and it seems that Python 2.x will not be maintained anymore past 2020 (see PEP 373). Although some tools easily allow supporting both Python 2 and 3 versions in a single code base, it still makes the code harder to maintain.

### Install xarray-simlabÂ¶

#### Install using condaÂ¶

xarray-simlab can be installed or updated using conda:

```
$ conda install xarray-simlab -c conda-forge
```

This installs xarray-simlab and all common dependencies, including numpy and xarray.

xarray-simlab conda package is maintained on the conda-forge channel.

#### Install using pipÂ¶

You can also install xarray-simlab and its required dependencies using
`pip`

:

```
$ pip install xarray-simlab
```

#### Install from sourceÂ¶

To install xarray-simlab from source, be sure you have the required dependencies (numpy and xarray) installed first. You might consider using conda to install them:

```
$ conda install attrs xarray numpy pip -c conda-forge
```

A good practice (especially for development purpose) is to install the packages in a separate environment, e.g. using conda:

```
$ conda create -n simlab_py36 python=3.6 attrs xarray numpy pip -c conda-forge
$ source activate simlab_py36
```

Then you can clone the xarray-simlab git repository and install it
using `pip`

locally:

```
$ git clone https://github.com/benbovy/xarray-simlab
$ cd xarray-simlab
$ pip install .
```

For development purpose, use the following command:

```
$ pip install -e .
```

#### Import xarray-simlabÂ¶

To make sure that xarray-simlab is correctly installed, try to import it by running this line:

```
$ python -c "import xsimlab"
```

### ExamplesÂ¶

An example of simple advection is given in the user guide sections. Here below are more advanced examples of using xarray-simlab.

#### Landscape Evolution ModelingÂ¶

A showcase of xarray-simlab in the context of landscape evolution modeling (an almost real world example).

```
In [1]:
```

```
import numpy as np
import xarray as xr
import xsimlab as xs
```

##### Import and inspect a modelÂ¶

The model (i.e., the `xsimlab.Model`

object) that we use here is
provided by the
xarray-topo
package (**Note:** check the version of this package below, it may not
correspond to the latest stable release).

```
In [2]:
```

```
import xtopo
print(xtopo.__version__)
```

```
v0.0.10+2.ga7d7728
```

```
In [3]:
```

```
from xtopo.models.fastscape_base import fastscape_base_model
```

This model simulates the long-term evolution of topographic surface elevation (hereafter noted \(h\)) on a 2D regular grid. The local rate of elevation change, \(\partial h/\partial t\), is determined by the balance between uplift (uniform in space and time) \(U\) and erosion \(E\).

Total erosion \(E\) is the combined effect of the erosion of (bedrock) river channels, noted \(E_r\), and erosion- transport on hillslopes, noted \(E_d\)

Erosion of river channels is given by the stream power law:

where \(A\) is the drainage area and \(K\), \(m\) and \(n\) are parameters.

Erosion on hillslopes is given by a linear diffusion law:

We can see these parameters - as well as the initial elevation surface
and the grid parameters - as model inputs in the `repr`

.

```
In [4]:
```

```
fastscape_base_model
```

```
Out[4]:
```

```
<xsimlab.Model (10 processes, 11 inputs)>
grid
x_length [in] total grid length in x
x_size [in] nb. of nodes in x
y_size [in] nb. of nodes in y
y_length [in] total grid length in y
boundaries
block_uplift
u_coef [in] () or ('y', 'x') uplift rate
flow_routing
pit_method [in]
area
spower
n_exp [in] stream-power slope exponent
m_exp [in] stream-power drainage area exponent
k_coef [in] stream-power constant
diffusion
k_coef [in] diffusivity
erosion
uplift
topography
elevation [inout] ('y', 'x') topographic elevation
```

To have a better picture of all processes (and inputs and/or variables) in the model, we can visualize it as a graph. Processes are in blue and inputs are in yellow. The order in the graph corresponds to the order in which the processes will be exectued during a simulation.

Note: the visualization requires graphviz and python-graphviz packages (both can be installed using conda and the conda-forge channel).

```
In [5]:
```

```
fastscape_base_model.visualize(show_inputs=True)
```

```
Out[5]:
```

More information can be shown for each process in the model, e.g., for the grid

```
In [6]:
```

```
fastscape_base_model.grid
```

```
Out[6]:
```

```
<Grid2D 'grid' (xsimlab process)>
Variables:
x_size [in] nb. of nodes in x
y_size [in] nb. of nodes in y
x_length [in] total grid length in x
y_length [in] total grid length in y
x_spacing [out]
y_spacing [out]
x [out] ('x',)
y [out] ('y',)
Simulation stages:
initialize
```

##### Create a model setupÂ¶

We create a simulation setup using the `create_setup`

function.

```
In [7]:
```

```
nx = 101
ny = 101
in_ds = xs.create_setup(
model=fastscape_base_model,
clocks={
'time': np.linspace(0., 1e6, 101),
'out': np.linspace(0., 1e6, 11)
},
master_clock='time',
input_vars={
'grid': {'x_size': nx, 'y_size': ny, 'x_length': 1e5, 'y_length' :1e5},
'topography': {'elevation': (('y', 'x'), np.random.rand(ny, nx))},
'flow_routing': {'pit_method': 'mst_linear'},
'spower': {'k_coef': 7e-5, 'm_exp': 0.4, 'n_exp': 1},
'diffusion': {'k_coef': 1.},
'block_uplift': {'u_coef': 2e-3}
},
output_vars={
'out': {'topography': 'elevation'},
None: {'grid': ('x', 'y')}
}
)
```

Some explanation about the arguments of `create_setup`

and the values
given above:

- we specify the model we want to use, here
`fastscape_base_model`

, - we specify values for clock coordinates (i.e., time coordinates),
- among these coordinates, we specify the master clock, i.e., the coordinate that will be used to set the time steps,
- we set values for model inputs, grouped by process in the model,
- we set the model variables for which we want to take snapshots during
a simulation, grouped first by clock coordinate (
`None`

means that only one snapshot will be taken at the end of the simulation) and then by process.

Here above, we define a âtimeâ coordinate and another coordinate âoutâ with much larger but aligned time steps (the values are in years). âtimeâ will be used for the simulation time steps and âoutâ will be used to take just a few, evenly-spaced snapshots of variable âelevationâ in process âtopographyâ. We also want to save the \(x\) and \(y\) coordinates of the grid (values in meters), which are time-invariant.

The initial conditions consist here of a nearly flat topographic surface with small random perturbations.

`create_setup`

returns a `xarray.Dataset`

object that contains
everything we need to run the simulation.

```
In [8]:
```

```
in_ds
```

```
Out[8]:
```

```
<xarray.Dataset>
Dimensions: (out: 11, time: 101, x: 101, y: 101)
Coordinates:
* time (time) float64 0.0 1e+04 2e+04 3e+04 4e+04 ...
* out (out) float64 0.0 1e+05 2e+05 3e+05 4e+05 ...
Dimensions without coordinates: x, y
Data variables:
grid__x_size int64 101
grid__y_size int64 101
grid__x_length float64 1e+05
grid__y_length float64 1e+05
topography__elevation (y, x) float64 0.7633 0.6732 0.9937 0.9121 ...
flow_routing__pit_method <U10 'mst_linear'
spower__k_coef float64 7e-05
spower__m_exp float64 0.4
spower__n_exp int64 1
diffusion__k_coef float64 1.0
block_uplift__u_coef float64 0.002
Attributes:
__xsimlab_output_vars__: grid__x,grid__y
```

If present, the metadata (e.g., description, units, math_symbolâŠ) associated to each input variable in the model are added as attributes in the dataset, e.g.,

```
In [9]:
```

```
in_ds.spower__k_coef
```

```
Out[9]:
```

```
<xarray.DataArray 'spower__k_coef' ()>
array(7e-05)
Attributes:
description: stream-power constant
```

##### Run the modelÂ¶

We run the model simply by calling `in_ds.xsimlab.run()`

, which
returns a new Dataset with both the inputs and the outputs.

```
In [10]:
```

```
out_ds = in_ds.xsimlab.run(model=fastscape_base_model)
out_ds
```

```
Out[10]:
```

```
<xarray.Dataset>
Dimensions: (out: 11, time: 101, x: 101, y: 101)
Coordinates:
* time (time) float64 0.0 1e+04 2e+04 3e+04 4e+04 ...
* out (out) float64 0.0 1e+05 2e+05 3e+05 4e+05 ...
Dimensions without coordinates: x, y
Data variables:
grid__x_size int64 101
grid__y_size int64 101
grid__x_length float64 1e+05
grid__y_length float64 1e+05
topography__elevation (out, y, x) float64 0.7633 0.6732 0.9937 ...
flow_routing__pit_method <U10 'mst_linear'
spower__k_coef float64 7e-05
spower__m_exp float64 0.4
spower__n_exp int64 1
diffusion__k_coef float64 1.0
block_uplift__u_coef float64 0.002
grid__x (x) float64 0.0 1e+03 2e+03 3e+03 4e+03 5e+03 ...
grid__y (y) float64 0.0 1e+03 2e+03 3e+03 4e+03 5e+03 ...
```

Note in `out_ds`

the `topography__elevation`

variable which has now
an additional `out`

dimension and also the new variables `grid__x`

and `grid__y`

.

##### Analyse, plot and save the resultsÂ¶

The simulation input and output data is already in a format that allows us using all the nice features of xarray to further analyse, process, plot and/or write to disk (e.g., in a netCDF file) the data.

In this case for example, before doing any further processing it is more
convenient to set \(x\) and \(y\) coordinates as coordinates of
the output `Dataset`

instead of data variables, using the
`set_index`

method. We can easily chain this method with
`xsimlab.run`

as it both return Dataset objects:

```
In [11]:
```

```
out_ds = (in_ds
.xsimlab.run(model=fastscape_base_model)
.set_index(x='grid__x', y='grid__y'))
out_ds
```

```
Out[11]:
```

```
<xarray.Dataset>
Dimensions: (out: 11, time: 101, x: 101, y: 101)
Coordinates:
* time (time) float64 0.0 1e+04 2e+04 3e+04 4e+04 ...
* out (out) float64 0.0 1e+05 2e+05 3e+05 4e+05 ...
* x (x) float64 0.0 1e+03 2e+03 3e+03 4e+03 5e+03 ...
* y (y) float64 0.0 1e+03 2e+03 3e+03 4e+03 5e+03 ...
Data variables:
grid__x_size int64 101
grid__y_size int64 101
grid__x_length float64 1e+05
grid__y_length float64 1e+05
topography__elevation (out, y, x) float64 0.7633 0.6732 0.9937 ...
flow_routing__pit_method <U10 'mst_linear'
spower__k_coef float64 7e-05
spower__m_exp float64 0.4
spower__n_exp int64 1
diffusion__k_coef float64 1.0
block_uplift__u_coef float64 0.002
```

It is then easier to plot the simulation outputs, e.g., here below the elevation values at the end of the simulation:

```
In [12]:
```

```
%matplotlib inline
xr.plot.pcolormesh(out_ds.isel(out=-1).topography__elevation,
size=5, aspect=1);
```

xarray datasets can be used with Holoview, a plotting package that is really helpful for quickly and interactively exploring multi-dimensional data. (it can be installed using conda).

```
In [13]:
```

```
import holoviews as hv
hv.notebook_extension('matplotlib')
```

We can for example see below how the relief is created during the simulation (snapshots are taken every 100000 years and elevation values are in meters).

**Note:** There may be issues with rendering Holoview interactive
visualizations if you are on xarray-simlabâs documentation. Fortunately
you should be able to see this page as a notebook properly rendered on
nbviewer.org.

```
In [14]:
```

```
%%opts Image style(interpolation='bilinear', cmap='viridis') plot[colorbar=True]
hv_ds = hv.Dataset(out_ds.topography__elevation)
hv_ds.to(hv.Image, ['x', 'y'])
```

```
Out[14]:
```

Additionally, We can compute derived quantities without much effort.
Here below we calculate the surface denudation rates (in m/yr) averaged
over each time steps of the output `out`

dimension.

```
In [15]:
```

```
def denudation_rate(ds, time_dim='out'):
topo = ds.topography__elevation
dt = ds[time_dim].diff(time_dim)
den_rate = topo.diff(time_dim) / dt - ds.block_uplift__u_coef
return den_rate
```

```
In [16]:
```

```
den_rate = denudation_rate(out_ds)
```

We further compute and plot the spatially averaged denudation rate.

```
In [17]:
```

```
den_rate.mean(('x', 'y')).plot();
```

##### Run the model with time-varying parameter valuesÂ¶

Instead of providing constant, scalar values for model inputs, it is
possible to provide arrays which have the same dimension as the one used
for the âmaster clockâ (the `time`

dimension in this case).

As an example, we try below a sinusoidal variation for the \(K\) parameter of the stream power law, with a period of 400000 years.

```
In [18]:
```

```
da_k_time = 5e-5 + 3e-5 * np.cos((2 * np.pi / 4e5) * in_ds.time)
da_k_time.plot();
```

We then re-use the simulation setup created above and only update the
parameters of the stream-power process with the new values for \(K\)
(using `Dataset.xsimlab.update_vars`

).

Note the `time`

dimension of the `spower__k_coef`

variable in the
new returned Dataset.

```
In [19]:
```

```
in_ds_kt = in_ds.xsimlab.update_vars(
model=fastscape_base_model,
input_vars={'spower': {'k_coef': da_k_time, 'm_exp': 0.4, 'n_exp': 1}}
)
in_ds_kt
```

```
Out[19]:
```

```
<xarray.Dataset>
Dimensions: (out: 11, time: 101, x: 101, y: 101)
Coordinates:
* time (time) float64 0.0 1e+04 2e+04 3e+04 4e+04 ...
* out (out) float64 0.0 1e+05 2e+05 3e+05 4e+05 ...
Dimensions without coordinates: x, y
Data variables:
grid__x_size int64 101
grid__y_size int64 101
grid__x_length float64 1e+05
grid__y_length float64 1e+05
topography__elevation (y, x) float64 0.7633 0.6732 0.9937 0.9121 ...
flow_routing__pit_method <U10 'mst_linear'
spower__k_coef (time) float64 8e-05 7.963e-05 7.853e-05 ...
spower__m_exp float64 0.4
spower__n_exp int64 1
diffusion__k_coef float64 1.0
block_uplift__u_coef float64 0.002
Attributes:
__xsimlab_output_vars__: grid__x,grid__y
```

We then run the model, unstack the spatial dimensions, compute the denudation rates and plot the spatial averages, here again by easily chaining xarray and xarray-simlab methods on the input Dataset.

If we compare the results with those from the previous run, we clearly see the impact of the time-varying \(K\) parameter values on the denudation rates.

```
In [20]:
```

```
den_rate_kt = (in_ds_kt
.xsimlab.run(model=fastscape_base_model)
.set_index(x='grid__x', y='grid__y')
.pipe(denudation_rate))
den_rate_kt.mean(('x', 'y')).plot();
```

##### Run and combine different model setupsÂ¶

Here is an brief example of running the model multiple times for different fixed values of \(K\) and then concatenate the results into a single dataset. In next versions of xarray-simlab, this process will be even simpler.

```
In [21]:
```

```
def run_model(k_value):
print('run k=%f' % k_value)
ivars = {'spower': {'k_coef': k_value, 'm_exp': 0.4, 'n_exp': 1}}
out_ds = (in_ds
.xsimlab.update_vars(model=fastscape_base_model,
input_vars=ivars)
.xsimlab.run(model=fastscape_base_model)
.set_index(x='grid__x', y='grid__y'))
return out_ds
out_ds_multi = xr.concat(
[run_model(k) for k in (5e-5, 6e-5, 7e-5)],
dim='spower__k_coef', data_vars='different'
)
```

```
run k=0.000050
run k=0.000060
run k=0.000070
```

Note the additional `spower__k_coef`

dimension, which has its own
coordinate with labels corresponding to the different \(K\) values.

```
In [22]:
```

```
out_ds_multi
```

```
Out[22]:
```

```
<xarray.Dataset>
Dimensions: (out: 11, spower__k_coef: 3, time: 101, x: 101, y: 101)
Coordinates:
* time (time) float64 0.0 1e+04 2e+04 3e+04 4e+04 ...
* out (out) float64 0.0 1e+05 2e+05 3e+05 4e+05 ...
* x (x) float64 0.0 1e+03 2e+03 3e+03 4e+03 5e+03 ...
* y (y) float64 0.0 1e+03 2e+03 3e+03 4e+03 5e+03 ...
* spower__k_coef (spower__k_coef) float64 5e-05 6e-05 7e-05
Data variables:
grid__x_size int64 101
grid__y_size int64 101
grid__x_length float64 1e+05
grid__y_length float64 1e+05
flow_routing__pit_method <U10 'mst_linear'
spower__m_exp float64 0.4
spower__n_exp int64 1
diffusion__k_coef float64 1.0
block_uplift__u_coef float64 0.002
topography__elevation (spower__k_coef, out, y, x) float64 0.7633 ...
```

This new dimension also appears in the Holoview figure

```
In [23]:
```

```
%%opts Image style(interpolation='bilinear', cmap='viridis') plot[colorbar=True]
hv_ds = hv.Dataset(out_ds_multi.topography__elevation)
hv_ds.to(hv.Image, ['x', 'y'])
```

```
Out[23]:
```