aospy: automated climate data analysis and management

aospy (pronounced A - O - S - py) is an open source Python package for automating your computations that use gridded climate and weather data (namely data stored as netCDF files) and the management of the results of those computations.

aospy enables firing off multiple calculations in parallel using the permutation of an arbitrary number of climate models, simulations, variables to be computed, date ranges, sub-annual-sampling, and many other parameters. In other words, it is possible using aospy to submit and execute all calculations for a particular project (e.g. paper, class project, or thesis chapter) with a single command!

The results get saved in a highly organized directory tree as netCDF files, making it easy to subsequently find and use the data (e.g. for plotting) and preventing “orphan” files with vague filenames and insufficient metadata to remember what they are and/or how they were computed.

The eventual goal is for aospy to become the community standard for gridded climate data analysis and, in so doing, accelerate progress in climate science and make the results of climate research more easily reproducible and shareable.

Documentation

What’s New

v0.2 (26 September 2017)

This release includes some new features plus several bugfixes. The bugfixes include some that previously made using aospy on pressure-interpolated data very problematic. We have also improved support for reading in data from the WRF and CAM atmospheric models.

As of this release, aospy has at least 2(!) confirmed regular users that aren’t the original aospy developers, bringing the worldwide total of users up to at least 4. The first user-generated Github Issues have now also been created. We’re a real thing!

Enhancements
  • Use dask.bag coupled with dask.distributed rather than multiprocess to parallelize computations (closes GH169 via PR172). This enables the optional use of an external distributed.Client to leverage computational resources across multiple nodes of a cluster. By Spencer Clark.
  • Improve support for WRF and NCAR CAM model data by adding the internal names they use for grid attributes to aospy’s lists of potential names to search for. By Spencer Hill.
  • Allow a user to specify a custom preprocessing function in all DataLoaders to prepare data for processing with aospy. This could be used, for example, to add a CF-compliant units attribute to the time coordinate if it is not present in a set of files. Addresses GH177 via PR180. By Spencer Clark.
  • Remove dask.async import in model.py; no longer needed, and also prevents warning message from dask regarding location of get_sync function (PR195). By Spencer Hill.
Dependencies
  • multiprocess is no longer required for submitting aospy calculations in parallel (see discussion in GH169 and pull request PR172).
  • aospy now requires an installation of dask with version greater than or equal to 0.14 (see discussion in pull request PR172).
Bug Fixes
  • Remove faulty logic for calculations with data coming from multiple runs. Eventually this feature will be properly implemented (fixes GH117 via PR178). By Spencer Hill.
  • Only run tests that require optional dependencies if those dependencies are actually installed (fixes GH167 via PR176). By Spencer Hill.
  • Remove obsolete operator.py module (fixes GH174 via PR175). By Spencer Clark.
  • Fix workaround for dates with years less than 1678 to support units attributes with a reference date years not equal to 0001 (fixes GH188 via PR189). By Spencer Clark.
  • Fix bug which would prevent users from analyzing a subset within the Timestamp-valid range from a dataset which included data from outside the Timestamp-valid range (fixed in PR189). By Spencer Clark.
  • Toggle the mask_and_scale option to True when reading in netCDF files to enable missing values encoded as floats to be converted to NaN’s (fixes GH190 via PR192). By Spencer Clark.
  • Force regional calculations to mask gridcell weights where the loaded datapoints were invalid instead of just masking points outside the desired region (fixes GH190 via PR192). By Spencer Clark.
  • Retain original input data’s mask during gridpoint-by-gridpoint temporal averages (fixes GH193 via PR196). By Spencer Hill.
  • Always write output to a tar file in serial to prevent empty header file errors (fixes GH75 via PR197). By Spencer Clark.
  • Allow aospy to use grid attributes that are only defined in Run objects. Previously if a grid attribute were defined only in a Run object and not also in the Run’s corresponding Model, an error would be raised (fixes GH187 via PR199). By Spencer Clark.
  • When input data for a calculation has a time bounds array, overwrite its time array with the average of the start and end times for each timestep. Prevents bug wherein time arrays equal to either the start or end bounds get mistakenly grouped into the wrong time interval, i.e. the wrong month or year (fixes :issue 185 via PR200). By Spencer Hill.

v0.1.2 (30 March 2017)

This release improves the process of submitting multiple calculations for automatic execution. The user interface, documentation, internal logic, and packaging all received upgrades and/or bugfixes.

We also now have a mailing list. Join it to follow and/or post your own usage questions, bug reports, suggestions, etc.

Enhancements
  • Include an example library of aospy objects that works out-of-the-box with the provided example main script (PR155). By Spencer Clark and Spencer Hill.
  • Improve Examples page of the documentation by using this new example object library (PR164). By Spencer Hill.
  • Improve readability/usability of the included example script aospy_main.py for submitting aospy calculations by moving all internal logic into new automate.py module (PR155). By Spencer Clark and Spencer Hill.
  • Enable user to specify whether or not to write output to .tar files (in addition to the standard output). Also document an error that occurs when writing output to .tar files for sufficiently old versions of tar (including the version that ships standard on MacOS), and print a warning when errors are caught during the ‘tar’ call (PR160). By Spencer Hill.
Bug fixes
  • Update packaging specifications such that the example main script and tutorial notebook actually ship with aospy as intended (fixes GH149 via PR161). By Spencer Hill.
  • Use the ‘scipy’ engine for the xarray.DataArray.to_netcdf call when writing aospy calculation outputs to disk to prevent a bug when trying to re-write to an existing netCDF file (fixes GH157 via PR160). By Spencer Hill.

v0.1.1 (2 March 2017)

This release includes fixes for a number of bugs mistakenly introduced in the refactoring of the variable loading step of calc.py (PR90), as well as support for xarray version 0.9.1.

Enhancements
  • Support for xarray version 0.9.1 and require it or a later xarray version. By Spencer Clark and Spencer Hill.
  • Better support for variable names relating to “bounds” dimension of input data files. “bnds”, “bounds”, and “nv” now all supported (PR140). By Spencer Hill.
  • When coercing dims of input data to aospy’s internal names, for scalars change only the name; for non-scalars change the name, force them to have a coord, and copy over their attrs (PR140). By Spencer Hill.
Bug fixes
  • Fix bug involving loading data that has dims that lack coords (which is possible as of xarray v0.9.0). By Spencer Hill.
  • Fix an instance where the name for pressure half levels was mistakenly replaced with the name for the pressure full levels (PR126). By Spencer Clark.
  • Prevent workaround for dates outside the pd.Timestamp valid range from being applied to dates within the pd.Timestamp valid range (PR128). By Spencer Clark.
  • Ensure that all DataArrays associated with aospy.Var objects have a time weights coordinate with CF-compliant time units. This allows them to be cast as the type np.timedelta64, and be safely converted to have units of days before taking time-weighted averages (PR128). By Spencer Clark.
  • Fix a bug where the time weights were not subset in time prior to taking a time weighted average; this caused computed seasonal averages to be too small. To prevent this from failing silently again, we now raise a ValueError if the time coordinate of the time weights is not identical to the time coordinate of the array associated with the aospy.Var (PR128). By Spencer Clark.
  • Enable calculations to be completed using data saved as a single time-slice on disk (fixes GH132 through PR135). By Spencer Clark.
  • Fix bug where workaround for dates outside the pd.Timestamp valid range caused a mismatch between the data loaded and the data requested (fixes GH138 through PR139). By Spencer Clark.

v0.1 (24 January 2017)

Overview: Why aospy?

Use cases

If you’ve ever found yourself saying or thinking anything along these lines, then aospy may be a great tool for you:

  • “What is this precip01.dat file in my home directory that I vaguely remember creating a few weeks ago? Better just delete it and re-do the calculation to be sure.”
  • “I really need to analyze variable X. But the models/observational products I’m using don’t provide it. They do provide variables Y and Z, from which I can compute quantity X.”
  • “I need to calculate quantity X from 4 different simulations repeated in 5 different models; computing monthly, seasonal, and annual averages and standard deviations; gridpoint-by-gridpoint and averaged over these 10 regions. That’s impractical – I’ll just do a small subset.”

Each of these common problems is easily solved by using aospy.

Design philosophy

aospy’s ability to automate calculations (while properly handling model- and simulation-level idiosyncrasies) relies on separating your code into three distinct categories.

  1. Code characterizing the data you want to work with: “Where is your data and what does it represent?”
  2. Code describing abstract physical quantities and geographical regions you eventually want to examine: “What things are you generally interested in calculating?”
  3. Code specifying the exact parameters of calculations you want to perform right now: “Ok, time to actually crunch some numbers. Exactly what all do you want to compute from your data, and how do you want to slice and dice it?”

How you’ll actually interact with aospy in order to achieve each of these three steps is described in the Using aospy section of this documentation, and explicit examples using included sample data are in the Examples section.

Open Science & Reproducible Research

This separation of your code into three categories promotes open science and reproducible research in multiple ways:

  • By separating code that describes where your data is from the code used to compute particular physical quantities, the latter can be written in a generic form that closely mimics the physical form of the particular expression. The resulting code is easier to read and debug and therefore to share with others.
  • By enabling automation of calculations across an arbitrary number of parameter combinations, aospy facilitates more rigorous analyses and/or analyses encompassing a larger span of input data than would otherwise be possible.
  • By outputting the results of calculations as netCDF files in a highly organized directory structure with lots of metadata embued within the file path, file name, and the netCDF file itself, aospy facilitates the sharing of results with others (including your future self that has forgotten the myriad details of how you have computed things right now).

Using aospy

This section provides a high-level summary of how to use aospy. See the Overview section of this documentation for more background information, or the Examples section for concrete examples.

Your aospy object library

The first step is writing the code that describes your data and the quantities you eventually want to compute using it. We refer to this code collectively as your “object library”.

Describing your data on disk

aospy needs to know where the data you want to use is located on disk and how it is organized across different projects, models, and model runs (i.e. simulations). This involves a hierarchy of three classes, aospy.Proj, aospy.Model, and aospy.Run.

  1. aospy.Proj: This represents a single project that involves analysis of data from one or more models and simulations.
  2. aospy.Model: This represents a single climate model, other numerical model, observational data source, etc.
  3. aospy.Run: This represents a single simulation, version of observational data, etc.

So each user’s object library will contain one or more aospy.Proj objects, each of which will have one or more child aospy.Model objects, which in turn will each have one or more child aospy.Run objects.

Note

Currently, the Proj-Model-Run hierarchy is rigid, in that each Run has a parent Model, and each Model has a parent Proj. Work is ongoing to relax this to a more generic parent-child framework.

Physical variables

The aospy.Var class is used to represent physical variables, e.g. precipitation rate or potential temperature. This includes both variables which are directly available in netCDF files (e.g. they were directly outputted by your model or gridded data product) as well as those fields that must be computed from other variables (e.g. they weren’t directly outputted but can be computed from other variables that were outputted).

Geographical regions

The aospy.Region class is used to define geographical regions over which quantities can be averaged (in addition to gridpoint-by-gridpoint values). Like aospy.Var objects, they are more generic than the objects of the aospy.Proj - aospy.Model - aospy.Run hierarchy, in that they correspond to the generic physical quantities/regions rather than the data of a particular project, model, or simulation.

Object library structure

The officially supported way to submit calculations is the aospy.submit_mult_calcs() function. In order for this to work, your object library must follow one or the other of these structures:

  1. All aospy.Proj and aospy.Var objects are accessible as attributes of your library. This means that my_obj_lib.my_obj works, where my_obj_lib is your object library, and my_obj is the object in question.
  2. All aospy.Proj objects are stored in a container called projs, where projs is an attribute of your library (i.e. my_obj_lib.projs). And likewise for aospy.Var objects in a variables attribute.

Beyond that, you can structure your object library however you wish. In particular, it can be structured as a Python module (i.e. a single “.py” file) or as a package (i.e. multiple “.py” files linked together; see the official documentation on package structuring).

A single module works great for small projects and for initially trying out aospy (this is how the example object library, aospy.examples.example_obj_lib, is structured). But as your object library grows, it can become easier to manage as a package of multiple files. For an example of a large object library that is structured as a formal package, see here.

Accessing your library

If your current working directory is the one containing your library, you can import your library via import my_obj_lib (replacing my_obj_lib with whatever you’ve named yours) in order to pass it to aospy.submit_mult_calcs().

Once you start using aospy a lot, however, this requirement of being in the same directory becomes cumbersome. As a solution, you can add the directory containing your object library to the PYTHONPATH environment variable. E.g if you’re using the bash shell:

export PYTHONPATH=/path/to/your/object/library:${PYTHONPATH}

Of course, replace /path/to/your/object/library with the actual path to yours. This command places your object library at the front of the PYTHONPATH environment variable, which is essentially the first place where Python looks to find packages and modules to be imported. (For more, see Python’s official documentation on PYTHONPATH).

Note

It’s convenient to copy this command into your shell profile (e.g., for the bash shell on Linux or Mac, ~/.bash_profile) so that you don’t have to call it again in every new terminal session.

To test this is working, run python -c "import my_obj_lib" from a directory other than where the library is located (again replacing my_obj_lib with the name you’ve given to your library). If this runs without error, you should be good to go.

Executing calculations

As noted above, the officially supported way to submit calculations is the aospy.submit_mult_calcs() function.

We provide a template “main” script with aospy that uses this function. We recommend copying it to the location of your choice. In the copy, replace the example object library and associated objects with your own. (If you accidentally change the original, you can always get a fresh copy from Github).

Running the main script

Once the main script parameters are all modified as desired, execute the script from the command line as follows

/path/to/your/aospy_main.py

This should generate a text summary of the specified parameters and a prompt as to whether to proceed or not with the calculations. An affirmative response then triggers the calculations to execute.

Note

You may need to change the permissions on the file to make it executable. E.g. from a Mac or Linux: chmod u+x /path/to/your/aospy_main.py. Alternatively you can call python or IPython from the command line to run it: python /path/to/your/aospy_main.py or ipython /path/to/your/aospy_main.py.

Specifically, the parameters are permuted over all possible combinations. So, for example, if two model names and three variable names were listed and all other parameters had only one element, six calculations would be generated and executed. There is no limit to the number of permutations.

Note

You can also call the main script interactively within an IPython session via %run /path/to/your/main.py or, from the command line, run the script and then start an interactive IPython session via ipython -i /path/to/your/main.py.

Or you can call aospy.submit_mult_calcs() directly within an interactive session.

As the calculations are performed, logging information will be printed to the terminal displaying their progress.

Parallelized calculations

The calculations generated by the main script can be executed in parallel using dask.distributed. aospy will either automatically set up a dask.distributed.LocalCluster to perform the calculations, or one can optionally specify an external distributed.Client to delegate the work. Otherwise, or if the user sets parallelize=False in the calc_exec_options argument of aospy.submit_mult_calcs(), script, the calculations will be executed one-by-one.

Particularly on instititutional clusters with many cores, this parallelization yields an impressive speed-up when multiple calculations are generated.

Note

When calculations are performed in parallel, often the logging information from different calculations running simultameously end up interwoven with one another, leading to output that is confusing to follow. Work is ongoing to improve the logging output when the computations are parallelized.

Finding the output

aospy saves the results of all calculations as netCDF files and embeds metadata describing it within the netCDF files, in their filenames, and in the directory structure within which they are saved.

  • Directory structure: /path/to/aospy-rootdir/projname/modelname/runname/varname
  • File name : varname.intvl_out.dtype_out_time.'from_'intvl_in'_'dtype_in_time.model.run.date_range.nc

See the API Reference on aospy.CalcInterface for explanation of each of these components of the path and file name.

Under the hood

aospy.submit_mult_calcs() creates a aospy.CalcSuite object that permutes over the provided lists of calculation specifications, encoding each permutation into a aospy.CalcInterface object.

Note

Actually, when multiple regions and/or output time/regional reductions are specified, these all get passed to each aospy.CalcInterface object rather than being permuted over. They are then looped over during the subsequent calculations. This is to prevent unnecessary re-loading and re-computing, because, for a given simulation/variable/etc., all regions and reduction methods use the same data.

Each aospy.CalcInterface object, in turn, is used to instantiate a aospy.Calc object. The aospy.Calc object, in turn:

  • loads the required netCDF data given its simulation, variable, and date range
  • (if necessary) further truncates the data in time (i.e. to the given subset of the annual cycle, and/or if the requested date range doesn’t exactly align with the time chunking of the input netCDF files)
  • (if the variable is a function of other variables) executes the function that computes the calculation using this loaded and truncated data
  • applies all specified temporal and regional time reductions
  • writes the results (plus additional metadata) to disk as netCDF files and appends it to its own data_out attribute

Note

Unlike aospy.Proj, aospy.Model, aospy.Run, aospy.Var, and aospy.Region, these objects are not intended to be saved in .py files for continual re-use. Instead, they are generated as needed, perform their desired tasks, and then go away.

See the API reference documentation for further details.

Examples

Note

The footnotes in this section provide scientific background to help you understand the motivation and physical meaning of these example calculations. They can be skipped if you are familiar already or aren’t interested in those details.

In this section, we use the example data files included with aospy to demonstrate the standard aospy workflow of executing and submitting multiple calculations at once.

These files contain timeseries of monthly averages of two variables generated by an idealized [1] aquaplanet [2] climate model:

  1. Precipitation generated through gridbox-scale condensation
  2. Precipitation generated through the model’s convective parameterization [3]

Using this data that was directly outputted by the model, let’s compute two other useful quantities: (1) the total precipitation rate, and (2) the fraction of the total precipitation rate that comes from the convective parameterization. We’ll compute the time-average over the whole duration of the data, both at each gridpoint and aggregated over specific regions.

Preliminaries

First we’ll save the path to the example data in a local variable, since we’ll be using it in several places below.

In [1]: import os  # Python built-in package for working with the operating system

In [2]: import aospy

In [3]: rootdir = os.path.join(aospy.__path__[0], 'test', 'data', 'netcdf')

Now we’ll use the fantastic xarray package to inspect the data:

In [4]: import xarray as xr

In [5]: xr.open_mfdataset(os.path.join(rootdir, '000[4-6]0101.precip_monthly.nc'),
   ...:                   decode_times=False)
   ...: 
Out[5]: 
<xarray.Dataset>
Dimensions:            (lat: 64, latb: 65, lon: 128, lonb: 129, nv: 2, time: 36)
Coordinates:
  * lonb               (lonb) float64 -1.406 1.406 4.219 7.031 9.844 12.66 ...
  * lon                (lon) float64 0.0 2.812 5.625 8.438 11.25 14.06 16.88 ...
  * latb               (latb) float64 -90.0 -86.58 -83.76 -80.96 -78.16 ...
  * lat                (lat) float64 -87.86 -85.1 -82.31 -79.53 -76.74 ...
  * nv                 (nv) float64 1.0 2.0
  * time               (time) float64 1.111e+03 1.139e+03 1.17e+03 1.2e+03 ...
Data variables:
    condensation_rain  (time, lat, lon) float64 5.768e-06 5.784e-06 ...
    convection_rain    (time, lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
    time_bounds        (time, nv) float64 1.095e+03 1.126e+03 1.126e+03 ...
    average_DT         (time) float64 31.0 28.0 31.0 30.0 31.0 30.0 31.0 ...
    zsurf              (time, lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...

We see that, in this particular model, the variable names for these two forms of precipitation are “condensation_rain” and “convection_rain”, respectively. The file also includes the coordinate arrays (“lat”, “time”, etc.) that indicate where in space and time the data refers to.

Now that we know where and what the data is, we’ll proceed through the workflow described in the Using aospy section of this documentation.

Describing your data

Runs and DataLoaders

First we create an aospy.Run object that stores metadata about this simulation. This includes specifying where its files are located via an aospy.data_loader.DataLoader object.

DataLoaders specify where your data is located and organized. Several types of DataLoaders exist, each for a different directory and file structure; see the API Reference for details.

For our simple case, where the data comprises a single file, the simplest DataLoader, a DictDataLoader, works well. It maps your data based on the time frequency of its output (e.g. 6 hourly, daily, monthly, annual) to the corresponding netCDF files via a simple dictionary:

In [6]: from aospy.data_loader import DictDataLoader

In [7]: file_map = {'monthly': os.path.join(rootdir, '000[4-6]0101.precip_monthly.nc')}

In [8]: data_loader = DictDataLoader(file_map)

We then pass this to the aospy.Run constructor, along with a name for the run and an optional description.

In [9]: from aospy import Run

In [10]: example_run = Run(
   ....:     name='example_run',
   ....:     description='Control simulation of the idealized moist model',
   ....:     data_loader=data_loader
   ....: )
   ....: 

Note

See the API reference for other optional arguments for this and the other core aospy objects used in this tutorial.

Models

Next, we create the aospy.Model object that describes the model in which the simulation was executed. One important attribute is grid_file_paths, which consists of a sequence (e.g. a tuple or list) of paths to netCDF files from which physical attributes of that model can be found that aren’t already embedded in the output netCDF files.

For example, often the land mask that defines which gridpoints are ocean or land is outputted to a single, standalone netCDF file, rather than being included in the other output files. But we often need the land mask, e.g. to define certain land-only or ocean-only regions. [4] This and other grid-related properties shared across all of a Model’s simulations can be found in one or more of the files in grid_file_paths.

The other important attribute is runs, which is a list of the aospy.Run objects that pertain to simulations performed in this particular model.

In [11]: from aospy import Model

In [12]: example_model = Model(
   ....:     name='example_model',
   ....:     grid_file_paths=(
   ....:         os.path.join(rootdir, '00040101.precip_monthly.nc'),
   ....:         os.path.join(rootdir, 'im.landmask.nc')
   ....:     ),
   ....:     runs=[example_run]  # only one Run in our case, but could be more
   ....: )
   ....: 
Projects

Finally, we associate the Model object with an aospy.Proj object. This is the level at which we specify the directories to which aospy output gets written.

In [13]: from aospy import Proj

In [14]: example_proj = Proj(
   ....:     'example_proj',
   ....:     direc_out='example-output',  # default, netCDF output (always on)
   ....:     tar_direc_out='example-tar-output', # output to .tar files (optional)
   ....:     models=[example_model]  # only one Model in our case, but could be more
   ....: )
   ....: 

This extra aospy.Proj level of organization may seem like overkill for this simple example, but it really comes in handy once you start using aospy for more than one project.

Defining physical quantities and regions

Having now fully specified the particular data of interest, we now define the general physical quantities of interest and any geographic regions over which to aggregate results.

Physical variables

We’ll first define aospy.Var objects for the two variables that we saw are directly available as model output:

In [15]: from aospy import Var

In [16]: precip_largescale = Var(
   ....:      name='precip_largescale',  # name used by aospy
   ....:      alt_names=('condensation_rain',),  # its possible name(s) in your data
   ....:      def_time=True,  # whether or not it is defined in time
   ....:      description='Precipitation generated via grid-scale condensation',
   ....:  )
   ....:  precip_convective = Var(
   ....:      name='precip_convective',
   ....:      alt_names=('convection_rain', 'prec_conv'),
   ....:      def_time=True,
   ....:      description='Precipitation generated by convective parameterization',
   ....:  )
   ....: 

When it comes time to load data corresponding to either of these from one or more particular netCDF files, aospy will search for variables matching either name or any of the names in alt_names, stopping at the first successful one. This makes the common problem of model-specific variable names a breeze: if you end up with data with a new name for your variable, just add it to alt_names.

Warning

This assumes that the name and all alternate names are unique to that variable, i.e. that in none of your data do those names actually signify something else. If that was indeed the case, aospy can potentially grab the wrong data without issuing an error message or warning.

Next, we’ll create functions that compute the total precipitation and convective precipitation fraction and combine them with the above aospy.Var objects to define the new aospy.Var objects:

In [18]: def total_precip(condensation_rain, convection_rain):
   ....:     """Sum of large-scale and convective precipitation."""
   ....:     return condensation_rain + convection_rain
   ....: 

In [19]: def conv_precip_frac(precip_largescale, precip_convective):
   ....:     """Fraction of total precip that is from convection parameterization."""
   ....:     total = total_precip(precip_largescale, precip_convective)
   ....:     return precip_convective / total.where(total)
   ....: 

In [20]: precip_total = Var(
   ....:     name='precip_total',
   ....:     def_time=True,
   ....:     func=total_precip,
   ....:     variables=(precip_largescale, precip_convective),
   ....: )
   ....: 

In [21]: precip_conv_frac = Var(
   ....:    name='precip_conv_frac',
   ....:    def_time=True,
   ....:    func=conv_precip_frac,
   ....:    variables=(precip_largescale, precip_convective),
   ....: )
   ....: 

Notice the func and variables attributes that weren’t in the prior Var constuctors. These signify the function to use and the physical quantities to pass to that function in order to compute the quantity.

Note

Although variables is passed a tuple of Var objects corresponding to the physical quantities passed to func, func should be a function whose arguments are the xarray.DataArray objects corresponding to those variables. aospy uses the Var objects to load the DataArrays and then passes them to the function.

This enables you to write simple, expressive functions comprising only the physical operations to perform (since the “data wrangling” part has been handled already).

Warning

Order matters in the tuple of aospy.Var objects passed to the variables attribute: it must match the order of the call signature of the function passed to func.

E.g. in precip_conv_frac above, if we had mistakenly done variables=(precip_convective, precip_largescale), the calculation would execute without error, but all of the results would be physically wrong.

Geographic regions

Last, we define the geographic regions over which to perform aggregations and add them to example_proj. We’ll look at the whole globe and at the Tropics:

In [22]: from aospy import Region

In [23]: globe = Region(
   ....:     name='globe',
   ....:     description='Entire globe',
   ....:     lat_bounds=(-90, 90),
   ....:     lon_bounds=(0, 360),
   ....:     do_land_mask=False
   ....: )
   ....: 

In [24]: tropics = Region(
   ....:     name='tropics',
   ....:     description='Global tropics, defined as 30S-30N',
   ....:     lat_bounds=(-30, 30),
   ....:     lon_bounds=(0, 360),
   ....:     do_land_mask=False
   ....: )
   ....: 

In [25]: example_proj.regions = [globe, tropics]

We now have all of the needed metadata in place. So let’s start crunching numbers!

Submitting calculations

Using aospy.submit_mult_calcs()

Having put in the legwork above of describing our data and the physical quantities we wish to compute, we can submit our desired calculations for execution using aospy.submit_mult_calcs(). Its sole required argument is a dictionary specifying all of the desired parameter combinations.

In the example below, we import and use the example_obj_lib module that is included with aospy and whose objects are essentially identical to the ones we’ve defined above.

In [26]: from aospy.examples import example_obj_lib as lib

In [27]: calc_suite_specs = dict(
   ....:     library=lib,
   ....:     projects=[lib.example_proj],
   ....:     models=[lib.example_model],
   ....:     runs=[lib.example_run],
   ....:     variables=[lib.precip_largescale, lib.precip_convective,
   ....:                lib.precip_total, lib.precip_conv_frac],
   ....:     regions='all',
   ....:     date_ranges='default',
   ....:     output_time_intervals=['ann'],
   ....:     output_time_regional_reductions=['av', 'reg.av'],
   ....:     output_vertical_reductions=[None],
   ....:     input_time_intervals=['monthly'],
   ....:     input_time_datatypes=['ts'],
   ....:     input_time_offsets=[None],
   ....:     input_vertical_datatypes=[False],
   ....: )
   ....: 

See the API Reference on aospy.submit_mult_calcs() for more on calc_suite_specs, including accepted values for each key.

submit_mult_calcs() also accepts a second dictionary specifying some options regarding how we want aospy to display, execute, and save our calculations. For the sake of this simple demonstration, we’ll suppress the prompt to confirm the calculations, submit them in serial rather than parallel, and suppress writing backup output to .tar files:

In [28]: calc_exec_options = dict(prompt_verify=False, parallelize=False,
   ....:                          write_to_tar=False)
   ....: 

Now let’s submit this for execution:

In [29]: from aospy import submit_mult_calcs

In [30]: calcs = submit_mult_calcs(calc_suite_specs, calc_exec_options)

This permutes over all of the parameter settings in calc_suite_specs, generating and executing the resulting calculation. In this case, it will compute all four variables and perform annual averages, both for each gridpoint and regionally averaged.

Although we do not show it here, this also prints logging information to the terminal at various steps during each calculation, including the filepaths to the netCDF files written to disk of the results.

Results

The result is a list of aospy.Calc objects, one per simulation.

In [31]: calcs
Out[31]: 
[<aospy.Calc instance: precip_largescale, example_proj, example_model, example_run>,
 <aospy.Calc instance: precip_convective, example_proj, example_model, example_run>,
 <aospy.Calc instance: precip_total, example_proj, example_model, example_run>,
 <aospy.Calc instance: precip_conv_frac, example_proj, example_model, example_run>]

Each aospy.Calc object includes the paths to the output

In [32]: calcs[0].path_out
Out[32]: 
{'av': 'example-output/example_proj/example_model/example_run/precip_largescale/precip_largescale.ann.av.from_monthly_ts.example_model.example_run.0004-0006.nc',
 'reg.av': 'example-output/example_proj/example_model/example_run/precip_largescale/precip_largescale.ann.reg.av.from_monthly_ts.example_model.example_run.0004-0006.nc'}

and the results of each output type

Note

You may have noticed that subset_... and raw_... coordinates have years 1678 and later, when our data was from model years 4 through 6. This is because technical details upstream limit the range of supported whole years to 1678-2262.

As a workaround, aospy pretends that any timeseries that starts before the beginning of this range actually starts at 1678. An upstream fix is currently under way, at which point all dates will be supported without this workaround.

Gridpoint-by-gridpoint

Let’s plot (using matplotlib) the time average at each gridcell of all four variables. For demonstration purposes, we’ll load the data that was saved to disk using xarray rather than getting it directly from the data_out attribute as above.

In [33]: from matplotlib import pyplot as plt

In [34]: fig = plt.figure()

In [35]: for i, calc in enumerate(calcs):
   ....:     ax = fig.add_subplot(2, 2, i+1)
   ....:     arr = xr.open_dataset(calc.path_out['av']).to_array()
   ....:     if calc.name != precip_conv_frac.name:
   ....:         arr *= 86400  # convert to units mm per day
   ....:     arr.plot(ax=ax)
   ....:     ax.set_title(calc.name)
   ....:     ax.set_xticks(range(0, 361, 60))
   ....:     ax.set_yticks(range(-90, 91, 30))
   ....: 

In [36]: plt.tight_layout()

In [37]: plt.show()
_images/plot_av.png

We see that precipitation maximizes at the equator and has a secondary maximum in the mid-latitudes. [5] Also, the convective precipitation dominates the total in the Tropics, but moving poleward the gridscale condensation plays an increasingly larger fractional role (note different colorscales in each panel). [6]

Regional averages

Now let’s examine the regional averages. We find that the global annual mean total precipitation rate for this run (converting to units of mm per day) is:

In [38]: for calc in calcs:
   ....:     ds = xr.open_dataset(calc.path_out['reg.av'])
   ....:     if calc.name != precip_conv_frac.name:
   ....:         ds *= 86400  # convert to units mm/day
   ....:     print(calc.name, ds, '\n')
   ....: 
precip_largescale <xarray.Dataset>
Dimensions:              ()
Coordinates:
    subset_start_date    datetime64[ns] 1678-01-01
    subset_end_date      datetime64[ns] 1680-12-31
    raw_data_end_date    datetime64[ns] 1681-01-01
    raw_data_start_date  datetime64[ns] 1678-01-01
Data variables:
    globe                float64 0.9149
    tropics              float64 0.4551 

precip_convective <xarray.Dataset>
Dimensions:              ()
Coordinates:
    subset_start_date    datetime64[ns] 1678-01-01
    subset_end_date      datetime64[ns] 1680-12-31
    raw_data_end_date    datetime64[ns] 1681-01-01
    raw_data_start_date  datetime64[ns] 1678-01-01
Data variables:
    globe                float64 2.11
    tropics              float64 3.055 

precip_total <xarray.Dataset>
Dimensions:              ()
Coordinates:
    subset_start_date    datetime64[ns] 1678-01-01
    subset_end_date      datetime64[ns] 1680-12-31
    raw_data_end_date    datetime64[ns] 1681-01-01
    raw_data_start_date  datetime64[ns] 1678-01-01
Data variables:
    globe                float64 3.025
    tropics              float64 3.51 

precip_conv_frac <xarray.Dataset>
Dimensions:              ()
Coordinates:
    subset_end_date      datetime64[ns] 1680-12-31
    subset_start_date    datetime64[ns] 1678-01-01
    raw_data_end_date    datetime64[ns] 1681-01-01
    raw_data_start_date  datetime64[ns] 1678-01-01
Data variables:
    globe                float64 0.598
    tropics              float64 0.808 

As was evident from the plots, we see that most precipitation (80.8%) in the tropics comes from convective rainfall, but averaged over the globe the large-scale condensation is a more equal player (40.2% for large-scale, 59.8% for convective).

Beyond this simple example

Scaling up

In this case, we computed time averages of four variables, both at each gridpoint (which we’ll call 1 calculation) and averaged over two regions, yielding (4 variables)*(1 gridcell operation + (2 regions)*(1 regional operation)) = 12 total calculations executed. Not bad, but 12 calculations is few enough that we probably could have handled them without aospy.

The power of aospy is that, with the infrastructure we’ve put in place, we can now fire off additional calculations at any time. Some examples:

  • Set output_time_regional_reductions=['ts', 'std', 'reg.ts', 'reg.std'] : calculate the timeseries (‘ts’) and standard deviation (‘std’) of annual mean values at each gridpoint and for the regional averages.
  • Set output_time_intervals=range(1, 13) : average across years for each January (1), each February (2), etc. through December (12). [7]

With these settings, the number of calculations is now (4 variables)*(2 gridcell operations + (2 regions)*(2 regional operations))*(12 temporal averages) = 288 calculations submitted with a single command.

Modifying your object library

We can also add new objects to our object library at any time. For example, suppose we performed a new simulation in which we modified the formulation of the convective parameterization. All we would have to do is create a corresponding aospy.Run object, and then we can execute calculations for that simulation. And likewise for models, projects, variables, and regions.

As a real-world example, two of aospy’s developers use aospy for in their own scientific research, with multiple projects each comprising multiple models, simulations, etc. They routinely fire off thousands of calculations at once. And thanks to the highly organized and metadata-rich directory structure and filenames of the aospy output netCDF files, all of the resulting data is easy to find and use.

Example “main” script

Finally, aospy comes included with a “main” script for submitting calculations that is pre-populated with the objects from the example object library. It also comes with in-line instructions on how to use it, whether you want to keep playing with the example library or modify it to use on your own object library.

It is located in “examples” directory of your aospy installation. Find it via typing python -c "import os, aospy; print(os.path.join(aospy.__path__[0], 'examples', 'aospy_main.py'))" from your terminal.

Footnotes

[1]An “idealized climate model” is a model that, for the sake of computational efficiency and conceptual simplicity, omits and/or simplifies various processes relative to how they are computed in full, production-class models. The particular model used here is described in Frierson et al 2006.
[2]An “aquaplanet” is simply a climate model in which the the surface is entirely ocean, i.e. there is no land. Interactions between atmospheric and land processes are complicated, and so an aquaplanet avoids those complications while still generating a climate (when zonally averaged, i.e. averaged around each latitude circle) that roughly resembles that of the real Earth’s.
[3]Most climate models generate precipitation through two separate pathways: (1) direct saturation of a whole gridbox, which results in condensation and precipitation, and (2) a “convective parameterization.” The latter simulates the precipitation that, due to subgrid-scale variability, can be expected to occur at some fraction of the area within a gridcell, even though the cell as a whole isn’t saturated. The total precipitation is simply the sum of these “large-scale” and “convective” components.
[4]In this case, the model being used is an aquaplanet, so the mask will be simply all ocean. But this is not generally the case – comprehensive climate and weather models include Earth’s full continental geometry and land topography (at least as well as can be resolved at their particular horizontal grid resolution).
[5]This equatorial rainband is called the Intertropical Convergence Zone, or ITCZ. In this simulation, the imposed solar radiation is fixed at Earth’s annual mean value, which is symmetric about the equator. The ITCZ typically follows the solar radiation maximum, hence its position in this case directly on the equator.
[6]This is a very common result. The gridcells of many climate models are several hundred kilometers by several hundred kilometers in area. In Earth’s Tropics, most rainfall is generated by cumulus towers that are much smaller than this. But in the mid-latitudes, a phenomenon known as baroclinic instability generates much larger eddies that can span several hundred kilometers.
[7]In this particular simulation, the boundary conditions are constant in time, so there is no seasonal cycle. But we could use these monthly averages to confirm that’s actually the case, i.e. that we didn’t accidentally use time-varying solar radiation when we ran the model.

Installation

Supported platforms

  • Operating system: Windows, MacOS/Mac OS X, and Linux
  • Python: 2.7, 3.4, 3.5, and 3.6

Note

We highly recommend installing and using the free Anaconda (or Miniconda) distribution of Python, which works on Mac, Linux, and Windows, both on normal computers and institutional clusters and doesn’t require root permissions.

See e.g. this blog post for instructions on how to install Miniconda on a large cluster without root permission.

Alternative method #1: pip

aospy is available from the official Python Packaging Index (PyPI) via pip:

pip install aospy

Alternative method #2: clone from Github

You can also directly clone the Github repo

git clone https://www.github.com/spencerahill/aospy.git
cd aospy
python setup.py install

Verifying proper installation

Once installed via any of these methods, you can run aospy’s suite of tests using py.test. From the top-level directory of the aospy installation

conda install pytest  # if you don't have it already; or 'pip install pytest'
py.test aospy

If you don’t know the directory where aospy was installed, you can find it via

python -c "import aospy; print(aospy.__path__[0])"

If the pytest command results in any error messages or test failures, something has gone wrong, and please refer to the Troubleshooting information below.

Troubleshooting

Please search through our Issues page on Github and our mailing list to see if anybody else has had the same problem you’re facing. If none do, then please send a message on the list or open a new Issue.

Please don’t hesitate, especially if you’re new to the package and/or Python! We are eager to help people get started. Even you suspect it’s something simple or obvious – that just means we didn’t make it clear/easy enough!

API Reference

Here we provide the reference documentation for aospy’s public API. If you are new to the package and/or just trying to get a feel for the overall workflow, you are better off starting in the Overview, Using aospy, or Examples sections of this documentation.

Warning

aospy is under active development. While we strive to maintain backwards compatibility, it is likely that some breaking changes to the codebase will occur in the future as aospy is improved.

Core Hierarchy for Input Data

aospy provides three classes for specifying the location and characteristics of data saved on disk as netCDF files that the user wishes to use as input data for aospy calculations: Proj, Model, and Run.

Proj
class aospy.proj.Proj(name, description=None, models=None, default_models=None, regions=None, direc_out='', tar_direc_out='')[source]

An object that describes a single project that will use aospy.

This is the top-level class in the aospy hierarchy of data representations. It is meant to contain all of the Model, Run, and Region objects that are of relevance to a particular research project. (Any of these may be used by multiple Proj objects.)

The Proj class itself provides little functionality, but it is an important means of organizing a user’s work across different projects. In particular, the output of all calculations using aospy.Calc are saved in a directory structure whose root is that of the Proj object specified for that calculation.

Attributes

name (str) The run’s name
description (str) A description of the run
direc_out, tar_direc_out (str) The paths to the root directories of, respectively, the standard and .tar versions of the output of aospy calculations saved to disk.
models (dict) A dictionary with entries of the form {model_obj.name: model_obj}, for each of this Proj’s child model objects
default_models (dict) The default model objects on which to perform calculations via aospy.Calc if not otherwise specified
regions (dict) A dictionary with entries of the form {regin_obj.name: region_obj}, for each of this Proj’s child region objects
__init__(name, description=None, models=None, default_models=None, regions=None, direc_out='', tar_direc_out='')[source]
Parameters:

name : str

The project’s name. This should be unique from that of any other Proj objects being used.

description : str, optional

A description of the model. This is not used internally by aospy; it is solely for the user’s information.

regions : {None, sequence of aospy.Region objects}, optional

The desired regions over which to perform region-average calculations.

models : {None, sequence of aospy.Model objects}, optional

The child Model objects of this project.

default_models : {None, sequence of aospy.Run objects}, optional

The subset of this Model’s runs over which to perform calculations by default.

direc_out, tar_direc_out : str

Path to the root directories of where, respectively, regular output and a .tar-version of the output will be saved to disk.

See also

aospy.Model, aospy.Region, aospy.Run

Model
class aospy.model.Model(name=None, description=None, proj=None, grid_file_paths=None, default_start_date=None, default_end_date=None, runs=None, default_runs=None, load_grid_data=False)[source]

An object that describes a single climate or weather model.

Each Model object is associated with a parent Proj object and also with one or more child Run objects.

If aospy is being used to work with non climate- or weather-model data, the Model object can be used e.g. to represent a gridded observational product, with its child Run objects representing different released versions of that dataset.

Attributes

name (str) The model’s name
description (str) A description of the model
proj ({None, aospy.Proj}) The model’s parent aospy.Proj object
runs (list) A list of this model’s child Run objects
default_runs (list) The default subset of child run objects on which to perform calculations via aospy.Calc with this model if not otherwise specified
grid_file_paths (list) The paths to netCDF files stored on disk from which the model’s coordinate data can be taken.
default_start_date, default_end_date (datetime.datetime) The default start and end dates of any calculations using this Model
__init__(name=None, description=None, proj=None, grid_file_paths=None, default_start_date=None, default_end_date=None, runs=None, default_runs=None, load_grid_data=False)[source]
Parameters:

name : str

The model’s name. This must be unique from that of any other Model objects being used by the parent Proj.

description : str, optional

A description of the model. This is not used internally by aospy; it is solely for the user’s information.

proj : {None, aospy.Proj}, optional

The parent Proj object. When the parent Proj object is instantiated with this Model included in its models attribute, this will be over-written with that Proj object.

grid_file_paths : {None, sequence of strings}, optional

The paths to netCDF files stored on disk from which the model’s coordinate data can be taken.

default_start_date, default_end_date : {None, datetime.datetime}, optional

Default start and end dates of calculations to be performed using this Model.

runs : {None, sequence of aospy.Run objects}, optional

The child run objects of this Model

default_runs : {None, sequence of aospy.Run objects}, optional

The subset of this Model’s runs over which to perform calculations by default.

load_grid_data : bool, optional (default False)

Whether or not to load the grid data specified by ‘grid_file_paths’ upon initilization

See also

aospy.DataLoader, aospy.Proj, aospy.Run

set_grid_data()[source]

Populate the attrs that hold grid data.

Run
class aospy.run.Run(name=None, description=None, proj=None, default_start_date=None, default_end_date=None, data_loader=None)[source]

An object that describes a single model ‘run’ (i.e. simulation).

Each Run object is associated with a parent Model object. This parent attribute is not set by Run itself, however; it is set during the instantation of the parent Model object.

If aospy is being used to work with non climate-model data, the Run object can be used e.g. to represent different versions of a gridded observational data product, with the parent Model representing that data product more generally.

Attributes

name (str) The run’s name
description (str) A description of the run
proj ({None, aospy.Proj}) The run’s parent aospy.Proj object
default_start_date, default_end_date (datetime.datetime) The default start and end dates of any calculations using this Run
data_loader (aospy.DataLoader) The aospy.DataLoader object used to find data on disk corresponding to this Run object
__init__(name=None, description=None, proj=None, default_start_date=None, default_end_date=None, data_loader=None)[source]

Instantiate a Run object.

Parameters:

name : str

The run’s name. This must be unique from that of any other Run objects being used by the parent Model.

description : str, optional

A description of the model. This is not used internally by aospy; it is solely for the user’s information.

proj : {None, aospy.Proj}, optional

The parent Proj object.

data_loader : aospy.DataLoader

The DataLoader object used to find the data on disk to be used as inputs for aospy calculations for this run.

default_start_date, default_end_date : datetime.datetime, optional

Default start and end dates of calculations to be performed using this Model.

See also

aospy.DataLoader, aospy.Model

DataLoaders

Run objects rely on a helper “data loader” to specify how to find their underlying data that is saved on disk. This mapping of variables, time ranges, and potentially other parameters to the location of the corresponding data on disk can differ among modeling centers or even between different models at the same center.

Currently supported data loader types are DictDataLoader, NestedDictDataLoader, and GFDLDataLoader Each of these inherit from the abstract base DataLoader class.

class aospy.data_loader.DataLoader[source]

A fundamental DataLoader object

__init__()

Initialize self. See help(type(self)) for accurate signature.

load_variable(var=None, start_date=None, end_date=None, time_offset=None, **DataAttrs)[source]

Load a DataArray for requested variable and time range.

Automatically renames all grid attributes to match aospy conventions.

Parameters:

var : Var

aospy Var object

start_date : datetime.datetime

start date for interval

end_date : datetime.datetime

end date for interval

time_offset : dict

Option to add a time offset to the time coordinate to correct for incorrect metadata.

**DataAttrs

Attributes needed to identify a unique set of files to load from

Returns:

da : DataArray

DataArray for the specified variable, date range, and interval in

class aospy.data_loader.DictDataLoader(file_map=None, preprocess_func=<function DictDataLoader.<lambda>>)[source]

A DataLoader that uses a dict mapping lists of files to string tags

This is the simplest DataLoader; it is useful for instance if one is dealing with raw model history files, which tend to group all variables of a single output interval into single filesets. The intvl_in parameter is a string description of the time frequency of the data one is referencing (e.g. ‘monthly’, ‘daily’, ‘3-hourly’). In principle, one can give it any string value.

Parameters:

file_map : dict

A dict mapping an input interval to a list of files

preprocess_func : function (optional)

A function to apply to every Dataset before processing in aospy. Must take a Dataset and **kwargs as its two arguments.

Examples

Case of two sets of files, one with monthly average output, and one with 3-hourly output.

>>> file_map = {'monthly': '000[4-6]0101.atmos_month.nc',
...             '3hr': '000[4-6]0101.atmos_8xday.nc'}
>>> data_loader = DictDataLoader(file_map)

If one wanted to correct a CF-incompliant units attribute on each Dataset read in, which depended on the intvl_in of the fileset one could define a preprocess_func which took into account the intvl_in keyword argument.

>>> def preprocess(ds, **kwargs):
...     if kwargs['intvl_in'] == 'monthly':
...         ds['time'].attrs['units'] = 'days since 0001-01-0000'
...     if kwargs['intvl_in'] == '3hr':
...         ds['time'].attrs['units'] = 'hours since 0001-01-0000'
...     return ds
>>> data_loader = DictDataLoader(file_map, preprocess)
__init__(file_map=None, preprocess_func=<function DictDataLoader.<lambda>>)[source]

Create a new DictDataLoader

class aospy.data_loader.NestedDictDataLoader(file_map=None, preprocess_func=<function NestedDictDataLoader.<lambda>>)[source]

DataLoader that uses a nested dictionary mapping to load files

This is the most flexible existing type of DataLoader; it allows for the specification of different sets of files for different variables. The intvl_in parameter is a string description of the time frequency of the data one is referencing (e.g. ‘monthly’, ‘daily’, ‘3-hourly’). In principle, one can give it any string value. The variable name can be any variable name in your aospy object library (including alternative names).

Parameters:

file_map : dict

A dict mapping intvl_in to dictionaries mapping Var objects to lists of files

preprocess_func : function (optional)

A function to apply to every Dataset before processing in aospy. Must take a Dataset and **kwargs as its two arguments.

Examples

Case of a set of monthly average files for large scale precipitation, and another monthly average set of files for convective precipitation.

>>> file_map = {'monthly': {'precl': '000[4-6]0101.precl.nc',
...                         'precc': '000[4-6]0101.precc.nc'}}
>>> data_loader = NestedDictDataLoader(file_map)

See aospy.data_loader.DictDataLoader for an example of a possible function to pass as a preprocess_func.

__init__(file_map=None, preprocess_func=<function NestedDictDataLoader.<lambda>>)[source]

Create a new NestedDictDataLoader

class aospy.data_loader.GFDLDataLoader(template=None, data_direc=None, data_dur=None, data_start_date=None, data_end_date=None, preprocess_func=<function GFDLDataLoader.<lambda>>)[source]

DataLoader for NOAA GFDL model output

This is an example of a domain-specific custom DataLoader, designed specifically for finding files output by the Geophysical Fluid Dynamics Laboratory’s model history file post-processing tools.

Parameters:

template : GFDLDataLoader

Optional argument to specify a base GFDLDataLoader to inherit parameters from

data_direc : str

Root directory of data files

data_dur : int

Number of years included per post-processed file

data_start_date : datetime.datetime

Start date of data files

data_end_date : datetime.datetime

End date of data files

preprocess_func : function (optional)

A function to apply to every Dataset before processing in aospy. Must take a Dataset and **kwargs as its two arguments.

Examples

Case without a template to start from.

>>> base = GFDLDataLoader(data_direc='/archive/control/pp', data_dur=5,
...                       data_start_date=datetime(2000, 1, 1),
...                       data_end_date=datetime(2010, 12, 31))

Case with a starting template.

>>> data_loader = GFDLDataLoader(base, data_direc='/archive/2xCO2/pp')

See aospy.data_loader.DictDataLoader for an example of a possible function to pass as a preprocess_func.

__init__(template=None, data_direc=None, data_dur=None, data_start_date=None, data_end_date=None, preprocess_func=<function GFDLDataLoader.<lambda>>)[source]

Create a new GFDLDataLoader

Variables and Regions

The Var and Region classes are used to represent, respectively, physical quantities the user wishes to be able to compute and geographical regions over which the user wishes to aggregate their calculations.

Whereas the Proj - Model - Run hierarchy is used to describe the data resulting from particular model simulations, Var and Region represent the properties of generic physical entities that do not depend on the underlying data.

Var
class aospy.var.Var(name, alt_names=None, func=None, variables=None, func_input_dtype='DataArray', units='', plot_units='', plot_units_conv=1, domain='atmos', description='', def_time=False, def_vert=False, def_lat=False, def_lon=False, math_str=False, colormap='RdBu_r', valid_range=None)[source]

An object representing a physical quantity to be computed.

Attributes

name (str) The variable’s name
alt_names (tuple of strings) All other names that the variable may be referred to in the input data
names (tuple of strings) The combination of name and alt_names
description (str) A description of the variable
func (function) The function with which to compute the variable
variables (sequence of aospy.Var objects) The variables passed to func to compute it
func_input_dtype ({‘DataArray’, ‘Dataset’, ‘numpy’}) The datatype expected by func of its arguments
units (aospy.units.Units object) The variable’s physical units
domain (str) The physical domain of the variable, e.g. ‘atmos’, ‘ocean’, or ‘land’
def_time, def_vert, def_lat, def_lon (bool) Whether the variable is defined, respectively, in time, vertically, in latitude, and in longitude
math_str (str) The mathematical representation of the variable
colormap (str) The name of the default colormap to be used in plots of this variable
valid_range (length-2 tuple) The range of values outside which to flag as unphysical/erroneous
__init__(name, alt_names=None, func=None, variables=None, func_input_dtype='DataArray', units='', plot_units='', plot_units_conv=1, domain='atmos', description='', def_time=False, def_vert=False, def_lat=False, def_lon=False, math_str=False, colormap='RdBu_r', valid_range=None)[source]

Instantiate a Var object.

Parameters:

name : str

The variable’s name

alt_names : tuple of strings

All other names that the variable might be referred to in any input data. Each of these should be unique to this variable in order to avoid loading the wrong quantity.

description : str

A description of the variable

func : function

The function with which to compute the variable

variables : sequence of aospy.Var objects

The variables passed to func to compute it. Order matters: whenever calculations are performed to generate data corresponding to this Var, the data corresponding to the elements of variables will be passed to self.function in the same order.

func_input_dtype : {None, ‘DataArray’, ‘Dataset’, ‘numpy’}

The datatype expected by func of its arguments

units : aospy.units.Units object

The variable’s physical units

domain : str

The physical domain of the variable, e.g. ‘atmos’, ‘ocean’, or ‘land’. This is only used by aospy by some types of DataLoader, including GFDLDataLoader.

def_time, def_vert, def_lat, def_lon : bool

Whether the variable is defined, respectively, in time, vertically, in latitude, and in longitude

math_str : str

The mathematical representation of the variable. This is typically a raw string of LaTeX math-mode, e.g. r’$T_mathrm{sfc}$’ for surface temperature.

colormap : str

(Currently not used by aospy) The name of the default colormap to be used in plots of this variable.

valid_range : length-2 tuple

The range of values outside which to flag as unphysical/erroneous

mask_unphysical(data)[source]

Mask data array where values are outside physically valid range.

to_plot_units(data, dtype_vert=False)[source]

Convert the given data to plotting units.

Region
class aospy.region.Region(name='', description='', lon_bounds=[], lat_bounds=[], mask_bounds=[], do_land_mask=False)[source]

Geographical region over which to perform averages and other reductions.

Each Proj object includes a list of Region objects, which is used by Calc to determine which regions over which to perform time reductions over region-average quantities.

Region boundaries are specified as either a single “rectangle” in latitude and longitude or the union of multiple such rectangles. In addition, a land or ocean mask can be applied.

See also

aospy.Calc.region_calcs

Attributes

name (str) The region’s name
description (str) A description of the region
mask_bounds (tuple) The coordinates definining the lat-lon rectangle(s) that define(s) the region’s boundaries
do_land_mask Whether values occurring over land, ocean, or neither are excluded from the region, and whether the included points must be strictly land or ocean or if fractional land/ocean values are included.
__init__(name='', description='', lon_bounds=[], lat_bounds=[], mask_bounds=[], do_land_mask=False)[source]

Instantiate a Region object.

If a region spans across the endpoint of the data’s longitude array (i.e. it crosses the Prime Meridian for data with longitudes spanning 0 to 360), it must be defined as the union of two sections extending to the east and to the west of the Prime Meridian.

Parameters:

name : str

The region’s name. This must be unique from that of any other Region objects being used by the overlying Proj.

description : str, optional

A description of the region. This is not used internally by aospy; it is solely for the user’s information.

lon_bounds, lat_bounds : length-2 sequence, optional

The longitude and latitude bounds of the region. If the region boundaries are more complicated than a single lat-lon rectangle, use mask_bounds instead.

mask_bounds : sequence, optional

Each element is a length-2 tuple of the format (lat_bounds, lon_bounds), where each of lat_bounds and lon_bounds are of the form described above.

do_land_mask : { False, True, ‘ocean’, ‘strict_land’, ‘strict_ocean’},

optional

Determines what, if any, land mask is applied in addition to the mask defining the region’s boundaries. Default False.

  • True: apply the data’s full land mask
  • False: apply no mask
  • ‘ocean’: mask out land rather than ocean
  • ‘strict_land’: mask out all points that are not 100% land
  • ‘strict_ocean’: mask out all points that are not 100% ocean

Examples

Define a region spanning the entire globe

>>> globe = Region(name='globe', lat_bounds=(-90, 90),
...                lon_bounds=(0, 360), do_land_mask=False)

Define a region corresponding to the Sahel region of Africa, which we’ll define as land points within 10N-20N latitude and 18W-40E longitude. Because this region crosses the 0 degrees longitude point, it has to be defined using mask_bounds as the union of two lat-lon rectangles.

>>> sahel = Region(name='sahel', do_land_mask=True,
...                mask_bounds=[((10, 20), (0, 40)),
...                             ((10, 20), (342, 360))])
av(data)[source]

Time average of region-average time-series.

mask_var(data)[source]

Mask the data of the given variable outside the region.

std(data)[source]

Standard deviation of region-average time-series.

ts(data)[source]

Create time-series of region-average data.

Calculations

Calc is the engine that combines the user’s specifications of (1) the data on disk via Proj, Model, and Run, (2) the physical quantity to compute and regions to aggregate over via Var and Region, and (3) the desired date range, time reduction method, and other characteristics to actually perform the calculation

Whereas Proj, Model, Run, Var, and Region are all intended to be saved in .py files for reuse, Calc objects are intended to be generated dynamically by a main script and then not retained after they have written their outputs to disk following the user’s specifications.

Moreover, if the main.py script is used to execute calculations, no direct interfacing with Calc or it’s helper class, CalcInterface is required by the user, in which case this section should be skipped entirely.

Also included is the automate module, which enables aospy e.g. in the main script to find objects in the user’s object library that the user specifies via their string names rather than having to import the objects themselves.

CalcInterface and Calc
class aospy.calc.CalcInterface(proj=None, model=None, run=None, ens_mem=None, var=None, date_range=None, region=None, intvl_in=None, intvl_out=None, dtype_in_time=None, dtype_in_vert=None, dtype_out_time=None, dtype_out_vert=None, level=None, time_offset=None)[source]

Interface to the Calc class.

__init__(proj=None, model=None, run=None, ens_mem=None, var=None, date_range=None, region=None, intvl_in=None, intvl_out=None, dtype_in_time=None, dtype_in_vert=None, dtype_out_time=None, dtype_out_vert=None, level=None, time_offset=None)[source]

Instantiate a CalcInterface object.

Parameters:

proj : aospy.Proj object

The project for this calculation.

model : aospy.Model object

The model for this calculation.

run : aospy.Run object

The run for this calculation.

var : aospy.Var object

The variable for this calculation.

ens_mem : Currently not supported.

This will eventually be used to specify particular ensemble members of multi-member ensemble simulations.

region : sequence of aospy.Region objects

The region(s) over which any regional reductions will be performed.

date_range : tuple of datetime.datetime objects

The range of dates over which to perform calculations.

intvl_in : {None, ‘annual’, ‘monthly’, ‘daily’, ‘6hr’, ‘3hr’}, optional

The time resolution of the input data.

dtype_in_time : {None, ‘inst’, ‘ts’, ‘av’, ‘av_ts’}, optional

What the time axis of the input data represents:

  • ‘inst’ : Timeseries of instantaneous values
  • ‘ts’ : Timeseries of averages over the period of each time-index
  • ‘av’ : A single value averaged over a date range

dtype_in_vert : {None, ‘pressure’, ‘sigma’}, optional

The vertical coordinate system used by the input data:

  • None : not defined vertically
  • ‘pressure’ : pressure coordinates
  • ‘sigma’ : hybrid sigma-pressure coordinates

intvl_out : {‘ann’, season-string, month-integer}

The sub-annual time interval over which to compute:

  • ‘ann’ : Annual mean
  • season-string : E.g. ‘JJA’ for June-July-August
  • month-integer : 1 for January, 2 for February, etc.

dtype_out_time : tuple with elements being one or more of:

  • Gridpoint-by-gridpoint output:
    • ‘av’ : Gridpoint-by-gridpoint time-average
    • ‘std’ : Gridpoint-by-gridpoint temporal standard deviation
    • ‘ts’ : Gridpoint-by-gridpoint time-series
  • Averages over each region specified via region:
    • ‘reg.av’, ‘reg.std’, ‘reg.ts’ : analogous to ‘av’, ‘std’, ‘ts’

dtype_out_vert : {None, ‘vert_av’, ‘vert_int’}, optional

How to reduce the data vertically:

  • None : no vertical reduction (i.e. output is defined vertically)
  • ‘vert_av’ : mass-weighted vertical average
  • ‘vert_int’ : mass-weighted vertical integral

time_offset : {None, dict}, optional

How to offset input data in time to correct for metadata errors

class aospy.calc.Calc(calc_interface)[source]

Class for executing, saving, and loading a single computation.

Calc objects are instantiated with a single argument: a CalcInterface object that includes all of the parameters necessary to determine what calculations to perform.

__init__(calc_interface)[source]
ARR_XARRAY_NAME = 'aospy_result'
compute(write_to_tar=True)[source]

Perform all desired calculations on the data and save externally.

load(dtype_out_time, dtype_out_vert=False, region=False, time=False, vert=False, lat=False, lon=False, plot_units=False, mask_unphysical=False)[source]

Load the data from the object if possible or from disk.

region_calcs(arr, func)[source]

Perform a calculation for all regions.

save(data, dtype_out_time, dtype_out_vert=False, save_files=True, write_to_tar=False)[source]

Save aospy data to data_out attr and to an external file.

automate

Functionality for specifying and cycling through multiple calculations.

exception aospy.automate.AospyException[source]

Base exception class for the aospy package.

class aospy.automate.CalcSuite(calc_suite_specs)[source]

Suite of Calc objects generated from provided specifications.

create_calcs()[source]

Generate a Calc object for each requested parameter combination.

aospy.automate.submit_mult_calcs(calc_suite_specs, exec_options=None)[source]

Generate and execute all specified computations.

Once the calculations are prepped and submitted for execution, any calculation that triggers any exception or error is skipped, and the rest of the calculations proceed unaffected. This prevents an error in a single calculation from crashing a large suite of calculations.

Parameters:

calc_suite_specs : dict

The specifications describing the full set of calculations to be generated and potentially executed. Accepted keys and their values:

library : module or package comprising an aospy object library

The aospy object library for these calculations.

projects : list of aospy.Proj objects

The projects to permute over.

models : ‘all’, ‘default’, or list of aospy.Model objects

The models to permute over. If ‘all’, use all models in the models attribute of each Proj. If ‘default’, use all models in the default_models attribute of each Proj.

runs : ‘all’, ‘default’, or list of aospy.Run objects

The runs to permute over. If ‘all’, use all runs in the runs attribute of each Model. If ‘default’, use all runs in the default_runs attribute of each Model.

variables : list of aospy.Var objects

The variables to be calculated.

regions : ‘all’ or list of aospy.Region objects

The region(s) over which any regional reductions will be performed. If ‘all’, use all regions in the regions attribute of each Proj.

date_ranges : ‘default’ or tuple of datetime.datetime objects

The range of dates (inclusive) over which to perform calculations. If ‘default’, use the default_start_date and default_end_date attribute of each Run.

output_time_intervals : {‘ann’, season-string, month-integer}

The sub-annual time interval over which to aggregate.

  • ‘ann’ : Annual mean
  • season-string : E.g. ‘JJA’ for June-July-August
  • month-integer : 1 for January, 2 for February, etc. Each one is
    a separate reduction, e.g. [1, 2] would produce averages (or other specified time reduction) over all Januaries, and separately over all Februaries.
output_time_regional_reductions : list of reduction string identifiers

Unlike most other keys, these are not permuted over when creating the aospy.Calc objects that execute the calculations; each aospy.Calc performs all of the specified reductions. Accepted string identifiers are:

  • Gridpoint-by-gridpoint output:
    • ‘av’ : Gridpoint-by-gridpoint time-average
    • ‘std’ : Gridpoint-by-gridpoint temporal standard deviation
    • ‘ts’ : Gridpoint-by-gridpoint time-series
  • Averages over each region specified via region:
    • ‘reg.av’, ‘reg.std’, ‘reg.ts’ : analogous to ‘av’, ‘std’, ‘ts’
output_vertical_reductions : {None, ‘vert_av’, ‘vert_int’}, optional

How to reduce the data vertically:

  • None : no vertical reduction
  • ‘vert_av’ : mass-weighted vertical average
  • ‘vert_int’ : mass-weighted vertical integral
input_time_intervals : {‘annual’, ‘monthly’, ‘daily’, ‘#hr’}

A string specifying the time resolution of the input data. In ‘#hr’ above, the ‘#’ stands for a number, e.g. 3hr or 6hr, for sub-daily output. These are the suggested specifiers, but others may be used if they are also used by the DataLoaders for the given Runs.

input_time_datatypes : {‘inst’, ‘ts’, ‘av’}

What the time axis of the input data represents:

  • ‘inst’ : Timeseries of instantaneous values
  • ‘ts’ : Timeseries of averages over the period of each time-index
  • ‘av’ : A single value averaged over a date range
input_vertical_datatypes : {False, ‘pressure’, ‘sigma’}, optional

The vertical coordinate system used by the input data:

  • False : not defined vertically
  • ‘pressure’ : pressure coordinates
  • ‘sigma’ : hybrid sigma-pressure coordinates
input_time_offsets : {None, dict}, optional

How to offset input data in time to correct for metadata errors

exec_options : dict or None (default None)

Options regarding how the calculations are reported, submitted, and saved. If None, default settings are used for all options. Currently supported options (each should be either True or False):

  • prompt_verify : (default False) If True, print summary of
    calculations to be performed and prompt user to confirm before submitting for execution.
  • parallelize : (default False) If True, submit calculations in
    parallel.
  • client : distributed.Client or None (default None) The
    dask.distributed Client used to schedule computations. If None and parallelize is True, a LocalCluster will be started.
  • write_to_tar : (default True) If True, write results of calculations
    to .tar files, one for each aospy.Run object. These tar files have an identical directory structures the standard output relative to their root directory, which is specified via the tar_direc_out argument of each Proj object’s instantiation.
Returns:

A list of the return values from each aospy.Calc.compute() call

If a calculation ran without error, this value is the aospy.Calc object itself, with the results of its calculations saved in its data_out attribute. data_out is a dictionary, with the keys being the temporal-regional reduction identifiers (e.g. ‘reg.av’), and the values being the corresponding result.

If any error occurred during a calculation, the return value is None.

Raises:

AospyException

If the prompt_verify option is set to True and the user does not respond affirmatively to the prompt.

Units and Constants

aospy provides the classes Constant and Units for representing, respectively, physical constants (e.g. Earth’s gravitational acceleration at the surface = 9.81 m/s^2) and physical units (e.g. meters per second squared in that example).

aospy comes with several commonly used constants saved within the constants module in which the Constant class is also defined. In contrast, there are no pre-defined Units objects; the user must define any Units objects they wish to use (e.g. to populate the units attribute of their Var objects).

Warning

Whereas these baked-in Constant objects are used by aospy in various places, aospy currently does not actually use the Var.units attribute during calculations or the Units class more generally; they are solely for the user’s own informational purposes.

constants

Classes and objects pertaining to physical constants.

class aospy.constants.Constant(value, units, description='')[source]

Physical constants used in atmospheric and oceanic sciences.

units

Functionality for representing physical units, e.g. meters.

class aospy.units.Units(units='', plot_units=False, plot_units_conv=1.0, vert_int_units=False, vert_int_plot_units=False, vert_int_plot_units_conv=False)[source]

String representation of physical units and conversion methods.

Note

There has been discussion of implementing units-handling upstream within xarray (see here). If that happens, the Units class will likely be deprecated and replaced with the upstream version.

Utilities

aospy includes a number of utility functions that are used internally and may also be useful to users for their own purposes. These include functions pertaining to input/output (IO), time arrays, andvertical coordinates.

utils.io

Utility functions for data input and output.

aospy.utils.io.data_in_label(intvl_in, dtype_in_time, dtype_in_vert=False)[source]

Create string label specifying the input data of a calculation.

aospy.utils.io.data_name_gfdl(name, domain, data_type, intvl_type, data_yr, intvl, data_in_start_yr, data_in_dur)[source]

Determine the filename of GFDL model data output.

aospy.utils.io.data_out_label(time_intvl, dtype_time, dtype_vert=False)[source]
aospy.utils.io.dmget(files_list)[source]

Call GFDL command ‘dmget’ to access archived files.

aospy.utils.io.ens_label(ens_mem)[source]

Create label of the ensemble member for aospy data I/O.

aospy.utils.io.get_parent_attr(obj, attr, strict=False)[source]

Search recursively through an object and its parent for an attribute.

Check if the object has the given attribute and it is non-empty. If not, check each parent object for the attribute and use the first one found.

aospy.utils.io.time_label(intvl, return_val=True)[source]

Create time interval label for aospy data I/O.

aospy.utils.io.yr_label(yr_range)[source]

Create label of start and end years for aospy data I/O.

utils.times

Utility functions for handling times, dates, etc.

aospy.utils.times.add_uniform_time_weights(ds)[source]

Append uniform time weights to a Dataset.

All DataArrays with a time coordinate require a time weights coordinate. For Datasets read in without a time bounds coordinate or explicit time weights built in, aospy adds uniform time weights at each point in the time coordinate.

Parameters:

ds : Dataset

Input data

Returns:

Dataset

aospy.utils.times.apply_time_offset(time, years=0, months=0, days=0, hours=0)[source]

Apply a specified offset to the given time array.

This is useful for GFDL model output of instantaneous values. For example, 3 hourly data postprocessed to netCDF files spanning 1 year each will actually have time values that are offset by 3 hours, such that the first value is for 1 Jan 03:00 and the last value is 1 Jan 00:00 of the subsequent year. This causes problems in xarray, e.g. when trying to group by month. It is resolved by manually subtracting off those three hours, such that the dates span from 1 Jan 00:00 to 31 Dec 21:00 as desired.

Parameters:

time : xarray.DataArray representing a timeseries

years, months, days, hours : int, optional

The number of years, months, days, and hours, respectively, to offset the time array by. Positive values move the times later.

Returns:

pandas.DatetimeIndex

Examples

Case of a length-1 input time array:

>>> times = xr.DataArray(datetime.datetime(1899, 12, 31, 21))
>>> apply_time_offset(times)
Timestamp('1900-01-01 00:00:00')

Case of input time array with length greater than one:

>>> times = xr.DataArray([datetime.datetime(1899, 12, 31, 21),
...                       datetime.datetime(1899, 1, 31, 21)])
>>> apply_time_offset(times) 
DatetimeIndex(['1900-01-01', '1899-02-01'], dtype='datetime64[ns]',
              freq=None)
aospy.utils.times.assert_matching_time_coord(arr1, arr2)[source]

Check to see if two DataArrays have the same time coordinate.

Parameters:

arr1 : DataArray or Dataset

First DataArray or Dataset

arr2 : DataArray or Dataset

Second DataArray or Dataset

Raises:

ValueError

If the time coordinates are not identical between the two Datasets

aospy.utils.times.average_time_bounds(ds)[source]

Return the average of each set of time bounds in the Dataset.

Useful for creating a new time array to replace the Dataset’s native time array, in the case that the latter matches either the start or end bounds. This can cause errors in grouping (akin to an off-by-one error) if the timesteps span e.g. one full month each. Note that the Dataset’s times must not have already undergone “CF decoding”, wherein they are converted from floats using the ‘units’ attribute into datetime objects.

Parameters:

ds : xarray.Dataset

A Dataset containing a time bounds array with name matching internal_names.TIME_BOUNDS_STR. This time bounds array must have two dimensions, one of which’s coordinates is the Dataset’s time array, and the other is length-2.

Returns:

xarray.DataArray

The mean of the start and end times of each timestep in the original Dataset.

Raises:

ValueError

If the time bounds array doesn’t match the shape specified above.

aospy.utils.times.convert_scalar_to_indexable_coord(scalar_da)[source]

Convert a scalar coordinate to an indexable one.

In xarray, scalar coordinates cannot be indexed. This converts a scalar coordinate-containing DataArray to one that can be indexed using da.sel and da.isel.

Parameters:

scalar_da : DataArray

Must contain a scalar coordinate

Returns:

DataArray

aospy.utils.times.datetime_or_default(date, default)[source]

Return a datetime.datetime object or a default.

Parameters:

date : None or datetime-like object

default : The value to return if date is None

Returns:

default if date is None, otherwise returns the result of

utils.times.ensure_datetime(date)

aospy.utils.times.ensure_datetime(obj)[source]

Return the object if it is of type datetime.datetime; else raise.

Parameters:obj : Object to be tested.
Returns:The original object if it is a datetime.datetime object.
Raises:TypeError if `obj` is not of type `datetime.datetime`.
aospy.utils.times.ensure_time_as_dim(ds)[source]

Ensures that time is an indexable dimension on relevant quantites

In xarray, scalar coordinates cannot be indexed. We rely on indexing in the time dimension throughout the code; therefore we need this helper method to (if needed) convert a scalar time coordinate to a dimension.

Note that this must be applied before CF-conventions are decoded; otherwise it casts np.datetime64[ns] as int values.

Parameters:

ds : Dataset

Dataset with a time coordinate

Returns:

Dataset

aospy.utils.times.ensure_time_avg_has_cf_metadata(ds)[source]

Add time interval length and bounds coordinates for time avg data.

If the Dataset or DataArray contains time average data, enforce that there are coordinates that track the lower and upper bounds of the time intervals, and that there is a coordinate that tracks the amount of time per time average interval.

CF conventions require that a quantity stored as time averages over time intervals must have time and time_bounds coordinates [R1]. aospy further requires AVERAGE_DT for time average data, for accurate time-weighted averages, which can be inferred from the CF-required time_bounds coordinate if needed. This step should be done prior to decoding CF metadata with xarray to ensure proper computed timedeltas for different calendar types.

[R1]http://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#_data_representative_of_cells
Parameters:

ds : Dataset or DataArray

Input data

Returns:

Dataset or DataArray

Time average metadata attributes added if needed.

aospy.utils.times.extract_months(time, months)[source]

Extract times within specified months of the year.

Parameters:

time : xarray.DataArray

Array of times that can be represented by numpy.datetime64 objects (i.e. the year is between 1678 and 2262).

months : Desired months of the year to include

Returns:

xarray.DataArray of the desired times

aospy.utils.times.month_indices(months)[source]

Convert string labels for months to integer indices.

Parameters:

months : str, int

If int, number of the desired month, where January=1, February=2, etc. If str, must match either ‘ann’ or some subset of ‘jfmamjjasond’. If ‘ann’, use all months. Otherwise, use the specified months.

Returns:

np.ndarray of integers corresponding to desired month indices

Raises:

TypeError : If months is not an int or str

See also

_month_conditional

aospy.utils.times.monthly_mean_at_each_ind(monthly_means, sub_monthly_timeseries)[source]

Copy monthly mean over each time index in that month.

Parameters:

monthly_means : xarray.DataArray

array of monthly means

sub_monthly_timeseries : xarray.DataArray

array of a timeseries at sub-monthly time resolution

Returns:

xarray.DataArray with eath monthly mean value from monthly_means repeated

at each time within that month from sub_monthly_timeseries

See also

monthly_mean_ts
Create timeseries of monthly mean values
aospy.utils.times.monthly_mean_ts(arr)[source]

Convert a sub-monthly time-series into one of monthly means.

Also drops any months with no data in the original DataArray.

Parameters:

arr : xarray.DataArray

Timeseries of sub-monthly temporal resolution data

Returns:

xarray.DataArray

Array resampled to comprise monthly means

See also

monthly_mean_at_each_ind
Copy monthly means to each submonthly time
aospy.utils.times.numpy_datetime_range_workaround(date, min_year, max_year)[source]

Reset a date to earliest allowable year if outside of valid range.

Hack to address np.datetime64, and therefore pandas and xarray, not supporting dates outside the range 1677-09-21 and 2262-04-11 due to nanosecond precision. See e.g. https://github.com/spencerahill/aospy/issues/96.

Parameters:

date : datetime.datetime object

min_year : int

Minimum year in the raw decoded dataset

max_year : int

Maximum year in the raw decoded dataset

Returns:

datetime.datetime object

Original datetime.datetime object if the original date is within the permissible dates, otherwise a datetime.datetime object with the year offset to the earliest allowable year.

aospy.utils.times.numpy_datetime_workaround_encode_cf(ds)[source]

Generate CF-compliant units for out-of-range dates.

Hack to address np.datetime64, and therefore pandas and xarray, not supporting dates outside the range 1677-09-21 and 2262-04-11 due to nanosecond precision. See e.g. https://github.com/spencerahill/aospy/issues/96.

Specifically, we coerce the data such that, when decoded, the earliest value starts in 1678 but with its month, day, and shorter timescales (hours, minutes, seconds, etc.) intact and with the time-spacing between values intact.

Parameters:

ds : xarray.Dataset

Returns:

xarray.Dataset, int, int

Dataset with time units adjusted as needed, minimum year in loaded data, and maximum year in loaded data.

aospy.utils.times.sel_time(da, start_date, end_date)[source]

Subset a DataArray or Dataset for a given date range.

Ensures that data are present for full extent of requested range. Appends start and end date of the subset to the DataArray.

Parameters:

da : DataArray or Dataset

data to subset

start_date : np.datetime64

start of date interval

end_date : np.datetime64

end of date interval

Returns:

da : DataArray or Dataset

subsetted data

Raises:

AssertionError

if data for requested range do not exist for part or all of requested range

aospy.utils.times.yearly_average(arr, dt)[source]

Average a sub-yearly time-series over each year.

Resulting timeseries comprises one value for each year in which the original array had valid data. Accounts for (i.e. ignores) masked values in original data when computing the annual averages.

Parameters:

arr : xarray.DataArray

The array to be averaged

dt : xarray.DataArray

Array of the duration of each timestep

Returns:

xarray.DataArray

Has the same shape and mask as the original arr, except for the time dimension, which is truncated to one value for each year that arr spanned

utils.vertcoord

Utility functions for dealing with vertical coordinates.

aospy.utils.vertcoord.d_deta_from_pfull(arr)[source]

Compute $partial/partialeta$ of the array on full hybrid levels.

$eta$ is the model vertical coordinate, and its value is assumed to simply increment by 1 from 0 at the surface upwards. The data to be differenced is assumed to be defined at full pressure levels.

Parameters:

arr : xarray.DataArray containing the ‘pfull’ dim

Returns:

deriv : xarray.DataArray with the derivative along ‘pfull’ computed via

2nd order centered differencing.

aospy.utils.vertcoord.d_deta_from_phalf(arr, pfull_coord)[source]

Compute pressure level thickness from half level pressures.

aospy.utils.vertcoord.does_coord_increase_w_index(arr)[source]

Determine if the array values increase with the index.

Useful, e.g., for pressure, which sometimes is indexed surface to TOA and sometimes the opposite.

aospy.utils.vertcoord.dp_from_p(p, ps, p_top=0.0, p_bot=110000.0)[source]

Get level thickness of pressure data, incorporating surface pressure.

Level edges are defined as halfway between the levels, as well as the user- specified uppermost and lowermost values. The dp of levels whose bottom pressure is less than the surface pressure is not changed by ps, since they don’t intersect the surface. If ps is in between a level’s top and bottom pressures, then its dp becomes the pressure difference between its top and ps. If ps is less than a level’s top and bottom pressures, then that level is underground and its values are masked.

Note that postprocessing routines (e.g. at GFDL) typically mask out data wherever the surface pressure is less than the level’s given value, not the level’s upper edge. This masks out more levels than the

aospy.utils.vertcoord.dp_from_ps(bk, pk, ps, pfull_coord)[source]

Compute pressure level thickness from surface pressure

aospy.utils.vertcoord.get_dim_name(arr, names)[source]

Determine if an object has an attribute name matching a given list.

aospy.utils.vertcoord.int_dp_g(arr, dp)[source]

Mass weighted integral.

aospy.utils.vertcoord.integrate(arr, ddim, dim=False, is_pressure=False)[source]

Integrate along the given dimension.

aospy.utils.vertcoord.level_thickness(p, p_top=0.0, p_bot=101325.0)[source]

Calculates the thickness, in Pa, of each pressure level.

Assumes that the pressure values given are at the center of that model level, except for the lowest value (typically 1000 hPa), which is the bottom boundary. The uppermost level extends to 0 hPa.

Unlike dp_from_p, this does not incorporate the surface pressure.

aospy.utils.vertcoord.pfull_from_ps(bk, pk, ps, pfull_coord)[source]

Compute pressure at full levels from surface pressure.

aospy.utils.vertcoord.phalf_from_ps(bk, pk, ps)[source]

Compute pressure of half levels of hybrid sigma-pressure coordinates.

aospy.utils.vertcoord.replace_coord(arr, old_dim, new_dim, new_coord)[source]

Replace a coordinate with new one; new and old must have same shape.

aospy.utils.vertcoord.to_hpa(arr)[source]

Convert pressure array from Pa to hPa (if needed).

aospy.utils.vertcoord.to_pascal(arr, is_dp=False)[source]

Force data with units either hPa or Pa to be in Pa.

aospy.utils.vertcoord.to_pfull_from_phalf(arr, pfull_coord)[source]

Compute data at full pressure levels from values at half levels.

aospy.utils.vertcoord.to_phalf_from_pfull(arr, val_toa=0, val_sfc=0)[source]

Compute data at half pressure levels from values at full levels.

Could be the pressure array itself, but it could also be any other data defined at pressure levels. Requires specification of values at surface and top of atmosphere.

aospy.utils.vertcoord.to_radians(arr, is_delta=False)[source]

Force data with units either degrees or radians to be radians.

aospy.utils.vertcoord.vert_coord_name(arr)[source]

See also

  • Spencer Hill’s talk on aospy (slides, recorded talk) at the Seventh Symposium on Advances in Modeling and Analysis Using Python, recorded 2017 January 24 as part of the 2017 American Meteorological Society Annual Meeting.
  • Our guest post, “What’s needed for the future of AOS python? Tools for automating AOS data analysis and management” on the PyAOS blog.
  • Our shout-out in Katy Huff’s PyCon 2017 keynote talk (scroll to 23:35 for the bit about us)
  • The xarray package, upon which aospy relies heavily.

Get in touch

  • Troubleshooting: We are actively seeking new users and are eager to help you get started with aospy! Usage questions, bug reports, and any other correspondence are all welcome and best placed as Issues on our Github repo.
  • Our mailing list: join it! Questions, bug reports, and comments are welcome there also.
  • Contributing: We are also actively seeking new developers! Please get in touch by opening an Issue or submitting a Pull Request.

License

aospy is freely available under the open source Apache License.

History

aospy was originally created by Spencer Hill as a means of automating calculations required for his Ph.D. work across many climate models and simulations. Starting in 2014, Spencer Clark became aospy’s second user and developer. The first official release on PyPI was v0.1 on January 24, 2017.