aospy: automated climate data analysis and management¶
aospy (pronounced A - O - S - py) is an open source Python package for automating computations involving gridded climate and weather data (namely data stored as netCDF files) and for managing the results of those computations.
aospy enables firing off multiple calculations in parallel, permuting over an arbitrary number of climate models, simulations, variables to be computed, date ranges, sub-annual sampling options, and many other parameters. In other words, with aospy it is possible to submit and execute all calculations for a particular project (e.g. a paper, class project, or thesis chapter) with a single command!
The results get saved in a highly organized directory tree as netCDF files, making it easy to subsequently find and use the data (e.g. for plotting) and preventing “orphan” files with vague filenames and insufficient metadata to remember what they are and/or how they were computed.
The eventual goal is for aospy to become the community standard for gridded climate data analysis and, in so doing, accelerate progress in climate science and make the results of climate research more easily reproducible and shareable.
Documentation¶
What’s New¶
v0.2 (26 September 2017)¶
This release includes some new features plus several bugfixes. The bugfixes include some that previously made using aospy on pressure-interpolated data very problematic. We have also improved support for reading in data from the WRF and CAM atmospheric models.
As of this release, aospy has at least 2(!) confirmed regular users that aren’t the original aospy developers, bringing the worldwide total of users up to at least 4. The first user-generated Github Issues have now also been created. We’re a real thing!
Enhancements¶
- Use dask.bag coupled with dask.distributed rather than multiprocess to parallelize computations (closes GH169 via PR172). This enables the optional use of an external distributed.Client to leverage computational resources across multiple nodes of a cluster. By Spencer Clark.
- Improve support for WRF and NCAR CAM model data by adding the internal names they use for grid attributes to aospy’s lists of potential names to search for. By Spencer Hill.
- Allow a user to specify a custom preprocessing function in all DataLoaders to prepare data for processing with aospy. This could be used, for example, to add a CF-compliant units attribute to the time coordinate if it is not present in a set of files. Addresses GH177 via PR180. By Spencer Clark.
- Remove dask.async import in model.py; it is no longer needed, and its removal also prevents a warning message from dask regarding the location of the get_sync function (PR195). By Spencer Hill.
Dependencies¶
Bug Fixes¶
- Remove faulty logic for calculations with data coming from multiple runs. Eventually this feature will be properly implemented (fixes GH117 via PR178). By Spencer Hill.
- Only run tests that require optional dependencies if those dependencies are actually installed (fixes GH167 via PR176). By Spencer Hill.
- Remove obsolete operator.py module (fixes GH174 via PR175). By Spencer Clark.
- Fix workaround for dates with years less than 1678 to support units attributes with a reference date whose year is not 0001 (fixes GH188 via PR189). By Spencer Clark.
- Fix bug that prevented users from analyzing a subset within the Timestamp-valid range of a dataset that also included data from outside that range (fixed in PR189). By Spencer Clark.
- Toggle the mask_and_scale option to True when reading in netCDF files, enabling missing values encoded as floats to be converted to NaNs (fixes GH190 via PR192). By Spencer Clark.
- Force regional calculations to mask gridcell weights where the loaded datapoints were invalid, instead of just masking points outside the desired region (fixes GH190 via PR192). By Spencer Clark.
- Retain original input data’s mask during gridpoint-by-gridpoint temporal averages (fixes GH193 via PR196). By Spencer Hill.
- Always write output to a tar file in serial to prevent empty header file errors (fixes GH75 via PR197). By Spencer Clark.
- Allow aospy to use grid attributes that are defined only in Run objects. Previously, if a grid attribute was defined only in a Run object and not also in the Run’s corresponding Model, an error would be raised (fixes GH187 via PR199). By Spencer Clark.
- When input data for a calculation has a time bounds array, overwrite its time array with the average of the start and end times for each timestep. This prevents a bug wherein time arrays equal to either the start or end bounds get mistakenly grouped into the wrong time interval, i.e. the wrong month or year (fixes GH185 via PR200). By Spencer Hill.
v0.1.2 (30 March 2017)¶
This release improves the process of submitting multiple calculations for automatic execution. The user interface, documentation, internal logic, and packaging all received upgrades and/or bugfixes.
We also now have a mailing list. Join it to follow and/or post your own usage questions, bug reports, suggestions, etc.
Enhancements¶
- Include an example library of aospy objects that works out-of-the-box with the provided example main script (PR155). By Spencer Clark and Spencer Hill.
- Improve Examples page of the documentation by using this new example object library (PR164). By Spencer Hill.
- Improve readability/usability of the included example script aospy_main.py for submitting aospy calculations by moving all internal logic into the new automate.py module (PR155). By Spencer Clark and Spencer Hill.
- Enable the user to specify whether or not to write output to .tar files (in addition to the standard output). Also document an error that occurs when writing output to .tar files with sufficiently old versions of tar (including the version that ships standard on MacOS), and print a warning when errors are caught during the ‘tar’ call (PR160). By Spencer Hill.
Bug fixes¶
- Update packaging specifications such that the example main script and tutorial notebook actually ship with aospy as intended (fixes GH149 via PR161). By Spencer Hill.
- Use the ‘scipy’ engine for the xarray.DataArray.to_netcdf call when writing aospy calculation outputs to disk to prevent a bug when trying to re-write to an existing netCDF file (fixes GH157 via PR160). By Spencer Hill.
v0.1.1 (2 March 2017)¶
This release includes fixes for a number of bugs mistakenly introduced
in the refactoring of the variable loading step of calc.py
(PR90), as well as support for xarray version 0.9.1.
Enhancements¶
- Support for xarray version 0.9.1 and require it or a later xarray version. By Spencer Clark and Spencer Hill.
- Better support for variable names relating to “bounds” dimension of input data files. “bnds”, “bounds”, and “nv” now all supported (PR140). By Spencer Hill.
- When coercing dims of input data to aospy’s internal names, for scalars change only the name; for non-scalars change the name, force them to have a coord, and copy over their attrs (PR140). By Spencer Hill.
Bug fixes¶
- Fix bug involving loading data that has dims that lack coords (which is possible as of xarray v0.9.0). By Spencer Hill.
- Fix an instance where the name for pressure half levels was mistakenly replaced with the name for the pressure full levels (PR126). By Spencer Clark.
- Prevent workaround for dates outside the pd.Timestamp valid range from being applied to dates within the pd.Timestamp valid range (PR128). By Spencer Clark.
- Ensure that all DataArrays associated with aospy.Var objects have a time weights coordinate with CF-compliant time units. This allows them to be cast as the type np.timedelta64 and safely converted to units of days before taking time-weighted averages (PR128). By Spencer Clark.
- Fix a bug where the time weights were not subset in time prior to taking a time-weighted average; this caused computed seasonal averages to be too small. To prevent this from failing silently again, we now raise a ValueError if the time coordinate of the time weights is not identical to the time coordinate of the array associated with the aospy.Var (PR128). By Spencer Clark.
- Enable calculations to be completed using data saved as a single time-slice on disk (fixes GH132 via PR135). By Spencer Clark.
- Fix bug where the workaround for dates outside the pd.Timestamp valid range caused a mismatch between the data loaded and the data requested (fixes GH138 via PR139). By Spencer Clark.
v0.1 (24 January 2017)¶
- Initial release!
- Contributors:
Overview: Why aospy?¶
Use cases¶
If you’ve ever found yourself saying or thinking anything along these lines, then aospy may be a great tool for you:
- “What is this precip01.dat file in my home directory that I vaguely remember creating a few weeks ago? Better just delete it and re-do the calculation to be sure.”
- “I really need to analyze variable X. But the models/observational products I’m using don’t provide it. They do provide variables Y and Z, from which I can compute quantity X.”
- “I need to calculate quantity X from 4 different simulations repeated in 5 different models; computing monthly, seasonal, and annual averages and standard deviations; gridpoint-by-gridpoint and averaged over these 10 regions. That’s impractical – I’ll just do a small subset.”
Each of these common problems is easily solved by using aospy.
Design philosophy¶
aospy’s ability to automate calculations (while properly handling model- and simulation-level idiosyncrasies) relies on separating your code into three distinct categories.
- Code characterizing the data you want to work with: “Where is your data and what does it represent?”
- Code describing abstract physical quantities and geographical regions you eventually want to examine: “What things are you generally interested in calculating?”
- Code specifying the exact parameters of calculations you want to perform right now: “Ok, time to actually crunch some numbers. Exactly what all do you want to compute from your data, and how do you want to slice and dice it?”
How you’ll actually interact with aospy in order to achieve each of these three steps is described in the Using aospy section of this documentation, and explicit examples using included sample data are in the Examples section.
Open Science & Reproducible Research¶
This separation of your code into three categories promotes open science and reproducible research in multiple ways:
- By separating code that describes where your data is from the code used to compute particular physical quantities, the latter can be written in a generic form that closely mimics the physical form of the particular expression. The resulting code is easier to read and debug and therefore to share with others.
- By enabling automation of calculations across an arbitrary number of parameter combinations, aospy facilitates more rigorous analyses and/or analyses encompassing a larger span of input data than would otherwise be possible.
- By outputting the results of calculations as netCDF files in a highly organized directory structure, with extensive metadata embedded within the file path, file name, and the netCDF file itself, aospy facilitates sharing results with others (including your future self, who has forgotten the myriad details of how you computed things right now).
Using aospy¶
This section provides a high-level summary of how to use aospy. See the Overview section of this documentation for more background information, or the Examples section for concrete examples.
Your aospy object library¶
The first step is writing the code that describes your data and the quantities you eventually want to compute using it. We refer to this code collectively as your “object library”.
Describing your data on disk¶
aospy needs to know where the data you want to use is located on disk and how it is organized across different projects, models, and model runs (i.e. simulations). This involves a hierarchy of three classes: aospy.Proj, aospy.Model, and aospy.Run.
- aospy.Proj: represents a single project that involves analysis of data from one or more models and simulations.
- aospy.Model: represents a single climate model, other numerical model, observational data source, etc.
- aospy.Run: represents a single simulation, version of observational data, etc.
So each user’s object library will contain one or more aospy.Proj objects, each of which will have one or more child aospy.Model objects, which in turn will each have one or more child aospy.Run objects.
Note
Currently, the Proj-Model-Run hierarchy is rigid, in that each Run has a parent Model, and each Model has a parent Proj. Work is ongoing to relax this to a more generic parent-child framework.
Physical variables¶
The aospy.Var
class is used to represent physical variables,
e.g. precipitation rate or potential temperature. This includes both
variables which are directly available in netCDF files (e.g. they were
directly outputted by your model or gridded data product) as well as
those fields that must be computed from other variables (e.g. they
weren’t directly outputted but can be computed from other variables
that were outputted).
Geographical regions¶
The aospy.Region
class is used to define geographical
regions over which quantities can be averaged (in addition to
gridpoint-by-gridpoint values). Like aospy.Var
objects,
they are more generic than the objects of the aospy.Proj
-
aospy.Model
- aospy.Run
hierarchy, in that
they correspond to the generic physical quantities/regions rather than
the data of a particular project, model, or simulation.
Object library structure¶
The officially supported way to submit calculations is the
aospy.submit_mult_calcs()
function. In order for this to
work, your object library must follow one or the other of these
structures:
- All aospy.Proj and aospy.Var objects are accessible as attributes of your library. This means that my_obj_lib.my_obj works, where my_obj_lib is your object library and my_obj is the object in question.
- All aospy.Proj objects are stored in a container called projs, where projs is an attribute of your library (i.e. my_obj_lib.projs), and likewise for aospy.Var objects in a variables attribute.
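These two structures can be sketched in a single toy module (a self-contained illustration using stand-in classes; a real library would use aospy.Proj and aospy.Var, and names like example_proj and precip are placeholders):

```python
# my_obj_lib.py -- sketch of the two supported object-library layouts.
# Stand-in classes keep the sketch self-contained; in a real library
# these would be aospy.Proj and aospy.Var.

class Proj:            # stand-in for aospy.Proj
    def __init__(self, name):
        self.name = name

class Var:             # stand-in for aospy.Var
    def __init__(self, name):
        self.name = name

# Structure 1: every Proj and Var is a module-level attribute, so both
# my_obj_lib.example_proj and my_obj_lib.precip resolve directly.
example_proj = Proj('example_proj')
precip = Var('precip')

# Structure 2: alternatively, collect them into 'projs' and 'variables'
# attributes of the library (my_obj_lib.projs, my_obj_lib.variables).
projs = [example_proj]
variables = [precip]
```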
Beyond that, you can structure your object library however you wish. In particular, it can be structured as a Python module (i.e. a single “.py” file) or as a package (i.e. multiple “.py” files linked together; see the official documentation on package structuring).
A single module works great for small projects and for initially
trying out aospy (this is how the example object library,
aospy.examples.example_obj_lib
, is structured). But as
your object library grows, it can become easier to manage as a package
of multiple files. For an example of a large object library that is
structured as a formal package, see here.
Accessing your library¶
If your current working directory is the one containing your library,
you can import your library via import my_obj_lib
(replacing
my_obj_lib
with whatever you’ve named yours) in order to pass it
to aospy.submit_mult_calcs()
.
Once you start using aospy a lot, however, this requirement of being
in the same directory becomes cumbersome. As a solution, you can add
the directory containing your object library to the PYTHONPATH
environment variable. E.g. if you’re using the bash shell:
export PYTHONPATH=/path/to/your/object/library:${PYTHONPATH}
Of course, replace /path/to/your/object/library
with the actual
path to yours. This command places your object library at the front
of the PYTHONPATH
environment variable, which is essentially the
first place where Python looks to find packages and modules to be
imported. (For more, see Python’s official documentation on
PYTHONPATH).
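Equivalently, for a single Python session, the search path can be modified programmatically (a sketch; the path shown is a placeholder for your own):

```python
import sys

# Prepend the directory containing your object library to Python's
# module search path for this session only; the shell-based PYTHONPATH
# approach above makes the change apply to every new session instead.
library_dir = '/path/to/your/object/library'  # placeholder path
if library_dir not in sys.path:
    sys.path.insert(0, library_dir)

# Now 'import my_obj_lib' would look in library_dir first.
```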
Note
It’s convenient to copy this command into your shell profile (e.g.,
for the bash shell on Linux or Mac, ~/.bash_profile
) so that
you don’t have to call it again in every new terminal session.
To test that this is working, run python -c "import my_obj_lib"
from a
directory other than where the library is located (again replacing
my_obj_lib
with the name you’ve given to your library). If this
runs without error, you should be good to go.
Executing calculations¶
As noted above, the officially supported way to submit calculations is the
aospy.submit_mult_calcs()
function.
We provide a template “main” script with aospy that uses this function. We recommend copying it to the location of your choice. In the copy, replace the example object library and associated objects with your own. (If you accidentally change the original, you can always get a fresh copy from Github).
Running the main script¶
Once the main script parameters are all modified as desired, execute the script from the command line as follows:
/path/to/your/aospy_main.py
This should generate a text summary of the specified parameters and a prompt as to whether to proceed or not with the calculations. An affirmative response then triggers the calculations to execute.
Note
You may need to change the permissions on the file to make it executable. E.g. from a Mac or Linux: chmod u+x /path/to/your/aospy_main.py. Alternatively you can call python or IPython from the command line to run it: python /path/to/your/aospy_main.py or ipython /path/to/your/aospy_main.py.
Specifically, the parameters are permuted over all possible combinations. So, for example, if two model names and three variable names were listed and all other parameters had only one element, six calculations would be generated and executed. There is no limit to the number of permutations.
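The permutation behavior can be illustrated with Python’s itertools.product (a sketch of the idea, not aospy’s actual implementation; the parameter lists are hypothetical):

```python
import itertools

# Hypothetical parameter lists: two models, three variables, one run.
models = ['model_a', 'model_b']
variables = ['precip_total', 'precip_conv_frac', 'precip_largescale']
runs = ['control']

# Every combination of one entry from each list becomes one calculation,
# so here 2 * 3 * 1 = 6 calculations would be generated.
calculations = list(itertools.product(models, variables, runs))
```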
Note
You can also call the main script interactively within an IPython
session via %run /path/to/your/main.py
or, from the command
line, run the script and then start an interactive IPython session
via ipython -i /path/to/your/main.py
.
Or you can call aospy.submit_mult_calcs()
directly within
an interactive session.
As the calculations are performed, logging information will be printed to the terminal displaying their progress.
Parallelized calculations¶
The calculations generated by the main script can be executed in parallel using dask.distributed. aospy will either automatically set up a dask.distributed.LocalCluster to perform the calculations, or one can optionally specify an external distributed.Client to delegate the work. If the user instead sets parallelize=False in the calc_exec_options argument of aospy.submit_mult_calcs(), the calculations will be executed one-by-one.
Particularly on institutional clusters with many cores, this parallelization yields an impressive speed-up when multiple calculations are generated.
Note
When calculations are performed in parallel, the logging information from different calculations running simultaneously often ends up interwoven, leading to output that is confusing to follow. Work is ongoing to improve the logging output when the computations are parallelized.
Finding the output¶
aospy saves the results of all calculations as netCDF files, embedding metadata that describes them within the netCDF files themselves, in their filenames, and in the directory structure within which they are saved.
- Directory structure:
/path/to/aospy-rootdir/projname/modelname/runname/varname
- File name:
varname.intvl_out.dtype_out_time.'from_'intvl_in'_'dtype_in_time.model.run.date_range.nc
See the API Reference on aospy.CalcInterface
for
explanation of each of these components of the path and file name.
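To make the template concrete, here is a sketch assembling a hypothetical output location from its components (all values are made up for illustration):

```python
import os

# Hypothetical values for each component of the output location.
rootdir = '/path/to/aospy-rootdir'
proj, model, run, var = ('example_proj', 'example_model',
                         'example_run', 'precip_total')

# Directory: rootdir/projname/modelname/runname/varname
directory = os.path.join(rootdir, proj, model, run, var)

# File name: varname.intvl_out.dtype_out_time.from_<intvl_in>_<dtype_in_time>
#            .model.run.date_range.nc  (made-up example values)
filename = '.'.join([var, 'ann', 'av', 'from_monthly_ts',
                     model, run, '0004-0006', 'nc'])

full_path = os.path.join(directory, filename)
```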
Under the hood¶
aospy.submit_mult_calcs() creates an aospy.CalcSuite object that permutes over the provided lists of calculation specifications, encoding each permutation into an aospy.CalcInterface object.
Note
Actually, when multiple regions and/or output time/regional
reductions are specified, these all get passed to each
aospy.CalcInterface
object rather than being permuted
over. They are then looped over during the subsequent
calculations. This is to prevent unnecessary re-loading and
re-computing, because, for a given simulation/variable/etc., all
regions and reduction methods use the same data.
Each aospy.CalcInterface
object, in turn, is used to
instantiate an aospy.Calc
object. The
aospy.Calc
object, in turn:
- loads the required netCDF data given its simulation, variable, and date range
- (if necessary) further truncates the data in time (i.e. to the given subset of the annual cycle, and/or if the requested date range doesn’t exactly align with the time chunking of the input netCDF files)
- (if the variable is a function of other variables) executes the function that computes the calculation using this loaded and truncated data
- applies all specified temporal and regional time reductions
- writes the results (plus additional metadata) to disk as netCDF files and appends them to its own data_out attribute
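The sequence of steps can be pictured as a simple pipeline (a pure-Python sketch with stand-in functions and toy data, not aospy’s actual internals):

```python
# Stand-in pipeline illustrating the order of operations performed by
# each Calc object; every function and value here is a hypothetical
# placeholder for the real netCDF-based machinery.

def load(variable, date_range):
    # In aospy, this reads the netCDF data for the given
    # simulation/variable/date range.
    return {'var': variable, 'dates': date_range, 'data': [1.0, 2.0, 3.0]}

def truncate_time(payload, n_steps):
    # Optionally restrict to a subset of the annual cycle / date range.
    payload['data'] = payload['data'][:n_steps]
    return payload

def compute(func, payload):
    # If the variable is derived, apply its defining function.
    payload['data'] = [func(x) for x in payload['data']]
    return payload

def reduce_and_write(payload):
    # Apply the time/regional reduction, then (in aospy) write netCDF
    # output and append it to the Calc's data_out attribute.
    return sum(payload['data']) / len(payload['data'])

payload = load('precip_total', ('0004', '0006'))
payload = truncate_time(payload, 2)
result = reduce_and_write(compute(lambda x: 2 * x, payload))
```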
Note
Unlike aospy.Proj
, aospy.Model
,
aospy.Run
, aospy.Var
, and
aospy.Region
, these objects are not intended to be
saved in .py
files for continual re-use. Instead, they are
generated as needed, perform their desired tasks, and then go away.
See the API reference documentation for further details.
Examples¶
Note
The footnotes in this section provide scientific background to help you understand the motivation and physical meaning of these example calculations. They can be skipped if you are already familiar with those details or aren’t interested in them.
In this section, we use the example data files included with aospy to demonstrate the standard aospy workflow of executing and submitting multiple calculations at once.
These files contain timeseries of monthly averages of two variables generated by an idealized [1] aquaplanet [2] climate model:
- Precipitation generated through gridbox-scale condensation
- Precipitation generated through the model’s convective parameterization [3]
Using this data that was directly outputted by the model, let’s compute two other useful quantities: (1) the total precipitation rate, and (2) the fraction of the total precipitation rate that comes from the convective parameterization. We’ll compute the time-average over the whole duration of the data, both at each gridpoint and aggregated over specific regions.
Preliminaries¶
First we’ll save the path to the example data in a local variable, since we’ll be using it in several places below.
In [1]: import os # Python built-in package for working with the operating system
In [2]: import aospy
In [3]: rootdir = os.path.join(aospy.__path__[0], 'test', 'data', 'netcdf')
Now we’ll use the fantastic xarray package to inspect the data:
In [4]: import xarray as xr
In [5]: xr.open_mfdataset(os.path.join(rootdir, '000[4-6]0101.precip_monthly.nc'),
...: decode_times=False)
...:
Out[5]:
<xarray.Dataset>
Dimensions: (lat: 64, latb: 65, lon: 128, lonb: 129, nv: 2, time: 36)
Coordinates:
* lonb (lonb) float64 -1.406 1.406 4.219 7.031 9.844 12.66 ...
* lon (lon) float64 0.0 2.812 5.625 8.438 11.25 14.06 16.88 ...
* latb (latb) float64 -90.0 -86.58 -83.76 -80.96 -78.16 ...
* lat (lat) float64 -87.86 -85.1 -82.31 -79.53 -76.74 ...
* nv (nv) float64 1.0 2.0
* time (time) float64 1.111e+03 1.139e+03 1.17e+03 1.2e+03 ...
Data variables:
condensation_rain (time, lat, lon) float64 5.768e-06 5.784e-06 ...
convection_rain (time, lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
time_bounds (time, nv) float64 1.095e+03 1.126e+03 1.126e+03 ...
average_DT (time) float64 31.0 28.0 31.0 30.0 31.0 30.0 31.0 ...
zsurf (time, lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
We see that, in this particular model, the variable names for these two forms of precipitation are “condensation_rain” and “convection_rain”, respectively. The file also includes the coordinate arrays (“lat”, “time”, etc.) that locate the data in space and time.
Now that we know where and what the data is, we’ll proceed through the workflow described in the Using aospy section of this documentation.
Describing your data¶
Runs and DataLoaders¶
First we create an aospy.Run
object that stores metadata
about this simulation. This includes specifying where its files are
located via an aospy.data_loader.DataLoader
object.
DataLoaders specify where your data is located and organized. Several types of DataLoaders exist, each for a different directory and file structure; see the API Reference for details.
For our simple case, where the data comprises a single file, the
simplest DataLoader, a DictDataLoader
, works well. It maps your
data based on the time frequency of its output (e.g. 6 hourly, daily,
monthly, annual) to the corresponding netCDF files via a simple
dictionary:
In [6]: from aospy.data_loader import DictDataLoader
In [7]: file_map = {'monthly': os.path.join(rootdir, '000[4-6]0101.precip_monthly.nc')}
In [8]: data_loader = DictDataLoader(file_map)
We then pass this to the aospy.Run
constructor, along with
a name for the run and an optional description.
In [9]: from aospy import Run
In [10]: example_run = Run(
....: name='example_run',
....: description='Control simulation of the idealized moist model',
....: data_loader=data_loader
....: )
....:
Note
See the API reference for other optional arguments for this and the other core aospy objects used in this tutorial.
Models¶
Next, we create the aospy.Model
object that describes the
model in which the simulation was executed. One important attribute
is grid_file_paths
, which consists of a sequence (e.g. a tuple or
list) of paths to netCDF files from which physical attributes of that
model can be found that aren’t already embedded in the output netCDF
files.
For example, often the land mask that defines which gridpoints are
ocean or land is outputted to a single, standalone netCDF file, rather
than being included in the other output files. But we often need the
land mask, e.g. to define certain land-only or ocean-only
regions. [4] This and other grid-related properties shared
across all of a Model’s simulations can be found in one or more of the
files in grid_file_paths
.
The other important attribute is runs
, which is a list of the
aospy.Run
objects that pertain to simulations performed in
this particular model.
In [11]: from aospy import Model
In [12]: example_model = Model(
....: name='example_model',
....: grid_file_paths=(
....: os.path.join(rootdir, '00040101.precip_monthly.nc'),
....: os.path.join(rootdir, 'im.landmask.nc')
....: ),
....: runs=[example_run] # only one Run in our case, but could be more
....: )
....:
Projects¶
Finally, we associate the Model
object with an
aospy.Proj
object. This is the level at which we specify
the directories to which aospy output gets written.
In [13]: from aospy import Proj
In [14]: example_proj = Proj(
....: 'example_proj',
....: direc_out='example-output', # default, netCDF output (always on)
....: tar_direc_out='example-tar-output', # output to .tar files (optional)
....: models=[example_model] # only one Model in our case, but could be more
....: )
....:
This extra aospy.Proj
level of organization may seem like
overkill for this simple example, but it really comes in handy once
you start using aospy for more than one project.
Defining physical quantities and regions¶
Having now fully specified the particular data of interest, we now define the general physical quantities of interest and any geographic regions over which to aggregate results.
Physical variables¶
We’ll first define aospy.Var
objects for the two variables
that we saw are directly available as model output:
In [15]: from aospy import Var
In [16]: precip_largescale = Var(
....: name='precip_largescale', # name used by aospy
....: alt_names=('condensation_rain',), # its possible name(s) in your data
....: def_time=True, # whether or not it is defined in time
....: description='Precipitation generated via grid-scale condensation',
....: )
....: precip_convective = Var(
....: name='precip_convective',
....: alt_names=('convection_rain', 'prec_conv'),
....: def_time=True,
....: description='Precipitation generated by convective parameterization',
....: )
....:
When it comes time to load data corresponding to either of these from one or more particular netCDF files, aospy will search for variables matching either name or any of the names in alt_names, stopping at the first successful one. This makes the common problem of model-specific variable names a breeze: if you end up with data with a new name for your variable, just add it to alt_names.
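The first-successful-name search can be sketched as follows (illustrative only, not aospy’s actual code; dataset_vars stands in for the variable names found in a netCDF file):

```python
def resolve_name(dataset_vars, name, alt_names=()):
    """Return the first of name/alt_names present in dataset_vars.

    Sketch of a first-match search over a variable's primary and
    alternate names; all names here are from the tutorial above.
    """
    for candidate in (name,) + tuple(alt_names):
        if candidate in dataset_vars:
            return candidate
    raise KeyError('none of %s found' % ((name,) + tuple(alt_names),))

# The example file uses the model-specific name 'convection_rain',
# which is listed among the alternate names for 'precip_convective'.
found = resolve_name({'condensation_rain', 'convection_rain'},
                     'precip_convective',
                     alt_names=('convection_rain', 'prec_conv'))
```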
Warning
This assumes that the name and all alternate names are unique to that variable, i.e. that in none of your data do those names signify something else. If a name does signify something else in some of your data, aospy could grab the wrong data without issuing an error message or warning.
Next, we’ll create functions that compute the total precipitation and
convective precipitation fraction and combine them with the above
aospy.Var
objects to define the new aospy.Var
objects:
In [18]: def total_precip(condensation_rain, convection_rain):
....: """Sum of large-scale and convective precipitation."""
....: return condensation_rain + convection_rain
....:
In [19]: def conv_precip_frac(precip_largescale, precip_convective):
....: """Fraction of total precip that is from convection parameterization."""
....: total = total_precip(precip_largescale, precip_convective)
....: return precip_convective / total.where(total)
....:
In [20]: precip_total = Var(
....: name='precip_total',
....: def_time=True,
....: func=total_precip,
....: variables=(precip_largescale, precip_convective),
....: )
....:
In [21]: precip_conv_frac = Var(
....: name='precip_conv_frac',
....: def_time=True,
....: func=conv_precip_frac,
....: variables=(precip_largescale, precip_convective),
....: )
....:
Notice the func
and variables
attributes that weren’t in the
prior Var
constructors. These signify the function to use and the
physical quantities to pass to that function in order to compute the
quantity.
Note
Although variables
is passed a tuple of Var
objects
corresponding to the physical quantities passed to func
,
func
should be a function whose arguments are the
xarray.DataArray
objects corresponding to those
variables. aospy uses the Var
objects to load the DataArrays
and then passes them to the function.
This enables you to write simple, expressive functions comprising only the physical operations to perform (since the “data wrangling” part has been handled already).
Warning
Order matters in the tuple of aospy.Var
objects passed
to the variables
attribute: it must match the order of the call
signature of the function passed to func
.
E.g. in precip_conv_frac
above, if we had mistakenly done
variables=(precip_convective, precip_largescale)
, the
calculation would execute without error, but all of the results
would be physically wrong.
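The hazard can be demonstrated with plain numbers (a sketch mirroring conv_precip_frac with made-up values; the real function operates on xarray.DataArray objects):

```python
def conv_precip_frac(precip_largescale, precip_convective):
    # Fraction of total precipitation coming from the convective scheme.
    total = precip_largescale + precip_convective
    return precip_convective / total

# Correct argument order: convective fraction is 1.0 / 4.0 = 0.25.
right = conv_precip_frac(3.0, 1.0)

# Swapped order runs without error but silently computes the
# *large-scale* fraction instead: 3.0 / 4.0 = 0.75.
wrong = conv_precip_frac(1.0, 3.0)
```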
Geographic regions¶
Last, we define the geographic regions over which to perform
aggregations and add them to example_proj
. We’ll look at the
whole globe and at the Tropics:
In [22]: from aospy import Region
In [23]: globe = Region(
....: name='globe',
....: description='Entire globe',
....: lat_bounds=(-90, 90),
....: lon_bounds=(0, 360),
....: do_land_mask=False
....: )
....:
In [24]: tropics = Region(
....: name='tropics',
....: description='Global tropics, defined as 30S-30N',
....: lat_bounds=(-30, 30),
....: lon_bounds=(0, 360),
....: do_land_mask=False
....: )
....:
In [25]: example_proj.regions = [globe, tropics]
We now have all of the needed metadata in place. So let’s start crunching numbers!
Submitting calculations¶
Using aospy.submit_mult_calcs()
¶
Having put in the legwork above of describing our data and the
physical quantities we wish to compute, we can submit our desired
calculations for execution using aospy.submit_mult_calcs()
.
Its sole required argument is a dictionary specifying all of the
desired parameter combinations.
In the example below, we import and use the example_obj_lib
module
that is included with aospy and whose objects are essentially
identical to the ones we’ve defined above.
In [26]: from aospy.examples import example_obj_lib as lib
In [27]: calc_suite_specs = dict(
....: library=lib,
....: projects=[lib.example_proj],
....: models=[lib.example_model],
....: runs=[lib.example_run],
....: variables=[lib.precip_largescale, lib.precip_convective,
....: lib.precip_total, lib.precip_conv_frac],
....: regions='all',
....: date_ranges='default',
....: output_time_intervals=['ann'],
....: output_time_regional_reductions=['av', 'reg.av'],
....: output_vertical_reductions=[None],
....: input_time_intervals=['monthly'],
....: input_time_datatypes=['ts'],
....: input_time_offsets=[None],
....: input_vertical_datatypes=[False],
....: )
....:
See the API Reference on aospy.submit_mult_calcs() for more on calc_suite_specs, including accepted values for each key.
submit_mult_calcs() also accepts a second dictionary specifying some options regarding how we want aospy to display, execute, and save our calculations. For the sake of this simple demonstration, we’ll suppress the prompt to confirm the calculations, submit them in serial rather than parallel, and suppress writing backup output to .tar files:
In [28]: calc_exec_options = dict(prompt_verify=False, parallelize=False,
....: write_to_tar=False)
....:
Now let’s submit this for execution:
In [29]: from aospy import submit_mult_calcs
In [30]: calcs = submit_mult_calcs(calc_suite_specs, calc_exec_options)
This permutes over all of the parameter settings in calc_suite_specs, generating and executing each resulting calculation. In this case, it will compute all four variables and perform annual averages, both at each gridpoint and regionally averaged.
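Conceptually, this is a Cartesian product over the lists in calc_suite_specs. A rough sketch of the idea using itertools (not aospy's actual implementation, which lives in aospy.automate):

```python
from itertools import product

# Illustrative stand-ins for a few of the calc_suite_specs entries
variables = ['precip_largescale', 'precip_convective',
             'precip_total', 'precip_conv_frac']
reductions = ['av', 'reg.av']
intervals = ['ann']

# One calculation per combination of parameter values
calc_specs = [dict(var=v, reduction=r, interval=i)
              for v, r, i in product(variables, reductions, intervals)]
```

Here 4 variables x 2 reductions x 1 interval yields 8 combinations.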
Although we do not show it here, this also prints logging information to the terminal at various steps during each calculation, including the filepaths to the netCDF files written to disk of the results.
Results¶
The result is a list of aospy.Calc objects, one per requested calculation (here, one per variable):
In [31]: calcs
Out[31]:
[<aospy.Calc instance: precip_largescale, example_proj, example_model, example_run>,
<aospy.Calc instance: precip_convective, example_proj, example_model, example_run>,
<aospy.Calc instance: precip_total, example_proj, example_model, example_run>,
<aospy.Calc instance: precip_conv_frac, example_proj, example_model, example_run>]
Each aospy.Calc object includes the paths to the output
In [32]: calcs[0].path_out
Out[32]:
{'av': 'example-output/example_proj/example_model/example_run/precip_largescale/precip_largescale.ann.av.from_monthly_ts.example_model.example_run.0004-0006.nc',
'reg.av': 'example-output/example_proj/example_model/example_run/precip_largescale/precip_largescale.ann.reg.av.from_monthly_ts.example_model.example_run.0004-0006.nc'}
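The filename encodes the variable, output interval, time reduction, input data type, model, run, and date range. As a sketch of how the pieces shown above compose (illustration only; aospy constructs these paths internally):

```python
import os

# Components taken from the example output path above
proj, model, run = 'example_proj', 'example_model', 'example_run'
var, intvl, reduction = 'precip_largescale', 'ann', 'av'
in_dtype, years = 'from_monthly_ts', '0004-0006'

filename = '.'.join([var, intvl, reduction, in_dtype, model, run, years, 'nc'])
path = os.path.join('example-output', proj, model, run, var, filename)
```

This metadata-rich naming is what makes the output easy to locate long after the calculations were run.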
and the results of each output type, which are stored in its data_out attribute.
Note
You may have noticed that the subset_... and raw_... coordinates have years 1678 and later, even though our data is from model years 4 through 6. This is because technical details upstream limit the range of supported whole years to 1678-2262. As a workaround, aospy pretends that any timeseries starting before the beginning of this range actually starts in 1678. An upstream fix is currently under way, at which point all dates will be supported without this workaround.
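The underlying limit is the nanosecond-resolution datetime64 type that xarray uses via pandas, which can only span about 584 years. You can inspect the bounds directly:

```python
import pandas as pd

# pandas' ns-resolution timestamps run from late 1677 to early 2262,
# so 1678 and 2262 are the first and last fully-representable whole years.
earliest = pd.Timestamp.min
latest = pd.Timestamp.max
```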
Gridpoint-by-gridpoint¶
Let’s plot (using matplotlib) the time average at each gridcell of all four variables. For demonstration purposes, we’ll load the data that was saved to disk using xarray rather than getting it directly from the data_out attribute as above.
In [33]: from matplotlib import pyplot as plt
In [34]: fig = plt.figure()
In [35]: for i, calc in enumerate(calcs):
....: ax = fig.add_subplot(2, 2, i+1)
....: arr = xr.open_dataset(calc.path_out['av']).to_array()
....: if calc.name != precip_conv_frac.name:
....: arr *= 86400 # convert to units mm per day
....: arr.plot(ax=ax)
....: ax.set_title(calc.name)
....: ax.set_xticks(range(0, 361, 60))
....: ax.set_yticks(range(-90, 91, 30))
....:
In [36]: plt.tight_layout()
In [37]: plt.show()
We see that precipitation maximizes at the equator and has a secondary maximum in the mid-latitudes. [5] Also, convective precipitation dominates the total in the Tropics, but moving poleward the gridscale condensation plays an increasingly large fractional role (note the different colorscales in each panel). [6]
Regional averages¶
Now let’s examine the regional averages for this run, converting the precipitation rates to units of mm per day:
In [38]: for calc in calcs:
....: ds = xr.open_dataset(calc.path_out['reg.av'])
....: if calc.name != precip_conv_frac.name:
....: ds *= 86400 # convert to units mm/day
....: print(calc.name, ds, '\n')
....:
precip_largescale <xarray.Dataset>
Dimensions: ()
Coordinates:
subset_start_date datetime64[ns] 1678-01-01
subset_end_date datetime64[ns] 1680-12-31
raw_data_end_date datetime64[ns] 1681-01-01
raw_data_start_date datetime64[ns] 1678-01-01
Data variables:
globe float64 0.9149
tropics float64 0.4551
precip_convective <xarray.Dataset>
Dimensions: ()
Coordinates:
subset_start_date datetime64[ns] 1678-01-01
subset_end_date datetime64[ns] 1680-12-31
raw_data_end_date datetime64[ns] 1681-01-01
raw_data_start_date datetime64[ns] 1678-01-01
Data variables:
globe float64 2.11
tropics float64 3.055
precip_total <xarray.Dataset>
Dimensions: ()
Coordinates:
subset_start_date datetime64[ns] 1678-01-01
subset_end_date datetime64[ns] 1680-12-31
raw_data_end_date datetime64[ns] 1681-01-01
raw_data_start_date datetime64[ns] 1678-01-01
Data variables:
globe float64 3.025
tropics float64 3.51
precip_conv_frac <xarray.Dataset>
Dimensions: ()
Coordinates:
subset_end_date datetime64[ns] 1680-12-31
subset_start_date datetime64[ns] 1678-01-01
raw_data_end_date datetime64[ns] 1681-01-01
raw_data_start_date datetime64[ns] 1678-01-01
Data variables:
globe float64 0.598
tropics float64 0.808
As was evident from the plots, we see that most precipitation (80.8%) in the tropics comes from convective rainfall, but averaged over the globe the large-scale condensation is a more equal player (40.2% for large-scale, 59.8% for convective).
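One subtlety worth noting: precip_conv_frac is computed at each gridpoint and then regionally averaged, and the mean of a pointwise ratio generally differs from the ratio of the means (note that the global 0.598 differs from 2.11/3.025 for this reason). A toy NumPy illustration of the distinction:

```python
import numpy as np

# Toy values at two gridpoints (hypothetical, for illustration)
conv = np.array([4.0, 1.0])    # convective precipitation
total = np.array([5.0, 4.0])   # total precipitation

mean_of_ratio = np.mean(conv / total)            # average the fraction field
ratio_of_means = np.mean(conv) / np.mean(total)  # fraction of the averages
```

Here mean_of_ratio is 0.525 while ratio_of_means is about 0.556; the two reductions answer slightly different questions.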
Beyond this simple example¶
Scaling up¶
In this case, we computed time averages of four variables, both at each gridpoint (which we’ll call 1 calculation) and averaged over two regions, yielding (4 variables)*(1 gridcell operation + (2 regions)*(1 regional operation)) = 12 total calculations executed. Not bad, but 12 calculations is few enough that we probably could have handled them without aospy.
The power of aospy is that, with the infrastructure we’ve put in place, we can now fire off additional calculations at any time. Some examples:
- Set output_time_regional_reductions=['ts', 'std', 'reg.ts', 'reg.std']: calculate the timeseries (‘ts’) and standard deviation (‘std’) of annual mean values at each gridpoint and for the regional averages.
- Set output_time_intervals=range(1, 13): average across years for each January (1), each February (2), etc. through December (12). [7]
With these settings, the number of calculations is now (4 variables)*(2 gridcell operations + (2 regions)*(2 regional operations))*(12 temporal averages) = 288 calculations submitted with a single command.
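The count can be double-checked with a couple lines of arithmetic:

```python
n_vars = 4
n_gridcell_ops = 2    # 'ts' and 'std'
n_regions = 2         # globe and tropics
n_regional_ops = 2    # 'reg.ts' and 'reg.std'
n_intervals = 12      # one average per calendar month

total_calcs = n_vars * (n_gridcell_ops + n_regions * n_regional_ops) * n_intervals
```

which evaluates to 288, matching the figure above.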
Modifying your object library¶
We can also add new objects to our object library at any time. For example, suppose we performed a new simulation in which we modified the formulation of the convective parameterization. All we would have to do is create a corresponding aospy.Run object, and then we can execute calculations for that simulation. And likewise for models, projects, variables, and regions.
As a real-world example, two of aospy’s developers use aospy in their own scientific research, with multiple projects each comprising multiple models, simulations, etc. They routinely fire off thousands of calculations at once. And thanks to the highly organized and metadata-rich directory structure and filenames of the aospy output netCDF files, all of the resulting data is easy to find and use.
Example “main” script¶
Finally, aospy comes included with a “main” script for submitting calculations that is pre-populated with the objects from the example object library. It also comes with in-line instructions on how to use it, whether you want to keep playing with the example library or modify it to use on your own object library.
It is located in the “examples” directory of your aospy installation. Find it by typing python -c "import os, aospy; print(os.path.join(aospy.__path__[0], 'examples', 'aospy_main.py'))" from your terminal.
Footnotes
[1] | An “idealized climate model” is a model that, for the sake of computational efficiency and conceptual simplicity, omits and/or simplifies various processes relative to how they are computed in full, production-class models. The particular model used here is described in Frierson et al 2006. |
[2] | An “aquaplanet” is simply a climate model in which the surface is entirely ocean, i.e. there is no land. Interactions between atmospheric and land processes are complicated, and so an aquaplanet avoids those complications while still generating a climate (when zonally averaged, i.e. averaged around each latitude circle) that roughly resembles that of the real Earth. |
[3] | Most climate models generate precipitation through two separate pathways: (1) direct saturation of a whole gridbox, which results in condensation and precipitation, and (2) a “convective parameterization.” The latter simulates the precipitation that, due to subgrid-scale variability, can be expected to occur at some fraction of the area within a gridcell, even though the cell as a whole isn’t saturated. The total precipitation is simply the sum of these “large-scale” and “convective” components. |
[4] | In this case, the model being used is an aquaplanet, so the mask will be simply all ocean. But this is not generally the case – comprehensive climate and weather models include Earth’s full continental geometry and land topography (at least as well as can be resolved at their particular horizontal grid resolution). |
[5] | This equatorial rainband is called the Intertropical Convergence Zone, or ITCZ. In this simulation, the imposed solar radiation is fixed at Earth’s annual mean value, which is symmetric about the equator. The ITCZ typically follows the solar radiation maximum, hence its position in this case directly on the equator. |
[6] | This is a very common result. The gridcells of many climate models are several hundred kilometers by several hundred kilometers in area. In Earth’s Tropics, most rainfall is generated by cumulus towers that are much smaller than this. But in the mid-latitudes, a phenomenon known as baroclinic instability generates much larger eddies that can span several hundred kilometers. |
[7] | In this particular simulation, the boundary conditions are constant in time, so there is no seasonal cycle. But we could use these monthly averages to confirm that’s actually the case, i.e. that we didn’t accidentally use time-varying solar radiation when we ran the model. |
Installation¶
Supported platforms¶
- Operating system: Windows, MacOS/Mac OS X, and Linux
- Python: 2.7, 3.4, 3.5, and 3.6
Note
We highly recommend installing and using the free Anaconda (or Miniconda) distribution of Python, which works on Mac, Linux, and Windows, both on normal computers and institutional clusters and doesn’t require root permissions.
See e.g. this blog post for instructions on how to install Miniconda on a large cluster without root permission.
Recommended installation method: conda¶
The recommended installation method is via conda
conda install -c conda-forge aospy
Note
Python 3.4 users: conda-forge no longer supports builds of packages on Python 3.4. Please use one of the alternative installation methods described below.
Alternative method #1: pip¶
aospy is available from the official Python Packaging Index (PyPI) via pip
:
pip install aospy
Alternative method #2: clone from Github¶
You can also directly clone the Github repo
git clone https://www.github.com/spencerahill/aospy.git
cd aospy
python setup.py install
Verifying proper installation¶
Once installed via any of these methods, you can run aospy’s suite of tests using py.test. From the top-level directory of the aospy installation
conda install pytest # if you don't have it already; or 'pip install pytest'
py.test aospy
If you don’t know the directory where aospy was installed, you can find it via
python -c "import aospy; print(aospy.__path__[0])"
If the pytest command results in any error messages or test failures, something has gone wrong; please refer to the Troubleshooting information below.
Troubleshooting¶
Please search through our Issues page on Github and our mailing list to see if anybody else has had the same problem you’re facing. If nobody has, then please send a message on the list or open a new Issue.
Please don’t hesitate, especially if you’re new to the package and/or Python! We are eager to help people get started. Even if you suspect it’s something simple or obvious – that just means we didn’t make it clear/easy enough!
API Reference¶
Here we provide the reference documentation for aospy’s public API. If you are new to the package and/or just trying to get a feel for the overall workflow, you are better off starting in the Overview, Using aospy, or Examples sections of this documentation.
Warning
aospy is under active development. While we strive to maintain backwards compatibility, it is likely that some breaking changes to the codebase will occur in the future as aospy is improved.
Core Hierarchy for Input Data¶
aospy provides three classes for specifying the location and characteristics of data saved on disk as netCDF files that the user wishes to use as input data for aospy calculations: Proj, Model, and Run.
Proj¶
-
class aospy.proj.Proj(name, description=None, models=None, default_models=None, regions=None, direc_out='', tar_direc_out='')[source]¶ An object that describes a single project that will use aospy.
This is the top-level class in the aospy hierarchy of data representations. It is meant to contain all of the Model, Run, and Region objects that are of relevance to a particular research project. (Any of these may be used by multiple Proj objects.)
The Proj class itself provides little functionality, but it is an important means of organizing a user’s work across different projects. In particular, the output of all calculations using aospy.Calc are saved in a directory structure whose root is that of the Proj object specified for that calculation.
Attributes
- name (str): The project’s name
- description (str): A description of the project
- direc_out, tar_direc_out (str): The paths to the root directories of, respectively, the standard and .tar versions of the output of aospy calculations saved to disk.
- models (dict): A dictionary with entries of the form {model_obj.name: model_obj}, for each of this Proj’s child model objects
- default_models (dict): The default model objects on which to perform calculations via aospy.Calc if not otherwise specified
- regions (dict): A dictionary with entries of the form {region_obj.name: region_obj}, for each of this Proj’s child region objects
__init__
(name, description=None, models=None, default_models=None, regions=None, direc_out='', tar_direc_out='')[source]¶ Parameters: name : str
The project’s name. This should be unique from that of any other Proj objects being used.
description : str, optional
A description of the project. This is not used internally by aospy; it is solely for the user’s information.
regions : {None, sequence of aospy.Region objects}, optional
The desired regions over which to perform region-average calculations.
models : {None, sequence of aospy.Model objects}, optional
The child Model objects of this project.
default_models : {None, sequence of aospy.Model objects}, optional
The subset of this Proj’s models on which to perform calculations by default.
direc_out, tar_direc_out : str
Path to the root directories of where, respectively, regular output and a .tar-version of the output will be saved to disk.
See also
aospy.Model
,aospy.Region
,aospy.Run
-
Model¶
-
class aospy.model.Model(name=None, description=None, proj=None, grid_file_paths=None, default_start_date=None, default_end_date=None, runs=None, default_runs=None, load_grid_data=False)[source]¶ An object that describes a single climate or weather model.
Each Model object is associated with a parent Proj object and also with one or more child Run objects.
If aospy is being used to work with non climate- or weather-model data, the Model object can be used e.g. to represent a gridded observational product, with its child Run objects representing different released versions of that dataset.
Attributes
- name (str): The model’s name
- description (str): A description of the model
- proj ({None, aospy.Proj}): The model’s parent aospy.Proj object
- runs (list): A list of this model’s child Run objects
- default_runs (list): The default subset of child run objects on which to perform calculations via aospy.Calc with this model if not otherwise specified
- grid_file_paths (list): The paths to netCDF files stored on disk from which the model’s coordinate data can be taken.
- default_start_date, default_end_date (datetime.datetime): The default start and end dates of any calculations using this Model
__init__
(name=None, description=None, proj=None, grid_file_paths=None, default_start_date=None, default_end_date=None, runs=None, default_runs=None, load_grid_data=False)[source]¶ Parameters: name : str
The model’s name. This must be unique from that of any other Model objects being used by the parent Proj.
description : str, optional
A description of the model. This is not used internally by aospy; it is solely for the user’s information.
proj : {None, aospy.Proj}, optional
The parent Proj object. When the parent Proj object is instantiated with this Model included in its models attribute, this will be over-written with that Proj object.
grid_file_paths : {None, sequence of strings}, optional
The paths to netCDF files stored on disk from which the model’s coordinate data can be taken.
default_start_date, default_end_date : {None, datetime.datetime}, optional
Default start and end dates of calculations to be performed using this Model.
runs : {None, sequence of aospy.Run objects}, optional
The child run objects of this Model
default_runs : {None, sequence of aospy.Run objects}, optional
The subset of this Model’s runs over which to perform calculations by default.
load_grid_data : bool, optional (default False)
Whether or not to load the grid data specified by ‘grid_file_paths’ upon initialization
See also
aospy.DataLoader
,aospy.Proj
,aospy.Run
-
Run¶
-
class aospy.run.Run(name=None, description=None, proj=None, default_start_date=None, default_end_date=None, data_loader=None)[source]¶ An object that describes a single model ‘run’ (i.e. simulation).
Each Run object is associated with a parent Model object. This parent attribute is not set by Run itself, however; it is set during the instantiation of the parent Model object.
If aospy is being used to work with non climate-model data, the Run object can be used e.g. to represent different versions of a gridded observational data product, with the parent Model representing that data product more generally.
Attributes
- name (str): The run’s name
- description (str): A description of the run
- proj ({None, aospy.Proj}): The run’s parent aospy.Proj object
- default_start_date, default_end_date (datetime.datetime): The default start and end dates of any calculations using this Run
- data_loader (aospy.DataLoader): The aospy.DataLoader object used to find data on disk corresponding to this Run object
__init__
(name=None, description=None, proj=None, default_start_date=None, default_end_date=None, data_loader=None)[source]¶ Instantiate a Run object.
Parameters: name : str
The run’s name. This must be unique from that of any other Run objects being used by the parent Model.
description : str, optional
A description of the run. This is not used internally by aospy; it is solely for the user’s information.
proj : {None, aospy.Proj}, optional
The parent Proj object.
data_loader : aospy.DataLoader
The DataLoader object used to find the data on disk to be used as inputs for aospy calculations for this run.
default_start_date, default_end_date : datetime.datetime, optional
Default start and end dates of calculations to be performed using this Run.
See also
aospy.DataLoader
,aospy.Model
-
DataLoaders¶
Run objects rely on a helper “data loader” to specify how to find their underlying data that is saved on disk. This mapping of variables, time ranges, and potentially other parameters to the location of the corresponding data on disk can differ among modeling centers or even between different models at the same center.
Currently supported data loader types are DictDataLoader, NestedDictDataLoader, and GFDLDataLoader. Each of these inherits from the abstract base DataLoader class.
-
class
aospy.data_loader.
DataLoader
[source]¶ A fundamental DataLoader object
-
__init__
()¶ Initialize self. See help(type(self)) for accurate signature.
-
load_variable
(var=None, start_date=None, end_date=None, time_offset=None, **DataAttrs)[source]¶ Load a DataArray for requested variable and time range.
Automatically renames all grid attributes to match aospy conventions.
Parameters: var : Var
aospy Var object
start_date : datetime.datetime
start date for interval
end_date : datetime.datetime
end date for interval
time_offset : dict
Option to add a time offset to the time coordinate to correct for incorrect metadata.
**DataAttrs
Attributes needed to identify a unique set of files to load from
Returns: da : DataArray
DataArray for the specified variable, date range, and interval.
-
-
class
aospy.data_loader.
DictDataLoader
(file_map=None, preprocess_func=<function DictDataLoader.<lambda>>)[source]¶ A DataLoader that uses a dict mapping lists of files to string tags
This is the simplest DataLoader; it is useful for instance if one is dealing with raw model history files, which tend to group all variables of a single output interval into single filesets. The intvl_in parameter is a string description of the time frequency of the data one is referencing (e.g. ‘monthly’, ‘daily’, ‘3-hourly’). In principle, one can give it any string value.
Parameters: file_map : dict
A dict mapping an input interval to a list of files
preprocess_func : function (optional)
A function to apply to every Dataset before processing in aospy. Must take a Dataset and
**kwargs
as its two arguments.Examples
Case of two sets of files, one with monthly average output, and one with 3-hourly output.
>>> file_map = {'monthly': '000[4-6]0101.atmos_month.nc',
...             '3hr': '000[4-6]0101.atmos_8xday.nc'}
>>> data_loader = DictDataLoader(file_map)
If one wanted to correct a CF-incompliant units attribute on each Dataset read in, which depended on the intvl_in of the fileset, one could define a preprocess_func which took into account the intvl_in keyword argument.
>>> def preprocess(ds, **kwargs):
...     if kwargs['intvl_in'] == 'monthly':
...         ds['time'].attrs['units'] = 'days since 0001-01-0000'
...     if kwargs['intvl_in'] == '3hr':
...         ds['time'].attrs['units'] = 'hours since 0001-01-0000'
...     return ds
>>> data_loader = DictDataLoader(file_map, preprocess)
-
class
aospy.data_loader.
NestedDictDataLoader
(file_map=None, preprocess_func=<function NestedDictDataLoader.<lambda>>)[source]¶ DataLoader that uses a nested dictionary mapping to load files
This is the most flexible existing type of DataLoader; it allows for the specification of different sets of files for different variables. The intvl_in parameter is a string description of the time frequency of the data one is referencing (e.g. ‘monthly’, ‘daily’, ‘3-hourly’). In principle, one can give it any string value. The variable name can be any variable name in your aospy object library (including alternative names).
Parameters: file_map : dict
A dict mapping intvl_in to dictionaries mapping Var objects to lists of files
preprocess_func : function (optional)
A function to apply to every Dataset before processing in aospy. Must take a Dataset and
**kwargs
as its two arguments.Examples
Case of a set of monthly average files for large scale precipitation, and another monthly average set of files for convective precipitation.
>>> file_map = {'monthly': {'precl': '000[4-6]0101.precl.nc',
...                         'precc': '000[4-6]0101.precc.nc'}}
>>> data_loader = NestedDictDataLoader(file_map)
See aospy.data_loader.DictDataLoader for an example of a possible function to pass as a preprocess_func.
-
class
aospy.data_loader.
GFDLDataLoader
(template=None, data_direc=None, data_dur=None, data_start_date=None, data_end_date=None, preprocess_func=<function GFDLDataLoader.<lambda>>)[source]¶ DataLoader for NOAA GFDL model output
This is an example of a domain-specific custom DataLoader, designed specifically for finding files output by the Geophysical Fluid Dynamics Laboratory’s model history file post-processing tools.
Parameters: template : GFDLDataLoader
Optional argument to specify a base GFDLDataLoader to inherit parameters from
data_direc : str
Root directory of data files
data_dur : int
Number of years included per post-processed file
data_start_date : datetime.datetime
Start date of data files
data_end_date : datetime.datetime
End date of data files
preprocess_func : function (optional)
A function to apply to every Dataset before processing in aospy. Must take a Dataset and
**kwargs
as its two arguments.Examples
Case without a template to start from.
>>> base = GFDLDataLoader(data_direc='/archive/control/pp', data_dur=5,
...                       data_start_date=datetime(2000, 1, 1),
...                       data_end_date=datetime(2010, 12, 31))
Case with a starting template.
>>> data_loader = GFDLDataLoader(base, data_direc='/archive/2xCO2/pp')
See aospy.data_loader.DictDataLoader for an example of a possible function to pass as a preprocess_func.
Variables and Regions¶
The Var and Region classes are used to represent, respectively, physical quantities the user wishes to be able to compute and geographical regions over which the user wishes to aggregate their calculations.
Whereas the Proj - Model - Run hierarchy is used to describe the data resulting from particular model simulations, Var and Region represent the properties of generic physical entities that do not depend on the underlying data.
Var¶
-
class aospy.var.Var(name, alt_names=None, func=None, variables=None, func_input_dtype='DataArray', units='', plot_units='', plot_units_conv=1, domain='atmos', description='', def_time=False, def_vert=False, def_lat=False, def_lon=False, math_str=False, colormap='RdBu_r', valid_range=None)[source]¶ An object representing a physical quantity to be computed.
Attributes
- name (str): The variable’s name
- alt_names (tuple of strings): All other names that the variable may be referred to in the input data
- names (tuple of strings): The combination of name and alt_names
- description (str): A description of the variable
- func (function): The function with which to compute the variable
- variables (sequence of aospy.Var objects): The variables passed to func to compute it
- func_input_dtype ({‘DataArray’, ‘Dataset’, ‘numpy’}): The datatype expected by func of its arguments
- units (aospy.units.Units object): The variable’s physical units
- domain (str): The physical domain of the variable, e.g. ‘atmos’, ‘ocean’, or ‘land’
- def_time, def_vert, def_lat, def_lon (bool): Whether the variable is defined, respectively, in time, vertically, in latitude, and in longitude
- math_str (str): The mathematical representation of the variable
- colormap (str): The name of the default colormap to be used in plots of this variable
- valid_range (length-2 tuple): The range of values outside which to flag as unphysical/erroneous
__init__
(name, alt_names=None, func=None, variables=None, func_input_dtype='DataArray', units='', plot_units='', plot_units_conv=1, domain='atmos', description='', def_time=False, def_vert=False, def_lat=False, def_lon=False, math_str=False, colormap='RdBu_r', valid_range=None)[source]¶ Instantiate a Var object.
Parameters: name : str
The variable’s name
alt_names : tuple of strings
All other names that the variable might be referred to in any input data. Each of these should be unique to this variable in order to avoid loading the wrong quantity.
description : str
A description of the variable
func : function
The function with which to compute the variable
variables : sequence of aospy.Var objects
The variables passed to func to compute it. Order matters: whenever calculations are performed to generate data corresponding to this Var, the data corresponding to the elements of variables will be passed to self.function in the same order.
func_input_dtype : {None, ‘DataArray’, ‘Dataset’, ‘numpy’}
The datatype expected by func of its arguments
units : aospy.units.Units object
The variable’s physical units
domain : str
The physical domain of the variable, e.g. ‘atmos’, ‘ocean’, or ‘land’. This is only used by aospy by some types of DataLoader, including GFDLDataLoader.
def_time, def_vert, def_lat, def_lon : bool
Whether the variable is defined, respectively, in time, vertically, in latitude, and in longitude
math_str : str
The mathematical representation of the variable. This is typically a raw string of LaTeX math-mode, e.g. r’$T_\mathrm{sfc}$’ for surface temperature.
colormap : str
(Currently not used by aospy) The name of the default colormap to be used in plots of this variable.
valid_range : length-2 tuple
The range of values outside which to flag as unphysical/erroneous
-
Region¶
-
class aospy.region.Region(name='', description='', lon_bounds=[], lat_bounds=[], mask_bounds=[], do_land_mask=False)[source]¶ Geographical region over which to perform averages and other reductions.
Each Proj object includes a list of Region objects, which is used by Calc to determine the regions over which to perform time reductions of region-average quantities.
Region boundaries are specified as either a single “rectangle” in latitude and longitude or the union of multiple such rectangles. In addition, a land or ocean mask can be applied.
See also
aospy.Calc.region_calcs
Attributes
- name (str): The region’s name
- description (str): A description of the region
- mask_bounds (tuple): The coordinates defining the lat-lon rectangle(s) that define(s) the region’s boundaries
- do_land_mask: Whether values occurring over land, ocean, or neither are excluded from the region, and whether the included points must be strictly land or ocean or if fractional land/ocean values are included.
__init__
(name='', description='', lon_bounds=[], lat_bounds=[], mask_bounds=[], do_land_mask=False)[source]¶ Instantiate a Region object.
If a region spans across the endpoint of the data’s longitude array (i.e. it crosses the Prime Meridian for data with longitudes spanning 0 to 360), it must be defined as the union of two sections extending to the east and to the west of the Prime Meridian.
Parameters: name : str
The region’s name. This must be unique from that of any other Region objects being used by the overlying Proj.
description : str, optional
A description of the region. This is not used internally by aospy; it is solely for the user’s information.
lon_bounds, lat_bounds : length-2 sequence, optional
The longitude and latitude bounds of the region. If the region boundaries are more complicated than a single lat-lon rectangle, use mask_bounds instead.
mask_bounds : sequence, optional
Each element is a length-2 tuple of the format (lat_bounds, lon_bounds), where each of lat_bounds and lon_bounds are of the form described above.
do_land_mask : {False, True, ‘ocean’, ‘strict_land’, ‘strict_ocean’}, optional
Determines what, if any, land mask is applied in addition to the mask defining the region’s boundaries. Default False.
- True: apply the data’s full land mask
- False: apply no mask
- ‘ocean’: mask out land rather than ocean
- ‘strict_land’: mask out all points that are not 100% land
- ‘strict_ocean’: mask out all points that are not 100% ocean
Examples
Define a region spanning the entire globe
>>> globe = Region(name='globe', lat_bounds=(-90, 90),
...                lon_bounds=(0, 360), do_land_mask=False)
Define a region corresponding to the Sahel region of Africa, which we’ll define as land points within 10N-20N latitude and 18W-40E longitude. Because this region crosses the 0 degrees longitude point, it has to be defined using mask_bounds as the union of two lat-lon rectangles.
>>> sahel = Region(name='sahel', do_land_mask=True,
...                mask_bounds=[((10, 20), (0, 40)),
...                             ((10, 20), (342, 360))])
-
Calculations¶
Calc is the engine that combines the user’s specifications of (1) the data on disk via Proj, Model, and Run, (2) the physical quantity to compute and regions to aggregate over via Var and Region, and (3) the desired date range, time reduction method, and other characteristics, to actually perform the calculation.
Whereas Proj, Model, Run, Var, and Region are all intended to be saved in .py files for reuse, Calc objects are intended to be generated dynamically by a main script and then not retained after they have written their outputs to disk following the user’s specifications.
Moreover, if the main.py script is used to execute calculations, the user never interfaces directly with Calc or its helper class, CalcInterface; in that case this section can be skipped entirely.
Also included is the automate module, which enables aospy (e.g. in the main script) to look up objects in the user’s object library by their string names, rather than requiring the user to import the objects themselves.
CalcInterface and Calc¶
-
class
aospy.calc.
CalcInterface
(proj=None, model=None, run=None, ens_mem=None, var=None, date_range=None, region=None, intvl_in=None, intvl_out=None, dtype_in_time=None, dtype_in_vert=None, dtype_out_time=None, dtype_out_vert=None, level=None, time_offset=None)[source]¶ Interface to the Calc class.
-
__init__
(proj=None, model=None, run=None, ens_mem=None, var=None, date_range=None, region=None, intvl_in=None, intvl_out=None, dtype_in_time=None, dtype_in_vert=None, dtype_out_time=None, dtype_out_vert=None, level=None, time_offset=None)[source]¶ Instantiate a CalcInterface object.
Parameters: proj : aospy.Proj object
The project for this calculation.
model : aospy.Model object
The model for this calculation.
run : aospy.Run object
The run for this calculation.
var : aospy.Var object
The variable for this calculation.
ens_mem : Currently not supported.
This will eventually be used to specify particular ensemble members of multi-member ensemble simulations.
region : sequence of aospy.Region objects
The region(s) over which any regional reductions will be performed.
date_range : tuple of datetime.datetime objects
The range of dates over which to perform calculations.
intvl_in : {None, ‘annual’, ‘monthly’, ‘daily’, ‘6hr’, ‘3hr’}, optional
The time resolution of the input data.
dtype_in_time : {None, ‘inst’, ‘ts’, ‘av’, ‘av_ts’}, optional
What the time axis of the input data represents:
- ‘inst’ : Timeseries of instantaneous values
- ‘ts’ : Timeseries of averages over the period of each time-index
- ‘av’ : A single value averaged over a date range
dtype_in_vert : {None, ‘pressure’, ‘sigma’}, optional
The vertical coordinate system used by the input data:
- None : not defined vertically
- ‘pressure’ : pressure coordinates
- ‘sigma’ : hybrid sigma-pressure coordinates
intvl_out : {‘ann’, season-string, month-integer}
The sub-annual time interval over which to compute:
- ‘ann’ : Annual mean
- season-string : E.g. ‘JJA’ for June-July-August
- month-integer : 1 for January, 2 for February, etc.
dtype_out_time : tuple with elements being one or more of:
- Gridpoint-by-gridpoint output:
- ‘av’ : Gridpoint-by-gridpoint time-average
- ‘std’ : Gridpoint-by-gridpoint temporal standard deviation
- ‘ts’ : Gridpoint-by-gridpoint time-series
- Averages over each region specified via region:
- ‘reg.av’, ‘reg.std’, ‘reg.ts’ : analogous to ‘av’, ‘std’, ‘ts’
dtype_out_vert : {None, ‘vert_av’, ‘vert_int’}, optional
How to reduce the data vertically:
- None : no vertical reduction (i.e. output is defined vertically)
- ‘vert_av’ : mass-weighted vertical average
- ‘vert_int’ : mass-weighted vertical integral
time_offset : {None, dict}, optional
How to offset input data in time to correct for metadata errors
- None : no time offset applied
- dict : e.g. {'hours': -3} to offset times by -3 hours. See aospy.utils.times.apply_time_offset().
-
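The time_offset dict maps keyword names onto a time shift. A minimal standard-library sketch of what an offset of {'hours': -3} does to a timestamp (an illustration only, not aospy's implementation, which also handles calendar-aware month and year offsets):

```python
import datetime

def apply_offset_sketch(t, **offset):
    # Illustration only: datetime.timedelta covers days/hours/minutes;
    # aospy's apply_time_offset additionally handles months and years.
    return t + datetime.timedelta(**offset)

# A 03:00 instantaneous stamp shifted back to midnight:
t = datetime.datetime(1900, 1, 1, 3)
print(apply_offset_sketch(t, hours=-3))  # 1900-01-01 00:00:00
```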
-
class
aospy.calc.
Calc
(calc_interface)[source]¶ Class for executing, saving, and loading a single computation.
Calc objects are instantiated with a single argument: a CalcInterface object that includes all of the parameters necessary to determine what calculations to perform.
-
ARR_XARRAY_NAME
= 'aospy_result'¶
-
compute
(write_to_tar=True)[source]¶ Perform all desired calculations on the data and save externally.
-
automate¶
Functionality for specifying and cycling through multiple calculations.
-
class
aospy.automate.
CalcSuite
(calc_suite_specs)[source]¶ Suite of Calc objects generated from provided specifications.
-
aospy.automate.
submit_mult_calcs
(calc_suite_specs, exec_options=None)[source]¶ Generate and execute all specified computations.
Once the calculations are prepped and submitted for execution, any calculation that triggers any exception or error is skipped, and the rest of the calculations proceed unaffected. This prevents an error in a single calculation from crashing a large suite of calculations.
Parameters: calc_suite_specs : dict
The specifications describing the full set of calculations to be generated and potentially executed. Accepted keys and their values:
- library : module or package comprising an aospy object library
The aospy object library for these calculations.
- projects : list of aospy.Proj objects
The projects to permute over.
- models : ‘all’, ‘default’, or list of aospy.Model objects
The models to permute over. If ‘all’, use all models in the models attribute of each Proj. If ‘default’, use all models in the default_models attribute of each Proj.
- runs : ‘all’, ‘default’, or list of aospy.Run objects
The runs to permute over. If ‘all’, use all runs in the runs attribute of each Model. If ‘default’, use all runs in the default_runs attribute of each Model.
- variables : list of aospy.Var objects
The variables to be calculated.
- regions : ‘all’ or list of aospy.Region objects
The region(s) over which any regional reductions will be performed. If ‘all’, use all regions in the regions attribute of each Proj.
- date_ranges : ‘default’ or tuple of datetime.datetime objects
The range of dates (inclusive) over which to perform calculations. If ‘default’, use the default_start_date and default_end_date attributes of each Run.
.- output_time_intervals : {‘ann’, season-string, month-integer}
The sub-annual time interval over which to aggregate.
- ‘ann’ : Annual mean
- season-string : E.g. ‘JJA’ for June-July-August
- month-integer : 1 for January, 2 for February, etc. Each one is a separate reduction, e.g. [1, 2] would produce averages (or other specified time reduction) over all Januaries, and separately over all Februaries.
- output_time_regional_reductions : list of reduction string identifiers
Unlike most other keys, these are not permuted over when creating the aospy.Calc objects that execute the calculations; each aospy.Calc performs all of the specified reductions. Accepted string identifiers are:
- Gridpoint-by-gridpoint output:
- ‘av’ : Gridpoint-by-gridpoint time-average
- ‘std’ : Gridpoint-by-gridpoint temporal standard deviation
- ‘ts’ : Gridpoint-by-gridpoint time-series
- Averages over each region specified via region:
- ‘reg.av’, ‘reg.std’, ‘reg.ts’ : analogous to ‘av’, ‘std’, ‘ts’
- output_vertical_reductions : {None, ‘vert_av’, ‘vert_int’}, optional
How to reduce the data vertically:
- None : no vertical reduction
- ‘vert_av’ : mass-weighted vertical average
- ‘vert_int’ : mass-weighted vertical integral
- input_time_intervals : {‘annual’, ‘monthly’, ‘daily’, ‘#hr’}
A string specifying the time resolution of the input data. In ‘#hr’ above, the ‘#’ stands for a number, e.g. 3hr or 6hr, for sub-daily output. These are the suggested specifiers, but others may be used if they are also used by the DataLoaders for the given Runs.
- input_time_datatypes : {‘inst’, ‘ts’, ‘av’}
What the time axis of the input data represents:
- ‘inst’ : Timeseries of instantaneous values
- ‘ts’ : Timeseries of averages over the period of each time-index
- ‘av’ : A single value averaged over a date range
- input_vertical_datatypes : {False, ‘pressure’, ‘sigma’}, optional
The vertical coordinate system used by the input data:
- False : not defined vertically
- ‘pressure’ : pressure coordinates
- ‘sigma’ : hybrid sigma-pressure coordinates
- input_time_offsets : {None, dict}, optional
How to offset input data in time to correct for metadata errors
- None : no time offset applied
- dict : e.g. {'hours': -3} to offset times by -3 hours. See aospy.utils.times.apply_time_offset().
exec_options : dict or None (default None)
Options regarding how the calculations are reported, submitted, and saved. If None, default settings are used for all options. Currently supported options (each should be either True or False):
- prompt_verify : (default False) If True, print a summary of the calculations to be performed and prompt the user to confirm before submitting them for execution.
- parallelize : (default False) If True, submit calculations in parallel.
- client : distributed.Client or None (default None) The dask.distributed Client used to schedule computations. If None and parallelize is True, a LocalCluster will be started.
- write_to_tar : (default True) If True, write the results of calculations to .tar files, one for each aospy.Run object. These tar files have a directory structure identical to that of the standard output relative to their root directory, which is specified via the tar_direc_out argument of each Proj object’s instantiation.
Returns: A list of the return values from each aospy.Calc.compute() call
If a calculation ran without error, this value is the aospy.Calc object itself, with the results of its calculations saved in its data_out attribute. data_out is a dictionary, with the keys being the temporal-regional reduction identifiers (e.g. ‘reg.av’), and the values being the corresponding result. If any error occurred during a calculation, the return value is None.
Raises: AospyException
If the prompt_verify option is set to True and the user does not respond affirmatively to the prompt.
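Putting the accepted keys together, a typical main-script invocation might look like the following sketch. Here example_obj_lib and its attributes (example_proj, temp, precip) are hypothetical stand-ins for the user’s own object library; the string options (‘default’, ‘all’) are the documented shortcuts.

```python
from aospy.automate import submit_mult_calcs

import example_obj_lib as lib  # hypothetical user object library

calc_suite_specs = dict(
    library=lib,
    projects=[lib.example_proj],       # hypothetical Proj object
    models='default',                  # each Proj's default_models
    runs='default',                    # each Model's default_runs
    variables=[lib.temp, lib.precip],  # hypothetical Var objects
    regions='all',                     # all regions of each Proj
    date_ranges='default',
    output_time_intervals=['ann'],
    output_time_regional_reductions=['av', 'reg.av'],
    output_vertical_reductions=[None],
    input_time_intervals=['monthly'],
    input_time_datatypes=['ts'],
    input_vertical_datatypes=[False],
    input_time_offsets=[None],
)

calcs = submit_mult_calcs(
    calc_suite_specs,
    exec_options=dict(prompt_verify=True, parallelize=False),
)
```

Because every list-valued key is permuted over, adding a second Var or a second output_time_interval multiplies the number of Calc objects generated accordingly.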
Units and Constants¶
aospy provides the classes Constant and Units for representing, respectively, physical constants (e.g. Earth’s gravitational acceleration at the surface = 9.81 m/s^2) and physical units (e.g. meters per second squared in that example).
aospy comes with several commonly used constants saved within the constants module, in which the Constant class is also defined. In contrast, there are no pre-defined Units objects; the user must define any Units objects they wish to use (e.g. to populate the units attribute of their Var objects).
Warning
Whereas these baked-in Constant objects are used by aospy in various places, aospy currently does not actually use the Var.units attribute during calculations or the Units class more generally; they are solely for the user’s own informational purposes.
constants¶
Classes and objects pertaining to physical constants.
units¶
Functionality for representing physical units, e.g. meters.
Utilities¶
aospy includes a number of utility functions that are used internally and may also be useful to users for their own purposes. These include functions pertaining to input/output (IO), time arrays, and vertical coordinates.
utils.io¶
Utility functions for data input and output.
-
aospy.utils.io.
data_in_label
(intvl_in, dtype_in_time, dtype_in_vert=False)[source]¶ Create string label specifying the input data of a calculation.
-
aospy.utils.io.
data_name_gfdl
(name, domain, data_type, intvl_type, data_yr, intvl, data_in_start_yr, data_in_dur)[source]¶ Determine the filename of GFDL model data output.
-
aospy.utils.io.
get_parent_attr
(obj, attr, strict=False)[source]¶ Search recursively through an object and its parent for an attribute.
Check if the object has the given attribute and it is non-empty. If not, check each parent object for the attribute and use the first one found.
utils.times¶
Utility functions for handling times, dates, etc.
-
aospy.utils.times.
add_uniform_time_weights
(ds)[source]¶ Append uniform time weights to a Dataset.
All DataArrays with a time coordinate require a time weights coordinate. For Datasets read in without a time bounds coordinate or explicit time weights built in, aospy adds uniform time weights at each point in the time coordinate.
Parameters: ds : Dataset
Input data
Returns: Dataset
-
aospy.utils.times.
apply_time_offset
(time, years=0, months=0, days=0, hours=0)[source]¶ Apply a specified offset to the given time array.
This is useful for GFDL model output of instantaneous values. For example, 3 hourly data postprocessed to netCDF files spanning 1 year each will actually have time values that are offset by 3 hours, such that the first value is for 1 Jan 03:00 and the last value is 1 Jan 00:00 of the subsequent year. This causes problems in xarray, e.g. when trying to group by month. It is resolved by manually subtracting off those three hours, such that the dates span from 1 Jan 00:00 to 31 Dec 21:00 as desired.
Parameters: time : xarray.DataArray representing a timeseries
years, months, days, hours : int, optional
The number of years, months, days, and hours, respectively, to offset the time array by. Positive values move the times later.
Returns: pandas.DatetimeIndex
Examples
Case of a length-1 input time array:
>>> times = xr.DataArray(datetime.datetime(1899, 12, 31, 21))
>>> apply_time_offset(times)
Timestamp('1900-01-01 00:00:00')
Case of input time array with length greater than one:
>>> times = xr.DataArray([datetime.datetime(1899, 12, 31, 21),
...                       datetime.datetime(1899, 1, 31, 21)])
>>> apply_time_offset(times)
DatetimeIndex(['1900-01-01', '1899-02-01'], dtype='datetime64[ns]', freq=None)
-
aospy.utils.times.
assert_matching_time_coord
(arr1, arr2)[source]¶ Check to see if two DataArrays have the same time coordinate.
Parameters: arr1 : DataArray or Dataset
First DataArray or Dataset
arr2 : DataArray or Dataset
Second DataArray or Dataset
Raises: ValueError
If the time coordinates are not identical between the two Datasets
-
aospy.utils.times.
average_time_bounds
(ds)[source]¶ Return the average of each set of time bounds in the Dataset.
Useful for creating a new time array to replace the Dataset’s native time array, in the case that the latter matches either the start or end bounds. This can cause errors in grouping (akin to an off-by-one error) if the timesteps span e.g. one full month each. Note that the Dataset’s times must not have already undergone “CF decoding”, wherein they are converted from floats using the ‘units’ attribute into datetime objects.
Parameters: ds : xarray.Dataset
A Dataset containing a time bounds array with name matching internal_names.TIME_BOUNDS_STR. This time bounds array must have two dimensions, one of whose coordinates is the Dataset’s time array, and the other of which is length-2.
Returns: xarray.DataArray
The mean of the start and end times of each timestep in the original Dataset.
Raises: ValueError
If the time bounds array doesn’t match the shape specified above.
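The midpoint logic can be illustrated with plain Python on un-decoded (float) time values, which is the state the function expects:

```python
# Time bounds as floats (e.g. "days since 2000-01-01"), before CF decoding.
# Each pair is the (start, end) of one monthly timestep.
bounds = [(0.0, 31.0), (31.0, 59.0), (59.0, 90.0)]

# average_time_bounds replaces the native time array with these midpoints,
# avoiding off-by-one grouping errors when times sit on an interval edge.
midpoints = [(lo + hi) / 2.0 for lo, hi in bounds]
print(midpoints)  # [15.5, 45.0, 74.5]
```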
-
aospy.utils.times.
convert_scalar_to_indexable_coord
(scalar_da)[source]¶ Convert a scalar coordinate to an indexable one.
In xarray, scalar coordinates cannot be indexed. This converts a scalar coordinate-containing DataArray to one that can be indexed using da.sel and da.isel.
Parameters: scalar_da : DataArray
Must contain a scalar coordinate
Returns: DataArray
-
aospy.utils.times.
datetime_or_default
(date, default)[source]¶ Return a datetime.datetime object or a default.
Parameters: date : None or datetime-like object
default : The value to return if date is None
Returns: default if date is None, otherwise returns the result of
utils.times.ensure_datetime(date)
-
aospy.utils.times.
ensure_datetime
(obj)[source]¶ Return the object if it is of type datetime.datetime; else raise.
Parameters: obj : Object to be tested.
Returns: The original object if it is a datetime.datetime object.
Raises: TypeError if `obj` is not of type `datetime.datetime`.
-
aospy.utils.times.
ensure_time_as_dim
(ds)[source]¶ Ensure that time is an indexable dimension on relevant quantities.
In xarray, scalar coordinates cannot be indexed. We rely on indexing in the time dimension throughout the code; therefore we need this helper method to (if needed) convert a scalar time coordinate to a dimension.
Note that this must be applied before CF conventions are decoded; otherwise it casts np.datetime64[ns] as int values.
Parameters: ds : Dataset
Dataset with a time coordinate
Returns: Dataset
-
aospy.utils.times.
ensure_time_avg_has_cf_metadata
(ds)[source]¶ Add time interval length and bounds coordinates for time avg data.
If the Dataset or DataArray contains time average data, enforce that there are coordinates that track the lower and upper bounds of the time intervals, and that there is a coordinate that tracks the amount of time per time average interval.
CF conventions require that a quantity stored as time averages over time intervals must have time and time_bounds coordinates [R1]. aospy further requires AVERAGE_DT for time average data, for accurate time-weighted averages, which can be inferred from the CF-required time_bounds coordinate if needed. This step should be done prior to decoding CF metadata with xarray to ensure proper computed timedeltas for different calendar types.
[R1] http://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#_data_representative_of_cells
Parameters: ds : Dataset or DataArray
Input data
Returns: Dataset or DataArray
Time average metadata attributes added if needed.
-
aospy.utils.times.
extract_months
(time, months)[source]¶ Extract times within specified months of the year.
Parameters: time : xarray.DataArray
Array of times that can be represented by numpy.datetime64 objects (i.e. the year is between 1678 and 2262).
months : Desired months of the year to include
Returns: xarray.DataArray of the desired times
-
aospy.utils.times.
month_indices
(months)[source]¶ Convert string labels for months to integer indices.
Parameters: months : str, int
If int, number of the desired month, where January=1, February=2, etc. If str, must match either ‘ann’ or some subset of ‘jfmamjjasond’. If ‘ann’, use all months. Otherwise, use the specified months.
Returns: np.ndarray of integers corresponding to desired month indices
Raises: TypeError : If months is not an int or str
See also
_month_conditional
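The substring-matching idea behind the string form can be sketched in pure Python. This is an illustration only, not aospy's exact implementation (which must also disambiguate single-letter inputs such as 'j', ambiguous in 'jfmamjjasond'):

```python
MONTHS = 'jfmamjjasond'  # first letters of January..December

def month_indices_sketch(months):
    # 'ann' means all twelve months; otherwise locate the substring,
    # e.g. 'jja' starts at index 5, giving months [6, 7, 8].
    if months == 'ann':
        return list(range(1, 13))
    start = MONTHS.index(months)
    return [start + 1 + i for i in range(len(months))]

print(month_indices_sketch('jja'))  # [6, 7, 8]
```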
-
aospy.utils.times.
monthly_mean_at_each_ind
(monthly_means, sub_monthly_timeseries)[source]¶ Copy monthly mean over each time index in that month.
Parameters: monthly_means : xarray.DataArray
array of monthly means
sub_monthly_timeseries : xarray.DataArray
array of a timeseries at sub-monthly time resolution
Returns: xarray.DataArray with each monthly mean value from monthly_means repeated at each time within that month from sub_monthly_timeseries
See also
monthly_mean_ts
- Create timeseries of monthly mean values
-
aospy.utils.times.
monthly_mean_ts
(arr)[source]¶ Convert a sub-monthly time-series into one of monthly means.
Also drops any months with no data in the original DataArray.
Parameters: arr : xarray.DataArray
Timeseries of sub-monthly temporal resolution data
Returns: xarray.DataArray
Array resampled to comprise monthly means
See also
monthly_mean_at_each_ind
- Copy monthly means to each submonthly time
-
aospy.utils.times.
numpy_datetime_range_workaround
(date, min_year, max_year)[source]¶ Reset a date to earliest allowable year if outside of valid range.
Hack to address np.datetime64, and therefore pandas and xarray, not supporting dates outside the range 1677-09-21 and 2262-04-11 due to nanosecond precision. See e.g. https://github.com/spencerahill/aospy/issues/96.
Parameters: date : datetime.datetime object
min_year : int
Minimum year in the raw decoded dataset
max_year : int
Maximum year in the raw decoded dataset
Returns: datetime.datetime object
Original datetime.datetime object if the original date is within the permissible dates, otherwise a datetime.datetime object with the year offset to the earliest allowable year.
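A minimal sketch of the year-shifting idea, under the assumption that every date is shifted by the same whole number of years so the dataset's earliest year maps to the first fully representable one (the real function threads the min_year/max_year bookkeeping through from the dataset):

```python
import datetime

NS_MIN_YEAR = 1678  # earliest full year representable by np.datetime64[ns]

def range_workaround_sketch(date, min_year):
    # If the dataset's earliest year is out of range, offset this date's
    # year by the amount that maps min_year onto NS_MIN_YEAR; month, day,
    # and sub-daily fields are preserved.
    if min_year < NS_MIN_YEAR:
        return date.replace(year=date.year + (NS_MIN_YEAR - min_year))
    return date

print(range_workaround_sketch(datetime.datetime(4, 7, 1), min_year=1))
# 1681-07-01 00:00:00
```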
-
aospy.utils.times.
numpy_datetime_workaround_encode_cf
(ds)[source]¶ Generate CF-compliant units for out-of-range dates.
Hack to address np.datetime64, and therefore pandas and xarray, not supporting dates outside the range 1677-09-21 and 2262-04-11 due to nanosecond precision. See e.g. https://github.com/spencerahill/aospy/issues/96.
Specifically, we coerce the data such that, when decoded, the earliest value starts in 1678 but with its month, day, and shorter timescales (hours, minutes, seconds, etc.) intact and with the time-spacing between values intact.
Parameters: ds : xarray.Dataset
Returns: xarray.Dataset, int, int
Dataset with time units adjusted as needed, minimum year in loaded data, and maximum year in loaded data.
-
aospy.utils.times.
sel_time
(da, start_date, end_date)[source]¶ Subset a DataArray or Dataset for a given date range.
Ensures that data are present for full extent of requested range. Appends start and end date of the subset to the DataArray.
Parameters: da : DataArray or Dataset
data to subset
start_date : np.datetime64
start of date interval
end_date : np.datetime64
end of date interval
Returns: da : DataArray or Dataset
subsetted data
Raises: AssertionError
if data for requested range do not exist for part or all of requested range
-
aospy.utils.times.
yearly_average
(arr, dt)[source]¶ Average a sub-yearly time-series over each year.
Resulting timeseries comprises one value for each year in which the original array had valid data. Accounts for (i.e. ignores) masked values in original data when computing the annual averages.
Parameters: arr : xarray.DataArray
The array to be averaged
dt : xarray.DataArray
Array of the duration of each timestep
Returns: xarray.DataArray
Has the same shape and mask as the original arr, except for the time dimension, which is truncated to one value for each year that arr spanned.
utils.vertcoord¶
Utility functions for dealing with vertical coordinates.
-
aospy.utils.vertcoord.
d_deta_from_pfull
(arr)[source]¶ Compute $\partial/\partial\eta$ of the array on full hybrid levels.
$\eta$ is the model vertical coordinate, and its value is assumed to simply increment by 1 from 0 at the surface upwards. The data to be differenced is assumed to be defined at full pressure levels.
Parameters: arr : xarray.DataArray containing the ‘pfull’ dim
Returns: deriv : xarray.DataArray with the derivative along ‘pfull’ computed via 2nd-order centered differencing.
-
aospy.utils.vertcoord.
d_deta_from_phalf
(arr, pfull_coord)[source]¶ Compute pressure level thickness from half level pressures.
-
aospy.utils.vertcoord.
does_coord_increase_w_index
(arr)[source]¶ Determine if the array values increase with the index.
Useful, e.g., for pressure, which sometimes is indexed surface to TOA and sometimes the opposite.
-
aospy.utils.vertcoord.
dp_from_p
(p, ps, p_top=0.0, p_bot=110000.0)[source]¶ Get level thickness of pressure data, incorporating surface pressure.
Level edges are defined as halfway between the levels, as well as the user-specified uppermost and lowermost values. The dp of levels whose bottom pressure is less than the surface pressure is not changed by ps, since they don’t intersect the surface. If ps is in between a level’s top and bottom pressures, then its dp becomes the pressure difference between its top and ps. If ps is less than a level’s top and bottom pressures, then that level is underground and its values are masked.
Note that postprocessing routines (e.g. at GFDL) typically mask out data wherever the surface pressure is less than the level’s given value, not the level’s upper edge. This masks out more levels than the
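The edge construction described above, without the surface-pressure clipping and masking that dp_from_p adds, can be sketched as follows (p ordered top-down, in Pa):

```python
def level_edges(p, p_top=0.0, p_bot=110000.0):
    # Edges halfway between adjacent level centers, capped at the
    # user-specified top and bottom of the atmosphere.
    interior = [(a + b) / 2.0 for a, b in zip(p, p[1:])]
    return [p_top] + interior + [p_bot]

def dp_sketch(p, p_top=0.0, p_bot=110000.0):
    # Thickness of each level = difference between its bounding edges.
    # dp_from_p additionally clips and masks levels against ps.
    edges = level_edges(p, p_top, p_bot)
    return [hi - lo for lo, hi in zip(edges, edges[1:])]

print(dp_sketch([25000.0, 50000.0, 85000.0]))
# [37500.0, 30000.0, 42500.0]
```

The thicknesses necessarily sum to p_bot - p_top, a handy sanity check.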
-
aospy.utils.vertcoord.
dp_from_ps
(bk, pk, ps, pfull_coord)[source]¶ Compute pressure level thickness from surface pressure
-
aospy.utils.vertcoord.
get_dim_name
(arr, names)[source]¶ Determine if an object has an attribute name matching a given list.
-
aospy.utils.vertcoord.
integrate
(arr, ddim, dim=False, is_pressure=False)[source]¶ Integrate along the given dimension.
-
aospy.utils.vertcoord.
level_thickness
(p, p_top=0.0, p_bot=101325.0)[source]¶ Calculates the thickness, in Pa, of each pressure level.
Assumes that the pressure values given are at the center of that model level, except for the lowest value (typically 1000 hPa), which is the bottom boundary. The uppermost level extends to 0 hPa.
Unlike dp_from_p, this does not incorporate the surface pressure.
-
aospy.utils.vertcoord.
pfull_from_ps
(bk, pk, ps, pfull_coord)[source]¶ Compute pressure at full levels from surface pressure.
-
aospy.utils.vertcoord.
phalf_from_ps
(bk, pk, ps)[source]¶ Compute pressure of half levels of hybrid sigma-pressure coordinates.
-
aospy.utils.vertcoord.
replace_coord
(arr, old_dim, new_dim, new_coord)[source]¶ Replace a coordinate with new one; new and old must have same shape.
-
aospy.utils.vertcoord.
to_pascal
(arr, is_dp=False)[source]¶ Force data with units either hPa or Pa to be in Pa.
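A sketch of the coercion using a magnitude heuristic; the threshold test here is an assumption for illustration, not necessarily aospy's exact check:

```python
def to_pascal_sketch(values):
    # Heuristic (assumed): typical surface pressure is ~1e5 in Pa but
    # ~1e3 in hPa, so small magnitudes indicate hPa and get scaled.
    if max(abs(v) for v in values) < 1200.0:  # looks like hPa
        return [v * 100.0 for v in values]
    return list(values)  # already Pa

print(to_pascal_sketch([1000.0, 850.0, 500.0]))  # [100000.0, 85000.0, 50000.0]
```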
-
aospy.utils.vertcoord.
to_pfull_from_phalf
(arr, pfull_coord)[source]¶ Compute data at full pressure levels from values at half levels.
-
aospy.utils.vertcoord.
to_phalf_from_pfull
(arr, val_toa=0, val_sfc=0)[source]¶ Compute data at half pressure levels from values at full levels.
Could be the pressure array itself, but it could also be any other data defined at pressure levels. Requires specification of values at surface and top of atmosphere.
See also¶
- Spencer Hill’s talk on aospy (slides, recorded talk) at the Seventh Symposium on Advances in Modeling and Analysis Using Python, recorded 2017 January 24 as part of the 2017 American Meteorological Society Annual Meeting.
- Our guest post, “What’s needed for the future of AOS python? Tools for automating AOS data analysis and management” on the PyAOS blog.
- Our shout-out in Katy Huff’s PyCon 2017 keynote talk (scroll to 23:35 for the bit about us)
- The xarray package, upon which aospy relies heavily.
Get in touch¶
- Troubleshooting: We are actively seeking new users and are eager to help you get started with aospy! Usage questions, bug reports, and any other correspondence are all welcome and best placed as Issues on our Github repo.
- Our mailing list: join it! Questions, bug reports, and comments are welcome there also.
- Contributing: We are also actively seeking new developers! Please get in touch by opening an Issue or submitting a Pull Request.
License¶
aospy is freely available under the open source Apache License.
History¶
aospy was originally created by Spencer Hill as a means of automating calculations required for his Ph.D. work across many climate models and simulations. Starting in 2014, Spencer Clark became aospy’s second user and developer. The first official release on PyPI was v0.1 on January 24, 2017.