Warning

The eemeter package is under rapid development; we are working quickly toward a stable release. In the meantime, please proceed to use the package, but as you do so, recognize that the API is in flux and the docs might not be up-to-date. Feel free to contribute changes or open issues on github to report bugs, request features, or make suggestions.

The Open Energy Efficiency Meter

This package holds the core methods of the Open Energy Efficiency metering stack. Specifically, the eemeter package abstracts the process of building and evaluating models of energy consumption or generation, and of using those models to evaluate the effect of energy efficiency interventions at a particular site associated with a particular project.

The eemeter package is only one part of the larger Open Energy Efficiency technology stack. Briefly, the architecture of the stack is as follows:

  • eemeter: Given project and energy data, the eemeter package is responsible for creating models of energy usage under different project conditions, and for using those models to evaluate energy efficiency projects.
  • datastore: The datastore application is responsible for validating and storing project data and associated energy data, for using the eemeter to evaluate the effectiveness of those projects against the data it stores, and for storing and serving those results. It exposes a REST API for handling these functions.
  • etl: The etl package provides tooling which helps to extract data from various formats, transform that data into the format accepted by datastore, and load that transformed data into the appropriate datastore instance. ETL stands for Extract, Transform, Load.

Usage

Guides

Introduction

The OpenEEmeter is an open source software package that uses metered energy data to manage aggregate demand capacity across a portfolio of retail customer accounts. The software package consists of three main parts:

  1. an Extract-Transform-Load (ETL) toolkit for processing project, energy, and building data (https://github.com/openeemeter/etl/);
  2. a core calculation library (this package) that implements standardized methods (https://github.com/openeemeter/eemeter/); and
  3. a datastore application for storing post-ETL inputs and computed outputs (https://github.com/openeemeter/datastore/).

More information about this architecture can be found in Architecture Overview.

Core use cases

The OpenEEmeter has been designed specifically to provide weather-normalized energy savings measurements for a portfolio of projects using monthly billing data or interval smart meter data. The main outputs for this core use case, at the project and portfolio level, are:

  • Gross Energy Savings
  • Annualized Energy Savings
  • Realization Rate (when savings predictions are available)

More information about these methods can be found in Methods Overview.

Other potential use cases

The OpenEEmeter can also be configured to manage energy resources across a portfolio of buildings, including potentially:

  • Analytics of raw energy data
  • Portfolio management
  • Demand side resource management

Data requirements

The EEmeter requires a combination of trace data, project data, and weather data to calculate weather-normalized savings. At a minimum, the EEmeter requires a trace of consumption data along with project data indicating the completion date and location of the project.

The completion of a project demarcates the shift between a baseline modeling period and a reporting modeling period. For more information on this, see Methods Overview.

The EEmeter is configured to manage project and trace data. Trace data can be electricity, natural gas, or solar photovoltaic data of any frequency - from monthly billing data to high-frequency sensor data (see 1) Meters and Smart Meters - where does energy data come from?).

Where project and trace data originate from different database sources, a common key must be available to link projects with their respective traces.
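
For example, if project and trace metadata arrive as separate tables, a shared identifier (here a hypothetical project_id column) can be used to join them. A minimal pandas sketch with made-up example tables:

import pandas as pd

# Hypothetical tables; column names are illustrative, not required by the EEmeter.
projects = pd.DataFrame({
    "project_id": ["PROJECT_ID_ABC"],
    "zipcode": ["50321"],
})
traces = pd.DataFrame({
    "trace_id": ["TRACE_ID_123", "TRACE_ID_456"],
    "project_id": ["PROJECT_ID_ABC", "PROJECT_ID_ABC"],
})

# Join each trace to its project using the shared key.
linked = traces.merge(projects, on="project_id", how="left")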

Project data

Project data is typically a set of attributes that can be used for advanced savings analytics, but at minimum it must contain dates that demarcate the start and end of intervention periods.

Each project must have, at minimum:

  • a unique project id
  • start and end dates of known interventions
  • a ZIP code (for gathering associated weather data)
  • a set of associated traces

Other data can also be associated with projects, including (but not limited to):

  • savings predictions
  • square footage
  • cost
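
For illustration, a minimal project record covering the required fields and a few optional attributes might look like the following (values are made up; this is not a prescribed eemeter structure):

# Illustrative only: a minimal project record.
minimal_project = {
    "project_id": "PROJECT_ID_ABC",                     # unique project id
    "zipcode": "50321",                                  # for gathering weather data
    "intervention_start": "2013-06-01T00:00:00+00:00",   # start of known intervention
    "intervention_end": "2013-07-01T00:00:00+00:00",     # end of known intervention
    "trace_ids": ["TRACE_ID_123"],                       # associated traces
    # optional attributes
    "predicted_savings_kwh": 1500.0,
    "square_footage": 1800,
    "cost": 4200.00,
}
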
Trace data

Each trace must have, at minimum,

  • a link to a project id
  • a unique id of its own
  • an interpretation
  • a set of records

Each record within a trace must have:

  • a time period (start and end dates)
  • a value and associated units
  • a boolean “estimated” flag

The EEmeter will reject traces not meeting built-in data sufficiency requirements.
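
For illustration, a minimal trace with one record might be represented as a plain Python dictionary before being serialized into the meter input format shown later in Basic Usage (field names mirror that example; the values are made up):

# Illustrative only: a minimal trace, mirroring the meter input format below.
minimal_trace = {
    "trace_id": "TRACE_ID_123",          # unique id of the trace
    "project_id": "PROJECT_ID_ABC",      # link to the owning project
    "interpretation": "ELECTRICITY_CONSUMPTION_SUPPLIED",
    "unit": "KWH",                       # units associated with the record values
    "records": [
        {
            "start": "2011-01-01T00:00:00+00:00",  # period start
            "end": "2011-01-02T00:00:00+00:00",    # period end
            "value": 57.8,
            "estimated": False,                    # boolean "estimated" flag
        },
    ],
}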

Loading data

The eemeter python package is a calculation engine which is not designed for data storage. Instead, project and trace data are stored in the datastore alongside outputs from the eemeter.

To load data into the datastore, the EEmeter comes bundled with an ETL Toolkit. If you are deploying the open source software, you will need to write or customize a parser to load your data into the ETL pipeline. We rely on a python module called luigi to manage the bulk importation of data.

More on this architecture can be found in Architecture Overview.
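
As a rough sketch of the luigi-based approach, a custom parser can be written as a luigi task; the task, file names, and record fields here are hypothetical and are not part of the etl package:

import csv
import json

import luigi


class ParseRawEnergyCSV(luigi.Task):
    """Hypothetical ETL step: turn a raw CSV export into trace-like records."""

    input_path = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget("parsed_records.json")

    def run(self):
        records = []
        with open(self.input_path) as f:
            for row in csv.DictReader(f):
                records.append({
                    "start": row["date"],
                    "value": float(row["value"]),
                    "estimated": row.get("estimated", "False") == "True",
                })
        with self.output().open("w") as out:
            json.dump(records, out)


if __name__ == "__main__":
    luigi.run()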

External analysis

You may decide that you want to use EEmeter results to analyze project data that does not get parsed and uploaded into the datastore. We have made it easy to export your EEmeter results through an API or through a web interface. Other options include a direct database connection to a BI tool like Tableau or Salesforce.

Background

1) Meters and Smart Meters - where does energy data come from?

Energy data is generated by hardware devices that measure electricity and natural gas flow. A device like this is generally referred to as a “meter” (though this is distinct from the software-based “EEmeter” - see Methods Overview). The most common and ubiquitous measuring device is a utility-owned meter used for determining billing. Some utilities have upgraded their meters to provide hourly or 15-minute interval measurements. These so-called “smart meters” use Advanced Metering Infrastructure (AMI) to transmit data back to utilities for processing in near-real time. Other devices that generate energy data include sub-meters, external sensors, and embedded sensors.

Note

The “smart” in smart meter can be a bit of a misnomer. Despite higher measurement frequency and wireless data transmission, these smart meters collect essentially the same data that electricity meters did in the 1950s. Each meter datapoint consists of a timestamp and an incremental value of consumption. We call this string of data characterized by paired sets of timestamps and meter readings a trace. Traces form the basis of the energy modeling in the EEmeter.

Just like the odometer in your car doesn’t tell you how fast you are traveling, the meter on your house doesn’t tell you how much energy you have consumed. Consumption must be calculated. In the past, energy companies simply determined your rate of consumption by taking monthly meter readings and calculating the difference. With smart meters, these datapoints can be captured more frequently and with greater precision, allowing for more sophisticated forms of billing.

2) Measuring Energy Savings and the Transition to Demand Side Management

The OpenEEmeter replaces traditional approaches to program-related energy measurement. Utilizing newly available smart meter data, the OpenEEmeter solves the problem of measuring energy savings and opens new doors for managing demand side programs.

Historically, energy savings have been measured in one of three ways. The first (and least costly) approach is to take laboratory measurements of different energy-consuming devices (e.g., light bulbs) and calculate the difference in consumption from one to the next, then estimate the savings over a given period of time, taking into consideration typical usage patterns. This first approach is limited by the accuracy and availability of physical models.

The second (and most costly) approach samples consumption data prior to and following an intervention of some sort (e.g., an energy efficiency retrofit), and estimates savings after controlling for building-specific factors like occupancy, temperature, energy intensity, etc. This second approach is limited by low availability of data describing these building-specific factors (thus making it very costly).

A third (post-hoc) approach has recently emerged that takes a population-level sample of similar buildings and compares with a treatment group of buildings that have received an energy efficiency upgrade (or other intervention). This approach assumes that all buildings will be affected equally by exogenous factors, leaving only endogenous factors (i.e., the efficiency upgrade) to account for the energy consumption difference.

In the analog era of traditional meters and monthly bills, efforts to improve energy efficiency emphasized fairly static and permanent changes in consumption. A whole-home retrofit, for example, would reduce energy demand without requiring any additional behavioral or lifestyle changes. A one-time intervention would provide years of benefit, and our metering technology at the time provided a way to measure the performance of these measures.

With the introduction of smart meters, utilities have transitioned from simple efficiency programs to a suite of programs under the umbrella of demand side management (DSM). These new measures fall into three broad categories including time of day, demand, and net metering. The OpenEEmeter expands the programmatic interface of energy efficiency to engage with emergent technologies and market based demand side engagement programs.

3) How the OpenEEmeter is valuable: Baselining, Normalization, and Modeling Energy Use

Smart meter data allows for more complexity in statistical models. Rather than relying on simple regression experiments to normalize energy consumption, analysts can parse the impact of exogenous and endogenous factors independently and iteratively. The notion of baseload energy use can even be disaggregated into multiple demand states. For example, a home will use very little energy when empty, a bit more when occupied, and a large amount when appliances and heating or cooling systems are operating. These demand states can be measured against various sorts of interventions, enabling not only traditional energy efficiency savings measurements but also modern load balancing tools.

The OpenEEmeter calculates energy savings in real time by selecting a sample of consumption data prior to an intervention, weather-normalizing it to establish a baseline, and calculating the difference between projected energy usage and actual energy usage following the intervention. This method maintains the cost-effectiveness of the naive predicted savings approach and the real-world integrity of the building-specific approach, without the time lag of the post hoc control group approach.

Architecture Overview

The complete eemeter architecture consists primarily of a datastore application (see datastore), which houses energy and project data, and a data pipeline toolkit (see ETL Toolkit) that helps get data into the datastore.

These two work in tandem to take raw energy data in whatever form it exists and compute energy savings using the eemeter package. The methods and models used within the datastore for computing energy savings are kept in a library package called eemeter, which can also be used independent of the datastore application (see eemeter).

Each of these components is open sourced under an MIT License and can be found on github.

The core calculation engine is separated from the datastore in order to allow easier development and evaluation of its methods, but this architecture also makes it possible to embed the calculation engine or any of its useful modules (such as the weather module) in other applications.

The data structures in each - the eemeter and the datastore - mirror each other. This simplifies data transfer and eases interpretation of results.

Methods Overview

The EEmeter provides multiple methods for calculating energy savings. All of these methods compare energy demand from a modeled counterfactual pre-intervention baseline to post-intervention energy demand. Some of these methods, including the most conventional, weather normalize energy demand.

These basic methods [1] rely on a modeled relationship between weather patterns and energy demand. The particular models used by the EEmeter are described more precisely in Modeling Overview.

Modeling periods

For any savings calculation, we term the period of time prior to the start of any interventions taking place as part of a project the baseline period. This period is used to establish models of the relationship between energy demand and a set of factors that represent or contribute to end use demand (such as weather, time of day, or day of week) for a particular building prior to an intervention. The baseline becomes a reference point from which to make comparisons to post-intervention energy performance. The baseline period is one of two types of modeling period frequently occurring in the EEmeter.

The second half of the savings calculation concerns what happens after an intervention. Any post-intervention period for which energy savings is calculated is called a reporting period because it is the period of time over which energy savings is reported. A project generally has only one baseline period, but it might have multiple reporting periods. These are the second type of modeling period to frequently occur in the EEmeter.

The extent of these periods will, in most cases, be determined by the start and end dates of the interventions in a project. However, in some cases, the intervention dates are not known, or are ongoing, and must be modeled because they cannot be stated explicitly. We refer to models which account for the latter scenario as structural change models; these are covered in greater detail in Modeling Overview.

EEmeter structures which capture this logic can be found in the API documentation for eemeter.structures.

_images/project-timeline-illustration.png

Pre-intervention baseline period and post-intervention reporting periods on a project timeline.
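
In the serialized meter input format (shown in full in Basic Usage: eemeter package below), a single baseline/reporting split is expressed as a modeling period group keyed by the project's intervention dates, for example:

# A baseline period ending at the intervention start, and a reporting period
# beginning at the intervention end (dates from the example project below).
modeling_period_group = {
    "baseline_period": {
        "start": None,                         # open-ended baseline start
        "end": "2013-06-01T00:00:00+00:00",    # intervention start date
    },
    "reporting_period": {
        "start": "2013-07-01T00:00:00+00:00",  # intervention end date
        "end": None,                           # open-ended reporting end
    },
}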

Trace modeling

The relationship between energy demand and various external factors can differ drastically from building to building, and (usually!) changes after an intervention. Modeling these relationships properly with statistical confidence is a core strength of the EEmeter.

As noted in the background, we term a set of energy data points a trace, and a building or project might be associated with any number of traces. In order to calculate savings, each of these traces must be modeled.

Before modeling, traces are segmented into components which overlap each baseline and reporting period of interest, then are modeled separately. [2] This creates up to \(n * m\) models for a project with \(n\) traces and \(m\) modeling periods.

Each of these models attempts to establish the relationship between energy demand and external factors as it existed during the particular modeling period of interest. However, since the extent to which a model successfully describes these relationships varies significantly, these must be considered only in conjunction with model error and goodness of fit metrics (see Modeling Overview). Any estimate of energy demand given by any model fitted by the EEmeter is associated with variance and confidence bounds.

In practice the number of models fitted for any particular project might be fewer than \(n * m\) due to missing or insufficient data (see Data sufficiency). The EEmeter takes these failures into account and considers them when building summaries of savings.

_images/trace-segmenting-illustration.png

An example of trace segmenting with two traces, one baseline period and one reporting period. Trace 1 is segmented into just one component - the baseline component - because data for the reporting period is missing. Trace 2 is segmented into one baseline component and one reporting component. The segments of Trace 1 and Trace 2 have different lengths, but models of their energy demand behavior can still be built.
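
A minimal sketch of this segmentation step using pandas (the trace values and period bounds are illustrative; the EEmeter performs this internally):

import pandas as pd

# Illustrative daily trace spanning both modeling periods.
index = pd.date_range("2013-01-01", "2014-01-01", freq="D", tz="UTC")
trace = pd.Series(range(len(index)), index=index, dtype=float)

# Hypothetical period bounds taken from the project's intervention dates.
baseline_end = pd.Timestamp("2013-06-01", tz="UTC")
reporting_start = pd.Timestamp("2013-07-01", tz="UTC")

baseline_segment = trace[:baseline_end]      # used to fit the baseline model
reporting_segment = trace[reporting_start:]  # used to fit the reporting model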

Weather normalization

Once we have created a model, we can apply that model to determine an estimate of energy demand during arbitrary weather scenarios. The two most common weather scenarios for which the EEmeter will estimate demand are the “normal” weather year and the observed reporting period weather year. This is generally necessary because the data observed in the baseline and reporting periods occurred during different time periods with different weather – and valid comparisons between them must account for this. Estimating energy performance during the “normal” weather attempts to reduce bias in the savings estimate by accounting for the peculiarity (as compared to other years or seasons) of the relevant observed weather.

In an attempt to reduce the number of arbitrary factors influencing results, we only ever compare model estimates or data that occur over the same weather scenario and time period. This helps (in the aggregate) to ensure equivalency of end use demand pre- and post-intervention.

Savings

If the data and models show that energy demand is reduced relative to equivalent end use demand following an intervention, we say that there have been energy savings, or equivalently, that energy performance has increased.

Energy savings is necessarily a difference; however, this difference must be taken carefully, given missing data and model error, and is only taken after the necessary aggregation steps.

The equation for savings is always:

\(S_\text{total} = E_\text{b} - E_\text{r}\)

or

\(S_\text{percent} = \frac{E_\text{b} - E_\text{r}}{E_\text{b}}\)

where

  • \(S_\text{total}\) is aggregate total savings
  • \(S_\text{percent}\) is aggregate percent savings
  • \(E_\text{b}\) is aggregate energy demand under baseline period conditions
  • \(E_\text{r}\) is aggregate energy demand under reporting period conditions
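
In code, given aggregate values for \(E_\text{b}\) and \(E_\text{r}\) (however they were computed), the savings expressions are simply:

def total_savings(e_baseline, e_reporting):
    """S_total = E_b - E_r"""
    return e_baseline - e_reporting


def percent_savings(e_baseline, e_reporting):
    """S_percent = (E_b - E_r) / E_b"""
    return (e_baseline - e_reporting) / e_baseline


# Example with made-up aggregate values (kWh):
total_savings(12000.0, 10500.0)    # 1500.0
percent_savings(12000.0, 10500.0)  # 0.125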

Depending on the type of energy savings desired, the values \(E_\text{b}\) and \(E_\text{r}\) may be calculated differently. The following types of savings are supported:

Annualized weather normal

The annualized weather normal estimates savings as it may have occurred during a “normal” weather year. It does this by building models of both the baseline and reporting energy demand and using each to weather-normalize the energy values.

\(E_\text{b} = \text{M}_\text{b}\left(\text{X}_\text{normal}\right)\)

\(E_\text{r} = \text{M}_\text{r}\left(\text{X}_\text{normal}\right)\)

where

  • \(\text{M}_\text{b}\) is the model of energy demand as built using trace data segmented from the baseline period.
  • \(\text{M}_\text{r}\) is the model of energy demand as built using trace data segmented from the reporting period.
  • \(\text{X}_\text{normal}\) are temperature and other covariate values for the weather normal year.

Gross predicted

The gross predicted method estimates savings that have occurred from the completion of the project interventions up to the date of the meter run.

\(E_\text{b} = \text{M}_\text{b}\left(\text{X}_\text{r}\right)\)

\(E_\text{r} = \text{M}_\text{r}\left(\text{X}_\text{r}\right)\)

where

  • \(\text{M}_\text{b}\) is the model of energy demand as built using trace data segmented from the baseline period.
  • \(\text{M}_\text{r}\) is the model of energy demand as built using trace data segmented from the reporting period.
  • \(\text{X}_\text{r}\) are temperature and other covariate values for the reporting period.

Gross observed

The gross observed method estimates savings that have occurred from the completion of the project interventions up to the date of the meter run.

\(E_\text{b} = \text{M}_\text{b}\left(\text{X}_\text{r}\right)\)

\(E_\text{r} = \text{A}_\text{r}\)

where

  • \(\text{M}_\text{b}\) is the model of energy demand as built using trace data segmented from the baseline period.
  • \(\text{A}_\text{r}\) are the actual observed energy demand values from the trace data segmented from the reporting period. If the actual data has missing values, these are interpolated using gross predicted values (i.e., \(\text{M}_\text{r}\left(\text{X}_\text{r}\right)\)).
  • \(\text{X}_\text{r}\) are temperature and other covariate values for the reporting period.
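
A schematic comparison of the three savings types, using hypothetical fitted model objects with a predict method (the real model and formatter interfaces are described in the API documentation; this is only a sketch of the equations above):

# m_baseline and m_reporting stand in for fitted models, x_normal and
# x_reporting for covariate (e.g., temperature) data, and a_reporting for
# actual observed reporting-period usage.

def annualized_weather_normal(m_baseline, m_reporting, x_normal):
    e_b = m_baseline.predict(x_normal)
    e_r = m_reporting.predict(x_normal)
    return e_b - e_r


def gross_predicted(m_baseline, m_reporting, x_reporting):
    e_b = m_baseline.predict(x_reporting)
    e_r = m_reporting.predict(x_reporting)
    return e_b - e_r


def gross_observed(m_baseline, x_reporting, a_reporting):
    e_b = m_baseline.predict(x_reporting)
    e_r = a_reporting  # observed usage; gaps filled with reporting model predictions
    return e_b - e_r
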
Aggregation rules

Because even an individual project may have multiple traces describing its energy demand, we must be able to aggregate trace-level results before we can obtain project-level or portfolio-level savings. Ideally, this aggregation is a simple sum of trace-level values. However, trace-level results are often messy and must be accounted for; some may be missing data, have bad model fits, or have entirely failed model builds. The EEmeter must successfully handle each of these cases, or risk invalidating results for entire portfolios.

The aggregation steps are as follows (a schematic code sketch appears after the list):

  1. Select scope (project, portfolio) and gather all trace data available in that scope

  2. Select baseline and reporting period. For portfolio level aggregations in which baseline and reporting periods may not align, select reporting period type and use the default baseline period for each project.

  3. Group traces by interpretation

  4. Compute \(E_\text{b}\) and \(E_\text{r}\):

    1. Compute (or retrieve) \(E_\text{t,b}\) and \(E_\text{t,r}\) for each trace \(\text{t}\).
    2. Determine, for each \(E_\text{t,b}\) and \(E_\text{t,r}\) whether or not it meets criteria for inclusion in aggregation.
    3. Discard both \(E_\text{t,b}\) and \(E_\text{t,r}\) for any trace for which either \(E_\text{t,b}\) or \(E_\text{t,r}\) has been discarded.
    4. Compute \(E_\text{b} = \sum_{\text{t}}E_\text{t,b}\) and \(E_\text{r} = \sum_{\text{t}}E_\text{t,r}\) for remaining traces. Errors are propagated according to the principles in Error propagation.
  5. Compute savings from \(E_\text{b}\) and \(E_\text{r}\) as usual.
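
A schematic sketch of steps 4–5 for a single interpretation group; the result fields and the simple summing of variances are illustrative assumptions, not the eemeter API:

import math


def aggregate_savings(trace_results):
    """Aggregate trace-level baseline/reporting estimates into total savings.

    trace_results: list of dicts with keys 'e_b', 'e_r', 'var_b', 'var_r';
    any value may be None if the model build failed or data were insufficient.
    """
    e_b_total = e_r_total = 0.0
    var_total = 0.0
    for result in trace_results:
        # Discard the trace entirely if either side is unusable.
        if result["e_b"] is None or result["e_r"] is None:
            continue
        e_b_total += result["e_b"]
        e_r_total += result["e_r"]
        var_total += result["var_b"] + result["var_r"]  # variances of sums add
    savings = e_b_total - e_r_total
    return savings, math.sqrt(var_total)  # savings and its standard error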

Inclusion criteria

For inclusion in aggregates, \(E_\text{t,b}\) and \(E_\text{t,r}\) must meet the following criteria:

  1. If ELECTRICITY_ON_SITE_GENERATION_UNCONSUMED, which represents solar generation, is available, and if solar panels were installed as one of the project interventions, blank \(E_\text{t,b}\) should be replaced with 0.
  2. Model has been successfully built.

Error propagation

Errors are propagated as if they followed \(\chi^2\) distributions.

Weather data matching

Since weather and temperature data are so central to the activity of the EEmeter, the particulars of how weather data is obtained for a project are often of interest. Weather data sources are determined automatically within the EEmeter using an internal mapping [3] between ZIP codes [4] and weather stations. The source of the weather normal data may differ from the source of the observed weather data.

There is a jupyter notebook outlining the process of constructing the weather data available here.

[1]Additional information on why this method is used in preference to other methods is described in the Introduction.
[2]This is not quite true for structural change models. This is covered in more detail in Modeling Overview.
[3]Available on github.
[4]The ZIP codes used in this mapping aren’t strictly ZIP codes, they’re actually ZCTAs.

Modeling Overview

Basic modeling principles
Model error
Data sufficiency
Types of models
Weekday and Seasonal effects regression model
Hidden Markov model

Glossary

  • annualized weather normal: an estimate of annual energy demand under a weather normal.
  • baseline: a pre-intervention reference point or starting point from which to compare post-intervention energy demand.
  • demand response project: a set of interventions designed to shift the time of day or day of week of energy-demand, generally toward off-peak hours.
  • end use: an energy-consuming service such as lighting, space cooling, space heating, refrigeration, or water heating, particularly as provided by a building or set of buildings.
  • end use demand: the extent to which an end use is needed. May vary by season, occupancy, time of day, day of week, or purpose of building.
  • energy demand: the amount of energy needed to satisfy end use demand.
  • energy trace: see trace
  • intervention: a set of upgrades or performance improvements on physical infrastructure of an existing building (see retrofit), or of behavior of individuals living in an existing building.
  • modeling period: a period of time over which an energy model is to be created for a particular trace. This is a generalization of baseline and reporting periods. Modeling periods generally fall into one of those two categories.
  • projected baseline energy demand: a counterfactual estimate of energy demand as it might have been under a particular end use demand scenario had an intervention not occurred.
  • retrofit: a set of interventions taking place at a particular building or site which modify pre-existing structures, installations or appliances.
  • structural change model: a model which tries to determine the most probable extents of baseline and reporting periods for a project given its trace data.
  • trace: a single time series of measured values associated with units at a particular (not necessarily fixed) frequency.
  • trace interpretation: the meaning of the trace data. Possible interpretations are outlined in eemeter.structures
  • weather normalization: a technique to account for differences in end use demand due to variations in weather patterns, which uses a model of weather-dependent energy demand to determine a counterfactual energy demand under the weather conditions described by a weather normal.
  • weather normal: a set of (not necessarily observed) weather data designed to reflect a “typical” weather scenario. Often covers a time period of 1 year. Used in weather normalization. See TMY3.
  • ZIP Code Tabulation Area (ZCTA): a set of geographical areas based on US Postal Service (USPS) ZIP codes, necessitated by the fact that ZIP codes do not map easily onto geographies. Built and maintained by the US Census Bureau. Contains only about three quarters of valid ZIP codes. ZIP code and ZCTA do not always match. More information.

Why open source?

All of our savings algorithms are free and open source. We don’t believe that standard weights and measures should be the private property of any particular entity. It’s much better for everyone, from contractors to program administrators, if the measurement tools are equally available to everyone.

eemeter

Installation

Note

If you are installing python for the first time, we recommend using Anaconda, a free python distribution with builds for Windows, macOS, and Linux.

To get started with the eemeter, use pip:

$ pip install eemeter

Make sure you have the latest version:

>>> import eemeter; eemeter.get_version()
'1.3.3'

The eemeter package itself does not use C extensions. However, some eemeter dependencies do. These can be a bit trickier to install. If issues arise when pip installing eemeter, verify that the packages with C extensions are properly installing. Specifically, verify that these installation commands complete without errors:

pip install lxml
pip install numpy

If they fail, please follow the installation instructions for those packages (lxml, numpy).

Some statsmodels installations require numpy to be installed. If you run into errors with the statsmodels installation, be sure numpy is installed before attempting to install statsmodels. Once statsmodels is installed correctly, install eemeter.

Topics

Basic Usage: eemeter package

This tutorial is also available as a jupyter notebook

Note:

Most users of the EEmeter stack do not directly use the eemeter package for loading their data. Instead, they use the datastore application, which uses the eemeter internally. To learn to use the datastore, head over to the datastore basic usage tutorial.

Running a meter

Please download a preformatted input file.

We can load this input file into memory with the following:

In [1]:
import json

with open('meter_input_example.json', 'r') as f:  # modify to point to your downloaded input file.
    meter_input = json.load(f)

The file has a single trace of daily electricity consumption data and some associated project data. Its contents look like this:

In [2]:
!head -15 meter_input_example.json
{
  "type": "SINGLE_TRACE_SIMPLE_PROJECT",
  "trace": {
    "type": "ARBITRARY_START",
    "interpretation": "ELECTRICITY_CONSUMPTION_SUPPLIED",
    "unit": "KWH",
    "trace_id": "TRACE_ID_123",
    "interval": "daily",
    "records": [
      {
        "start": "2011-01-01T00:00:00+00:00",
        "value": 57.8,
        "estimated": false
      },
      {
In [3]:
!tail -25 meter_input_example.json
        "estimated": false
      },
      {
        "start": "2015-01-01T00:00:00+00:00",
        "value": null,
        "estimated": false
      }
    ]
  },
  "project": {
    "type": "PROJECT_WITH_SINGLE_MODELING_PERIOD_GROUP",
    "zipcode": "50321",
    "project_id": "PROJECT_ID_ABC",
    "modeling_period_group": {
      "baseline_period": {
        "start": null,
        "end": "2013-06-01T00:00:00+00:00"
      },
      "reporting_period": {
        "start": "2013-07-01T00:00:00+00:00",
        "end": null
      }
    }
  }
}

Next, we can create a meter, model and formatter. These work in tandem to create a model of energy usage.

The meter coordinates loading the input data, matching it with appropriate weather data, and passing it to the formatter and model. It then uses these to calculate a set of outputs, including energy savings estimates such as annualized weather normalized usage.

The formatter formats the trace and project data for use within the model.

The model fits a model of energy usage to this formatted data which can be used, given covariate weather data, to predict or model energy usage over an arbitrary period of time.

In [4]:
from eemeter.ee.meter import EnergyEfficiencyMeter
from eemeter.modeling.models import CaltrackMonthlyModel
from eemeter.modeling.formatters import ModelDataFormatter

meter = EnergyEfficiencyMeter()
model = (CaltrackMonthlyModel, {"fit_cdd": False, "grid_search": True})
formatter = (ModelDataFormatter, {"freq_str": "D"})

The meter we created is an instance of the EnergyEfficiencyMeter class, which operates on a single energy trace.

The model we created is a tuple of (model class, model keyword arguments), not an instantiation of the model. We do it this way to allow easy creation of multiple instances of the model class.

The formatter is, like the model, a tuple of (formatter class, formatter keyword arguments), for the same reason - we want to make multiple instances of the formatter class.

These can be used directly to “evaluate” the meter on the meter input. We’ll store the output in meter_output.

In [5]:
meter_output = meter.evaluate(meter_input, model=model, formatter=formatter)

This meter_output is quite verbose, so we’ll export it to a json file which is a bit more readable.

In [6]:
with open('meter_output_example.json', 'w') as f:  # change this path if desired.
    json.dump(meter_output, f, indent=2)

The content of this file will look something like this:

In [7]:
!head -40 meter_output_example.json
{
  "status": "SUCCESS",
  "failure_message": null,
  "logs": [
    "Using weather_source ISDWeatherSource(\"725460\")",
    "Using weather_normal_source TMY3WeatherSource(\"725460\")"
  ],
  "eemeter_version": "0.5.3",
  "model_class": "CaltrackMonthlyModel",
  "model_kwargs": {
    "fit_cdd": false,
    "grid_search": true
  },
  "formatter_class": "ModelDataFormatter",
  "formatter_kwargs": {
    "freq_str": "D"
  },
  "weather_source_station": "725460",
  "weather_normal_source_station": "725460",
  "derivatives": [
    {
      "modeling_period_group": [
        "baseline",
        "reporting"
      ],
      "series": "Cumulative baseline model minus reporting model, normal year",
      "description": "Total predicted usage according to the baseline model over the normal weather year, minus the total predicted usage according to the reporting model over the normal weather year. Days for which normal year weather data does not exist are removed.",
      "orderable": [
        null
      ],
      "value": [
        2479.015638036155
      ],
      "variance": [
        7354.084609086982
      ]
    },
    {
      "modeling_period_group": [
        "baseline",

Note how this file is organized: it contains a summary of the operations done during meter execution, including everything necessary to recreate the meter run, like the model class and keyword arguments used to initialize it, and the weather data (degrees F, called “demand_fixture”) that was used in model building.

Not everyone has data ready to go, so if you are in that bucket, the next section covers how you can get started with data of your own.

Data preparation

All we’ll be doing in this section is creating a data structure that has the same format as the meter_input_example.json file above. We are using the eemeter EnergyTrace helper structure.

Of course, this is not the only way to get data into the necessary format; use this for inspiration, but make changes as necessary to accommodate the particulars of your dataset.

In [8]:
# library imports
from eemeter.structures import EnergyTrace
from eemeter.io.serializers import ArbitraryStartSerializer
from eemeter.ee.meter import EnergyEfficiencyMeter
import pandas as pd
import pytz

First, we import the energy data from the sample CSV and transform it into records.

In [9]:
energy_data = pd.read_csv('sample-energy-data_project-ABC_zipcode-50321.csv',
                          parse_dates=['date'], dtype={'zipcode': str})
records = [{
    "start": pytz.UTC.localize(row.date.to_datetime()),
    "value": row.value,
    "estimated": row.estimated,
} for _, row in energy_data.iterrows()]

The records we created look like this:

In [10]:
records[:3]  # the first three records
Out[10]:
[{'estimated': False,
  'start': datetime.datetime(2011, 1, 1, 0, 0, tzinfo=<UTC>),
  'value': 57.8},
 {'estimated': False,
  'start': datetime.datetime(2011, 1, 2, 0, 0, tzinfo=<UTC>),
  'value': 64.8},
 {'estimated': False,
  'start': datetime.datetime(2011, 1, 3, 0, 0, tzinfo=<UTC>),
  'value': 49.5}]

Next, we load our records into an EnergyTrace. We give it units "KWH" and interpretation "ELECTRICITY_CONSUMPTION_SUPPLIED", which means that this is electricity consumed by the building and supplied by a utility (rather than by solar panels or other on-site generation). We also pass in an instance of the record serializer ArbitraryStartSerializer to show it how to interpret the records.

In [11]:
energy_trace = EnergyTrace(
    records=records,
    unit="KWH",
    interpretation="ELECTRICITY_CONSUMPTION_SUPPLIED",
    serializer=ArbitraryStartSerializer(),
    trace_id='TRACE_ID_123',
    interval='daily'
)

The energy trace data we created looks like this:

In [12]:
energy_trace.data[:3]  # first three records
Out[12]:
value estimated
2011-01-01 00:00:00+00:00 57.8 False
2011-01-02 00:00:00+00:00 64.8 False
2011-01-03 00:00:00+00:00 49.5 False

Now we load the rest of the project data from the sample project data CSV. This CSV includes the project_id (we don’t use it in this tutorial, but this is how you might identify the saved meter results), the ZIP code of the building, and the dates on which retrofit work for this project started and was completed.

In [13]:
project_data = pd.read_csv('sample-project-data.csv',
                           parse_dates=['retrofit_start_date', 'retrofit_end_date']).iloc[0]

Here’s what our project data looks like.

In [14]:
project_data
Out[14]:
project_id                             ABC
zipcode                              50321
retrofit_start_date    2013-06-01 00:00:00
retrofit_end_date      2013-07-01 00:00:00
Name: 0, dtype: object
In [15]:
zipcode = "{:05d}".format(project_data.zipcode)
retrofit_start_date = pytz.UTC.localize(project_data.retrofit_start_date)
retrofit_end_date = pytz.UTC.localize(project_data.retrofit_end_date)

Here’s an example of how to get this data into the format the meter expects (exactly the format of the meter_input_example.json from above).

In [16]:
from collections import OrderedDict

def serialize_meter_input(trace, zipcode, retrofit_start_date, retrofit_end_date):

    data = OrderedDict([
        ("type", "SINGLE_TRACE_SIMPLE_PROJECT"),
        ("trace", trace_serializer(trace)),
        ("project", project_serializer(zipcode, retrofit_start_date, retrofit_end_date)),
    ])

    return data


def trace_serializer(trace):
    data = OrderedDict([
        ("type", "ARBITRARY_START"),
        ("interpretation", trace.interpretation),
        ("unit", trace.unit),
        ("trace_id", trace.trace_id),
        ("interval", trace.interval),
        ("records", [
            OrderedDict([
                ("start", start.isoformat()),
                ("value", record.value if pd.notnull(record.value) else None),
                ("estimated", bool(record.estimated)),
            ])
            for start, record in trace.data.iterrows()
        ]),
    ])
    return data


def project_serializer(zipcode, retrofit_start_date, retrofit_end_date):
    data = OrderedDict([
        ("type", "PROJECT_WITH_SINGLE_MODELING_PERIOD_GROUP"),
        ("zipcode", zipcode),
        ("project_id", 'PROJECT_ID_ABC'),
        ("modeling_period_group", OrderedDict([
            ("baseline_period", OrderedDict([
                ("start", None),
                ("end", retrofit_start_date.isoformat()),
            ])),
            ("reporting_period", OrderedDict([
                ("start", retrofit_end_date.isoformat()),
                ("end", None),
            ]))
        ]))
    ])
    return data
In [17]:
my_meter_input = serialize_meter_input(
    energy_trace, zipcode, retrofit_start_date, retrofit_end_date)
In [18]:
with open('my_meter_input.json', 'w') as f:
    json.dump(my_meter_input, f, indent=2)
In [19]:
!head -15 my_meter_input.json
{
  "type": "SINGLE_TRACE_SIMPLE_PROJECT",
  "trace": {
    "type": "ARBITRARY_START",
    "interpretation": "ELECTRICITY_CONSUMPTION_SUPPLIED",
    "unit": "KWH",
    "trace_id": "TRACE_ID_123",
    "interval": "daily",
    "records": [
      {
        "start": "2011-01-01T00:00:00+00:00",
        "value": 57.8,
        "estimated": false
      },
      {
In [20]:
!tail -25 my_meter_input.json
        "estimated": false
      },
      {
        "start": "2015-01-01T00:00:00+00:00",
        "value": null,
        "estimated": false
      }
    ]
  },
  "project": {
    "type": "PROJECT_WITH_SINGLE_MODELING_PERIOD_GROUP",
    "zipcode": "50321",
    "project_id": "PROJECT_ID_ABC",
    "modeling_period_group": {
      "baseline_period": {
        "start": null,
        "end": "2013-06-01T00:00:00+00:00"
      },
      "reporting_period": {
        "start": "2013-07-01T00:00:00+00:00",
        "end": null
      }
    }
  }
}

Now we can run this through the meter exactly the same way we did before:

In [21]:
my_meter_output = meter.evaluate(my_meter_input, model=model, formatter=formatter)

Inspecting results

Now that we have some results at our fingertips, let’s inspect them. We’ll be using the meter output from the first example trace.

The output is mostly made up of a set of “derivatives”. These aren’t derivatives in the calculus sense - they’re just derived from the model output.

Let’s take a look at the first one.

In [22]:
derivative = meter_output["derivatives"][0]

We can take a peek at the contents by looking at the keys of the dict.

In [23]:
[k for k in derivative.keys()]
Out[23]:
['modeling_period_group',
 'series',
 'description',
 'orderable',
 'value',
 'variance']

Each derivative is a series with a name and a description.

In [24]:
derivative['series'], derivative['description']
Out[24]:
('Cumulative baseline model minus reporting model, normal year',
 'Total predicted usage according to the baseline model over the normal weather year, minus the total predicted usage according to the reporting model over the normal weather year. Days for which normal year weather data does not exist are removed.')

The values associated with the derivative are stored in value, their variances are stored in variance, and the orderables act as keys. A single orderable of None indicates (as in this case) that the value and variance are singleton values.

In [25]:
derivative['orderable'], derivative['value'], derivative['variance']
Out[25]:
([None], [2479.015638036155], [7354.0846090869818])

Other derivatives are computed as well:

In [26]:
print(json.dumps([(d['series'], d['description']) for d in sorted(meter_output["derivatives"], key=lambda o: o['series'])], indent=2))
[
  [
    "Baseline model minus observed, reporting period",
    "Predicted usage according to the baseline model minus observed usage over the reporting period."
  ],
  [
    "Baseline model minus reporting model, normal year",
    "Predicted usage according to the baseline model over the normal weather year, minus the predicted usage according to the reporting model over the normal weather year."
  ],
  [
    "Baseline model, baseline period",
    "Predicted usage according to the baseline model over the baseline period."
  ],
  [
    "Baseline model, normal year",
    "Predicted usage according to the baseline model over the normal weather year."
  ],
  [
    "Baseline model, reporting period",
    "Predicted usage according to the baseline model over the reporting period."
  ],
  [
    "Cumulative baseline model minus observed, reporting period",
    "Total predicted usage according to the baseline model minus observed usage over the reporting period. Days for which reporting period weather data or usage do not exist are removed."
  ],
  [
    "Cumulative baseline model minus reporting model, normal year",
    "Total predicted usage according to the baseline model over the normal weather year, minus the total predicted usage according to the reporting model over the normal weather year. Days for which normal year weather data does not exist are removed."
  ],
  [
    "Cumulative baseline model, normal year",
    "Total predicted usage according to the baseline model over the normal weather year. Days for which normal year weather data does not exist are removed."
  ],
  [
    "Cumulative baseline model, reporting period",
    "Total predicted usage according to the baseline model over the reporting period. Days for which reporting period weather data does not exist are removed."
  ],
  [
    "Cumulative observed, baseline period",
    "Total observed usage over the baseline period. Days for which weather data does not exist are NOT removed."
  ],
  [
    "Cumulative observed, reporting period",
    "Total observed usage over the reporting period. Days for which weather data does not exist are NOT removed."
  ],
  [
    "Cumulative reporting model, normal year",
    "Total predicted usage according to the reporting model over the reporting period. Days for which normal year weather data does not exist are removed."
  ],
  [
    "Inclusion mask, baseline period",
    "Mask for baseline period data which is included in model and savings cumulatives."
  ],
  [
    "Inclusion mask, reporting period",
    "Mask for reporting period data which is included in model and savings cumulatives."
  ],
  [
    "Observed, baseline period",
    "Observed usage over the baseline period."
  ],
  [
    "Observed, project period",
    "Observed usage over the project period."
  ],
  [
    "Observed, reporting period",
    "Observed usage over the reporting period."
  ],
  [
    "Reporting model, normal year",
    "Predicted usage according to the reporting model over the reporting period."
  ],
  [
    "Reporting model, reporting period",
    "Predicted usage according to the reporting model over the reporting period."
  ],
  [
    "Temperature, baseline period",
    "Observed temperature (degF) over the baseline period."
  ],
  [
    "Temperature, normal year",
    "Observed temperature (degF) over the normal year."
  ],
  [
    "Temperature, reporting period",
    "Observed temperature (degF) over the reporting period."
  ]
]

Weather Data Caching

In order to avoid putting an unnecessary load on external weather sources, weather data is cached by default as JSON files in the directory ~/.eemeter/cache. The location of the directory can be changed by setting:

$ export EEMETER_WEATHER_CACHE_DIRECTORY=<full path to directory>
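
Equivalently, the variable can be set from Python before the eemeter weather modules are first used (assuming the setting is read at that point):

import os

# Point the weather cache at a custom directory before using eemeter weather sources.
os.environ["EEMETER_WEATHER_CACHE_DIRECTORY"] = "/tmp/eemeter-weather-cache"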

API

eemeter.ee
eemeter.ee.meter
class eemeter.ee.meter.EnergyEfficiencyMeter(**kwargs)[source]

Meter for determining energy efficiency derivatives for a single trace.

Parameters:default_model_mapping (dict) – mapping between (interpretation, frequency) tuples used to select the default model (if none is explicitly provided in .evaluate()).
evaluate(meter_input, formatter=None, model=None, weather_source=None, weather_normal_source=None)[source]

Main entry point to the meter, which models traces and calculates derivatives.

Parameters:
  • meter_input (dict) – Serialized input containing trace and project data.
  • formatter (tuple of (class, dict), default None) – Formatter for trace and weather data. Used to create input for model. If None is provided, will be auto-matched to appropriate default formatter. Class name can be provided as a string (class.__name__) or object.
  • model (tuple of (class, dict), default None) – Model to use in modeling. If None is provided, will be auto-matched to appropriate default model. Class can be provided as a string (class.__name__) or class object.
  • weather_source (eemeter.weather.WeatherSource) – Weather source to be used for this meter. Overrides weather source found using project.site. Useful for test mocking.
  • weather_normal_source (eemeter.weather.WeatherSource) – Weather normal source to be used for this meter. Overrides weather source found using project.site. Useful for test mocking.
Returns:

results – Dictionary of results with the following keys:

  • "status": SUCCESS/FAILURE
  • "failure_message": if FAILURE, message indicates reason for failure, may include traceback
  • "logs": list of collected log messages
  • "model_class": Name of model class
  • "model_kwargs": dict of model keyword arguments (settings)
  • "formatter_class": Name of formatter class
  • "formatter_kwargs": dict of formatter keyword arguments (settings)
  • "eemeter_version": version of the eemeter package
  • "modeled_energy_trace": modeled energy trace
  • "derivatives": derivatives for each interpretation
  • "weather_source_station": Matched weather source station.
  • "weather_normal_source_station": Matched weather normal source station.

Return type:

dict
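
A minimal call, mirroring the Basic Usage tutorial above:

import json

from eemeter.ee.meter import EnergyEfficiencyMeter
from eemeter.modeling.models import CaltrackMonthlyModel
from eemeter.modeling.formatters import ModelDataFormatter

with open('meter_input_example.json', 'r') as f:
    meter_input = json.load(f)

meter = EnergyEfficiencyMeter()
results = meter.evaluate(
    meter_input,
    model=(CaltrackMonthlyModel, {"fit_cdd": False, "grid_search": True}),
    formatter=(ModelDataFormatter, {"freq_str": "D"}),
)
results["status"]  # "SUCCESS" or "FAILURE"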

eemeter.io
eemeter.io.serializers
class eemeter.io.serializers.ArbitrarySerializer(parse_dates=False)[source]

Arbitrary data at arbitrary non-overlapping intervals. Often used for monthly billing data. Records must all have the “start” key and the “end” key. Overlaps are not allowed and gaps will be filled with NaN.

For example:

>>> records = [
...     {
...         "start": datetime(2013, 12, 30, tzinfo=pytz.utc),
...         "end": datetime(2014, 1, 28, tzinfo=pytz.utc),
...         "value": 1180,
...     },
...     {
...         "start": datetime(2014, 1, 28, tzinfo=pytz.utc),
...         "end": datetime(2014, 2, 27, tzinfo=pytz.utc),
...         "value": 1211,
...         "estimated": True,
...     },
...     {
...         "start": datetime(2014, 2, 28, tzinfo=pytz.utc),
...         "end": datetime(2014, 3, 30, tzinfo=pytz.utc),
...         "value": 985,
...     },
... ]
...
>>> serializer = ArbitrarySerializer()
>>> df = serializer.to_dataframe(records)
>>> df
                            value estimated
2013-12-30 00:00:00+00:00  1180.0     False
2014-01-28 00:00:00+00:00  1211.0      True
2014-02-27 00:00:00+00:00     NaN     False
2014-02-28 00:00:00+00:00   985.0     False
2014-03-30 00:00:00+00:00     NaN     False
yield_records(sorted_records)[source]

Yields validated (start (datetime), value (float), estimated (bool)) tuples of data.

class eemeter.io.serializers.ArbitraryStartSerializer(parse_dates=False)[source]

Arbitrary start data at arbitrary non-overlapping intervals. Records must all have the “start” key. The last data point will be ignored unless an end date is provided for it. This is useful for data dated to future energy use, e.g. billing for delivered fuels.

For example:

>>> records = [
...     {
...         "start": datetime(2013, 12, 30, tzinfo=pytz.utc),
...         "value": 1180,
...     },
...     {
...         "start": datetime(2014, 1, 28, tzinfo=pytz.utc),
...         "value": 1211,
...         "estimated": True,
...     },
...     {
...         "start": datetime(2014, 2, 28, tzinfo=pytz.utc),
...         "value": 985,
...     },
... ]
...
>>> serializer = ArbitraryStartSerializer()
>>> df = serializer.to_dataframe(records)
>>> df
                            value estimated
2013-12-30 00:00:00+00:00  1180.0     False
2014-01-28 00:00:00+00:00  1211.0      True
2014-02-28 00:00:00+00:00     NaN     False
yield_records(sorted_records)[source]

Yields validated (start (datetime), value (float), estimated (bool)) tuples of data.

class eemeter.io.serializers.ArbitraryEndSerializer(parse_dates=False)[source]

Arbitrary end data at arbitrary non-overlapping intervals. Records must all have the “end” key. The first data point will be ignored unless a start date is provided for it. This is useful for data dated to past energy use, e.g. electricity or natural gas bills.

For example:

>>> records = [
...     {
...         "end": datetime(2013, 12, 30, tzinfo=pytz.utc),
...         "value": 1180,
...     },
...     {
...         "end": datetime(2014, 1, 28, tzinfo=pytz.utc),
...         "value": 1211,
...         "estimated": True,
...     },
...     {
...         "end": datetime(2014, 2, 28, tzinfo=pytz.utc),
...         "value": 985,
...     },
... ]
...
>>> serializer = ArbitraryEndSerializer()
>>> df = serializer.to_dataframe(records)
>>> df
                            value estimated
2013-12-30 00:00:00+00:00  1211.0      True
2014-01-28 00:00:00+00:00   985.0     False
2014-02-28 00:00:00+00:00     NaN     False
yield_records(sorted_records)[source]

Yields validated (start (datetime), value (float), estimated (bool)) tuples of data.

eemeter.io.parsers
class eemeter.io.parsers.ESPIUsageParser(xml)[source]

Parse ESPI XML files.

Basic usage:

>>> from eemeter.io.parsers import ESPIUsageParser
>>> with open("/path/to/example.xml") as f:
...     parser = ESPIUsageParser(f)
>>> energy_traces = list(parser.get_energy_traces())
Parameters:xml (str, filepath, file buffer) – XML data to parse
get_energy_traces(service_kind_default='electricity')[source]

Retrieve all energy trace records stored as IntervalReading elements in the given ESPI Energy Usage XML.

Energy records are grouped by interpretation and returned in EnergyTrace objects.

Parameters:service_kind_default (str) – Default fuel type to use in parser if ReadingType/commodity field is missing.
Yields:energy_trace (eemeter.structures.EnergyTrace) – Energy data traces as described in the xml file.
has_solar()[source]

Returns True if there is a “reverse” flow direction in this file, indicating presence of solar photovoltaics.

TODO: Verify that this is the correct way to determine this - are there false positives or false negatives? Is there a more straightforward flag to use somewhere else?

eemeter.modeling
eemeter.modeling.formatters

The formatter classes are designed to provide a standard interface to model fit and predict methods. The formatters add weather data to daily or monthly energy data. The interface assumes that the model class will be responsible for applying data sufficiency rules and additional formatting necessary for performing model fits or predictions.

class eemeter.modeling.formatters.ModelDataFormatter(freq_str)[source]

Formatter for model data of known or predictable frequency. Basic usage:

>>> formatter = ModelDataFormatter("D")
>>> formatter.create_input(energy_trace, weather_source)
                           energy tempF
2013-06-01 00:00:00+00:00    3.10  74.3
2013-06-02 00:00:00+00:00    2.42  71.0
2013-06-03 00:00:00+00:00    1.38  73.1
                                   ...
2016-05-27 00:00:00+00:00    0.11  71.1
2016-05-28 00:00:00+00:00    0.04  78.1
2016-05-29 00:00:00+00:00    0.21  69.6
>>> index = pd.date_range('2013-01-01', periods=365, freq='D')
>>> formatter.create_input(index, weather_source)
                           tempF
2013-01-01 00:00:00+00:00   28.3
2013-01-02 00:00:00+00:00   31.0
2013-01-03 00:00:00+00:00   34.1
                            ...
2013-12-29 00:00:00+00:00   12.3
2013-12-30 00:00:00+00:00   26.0
2013-12-31 00:00:00+00:00   24.1
create_demand_fixture(index, weather_source)[source]

Creates a DatetimeIndex ed dataframe containing formatted demand fixture data.

Parameters:
  • index (pandas.DatetimeIndex) – The desired index for demand fixture data.
  • weather_source (eemeter.weather.WeatherSourceBase) – The source of weather fixture data.
Returns:

input_df – Predictably formatted input data. This data should be directly usable as input to applicable model.predict() methods.

Return type:

pandas.DataFrame

create_input(trace, weather_source)[source]

Creates a DatetimeIndex ed dataframe containing formatted model input data formatted as follows.

Parameters:
  • trace (eemeter.structures.EnergyTrace) – The source of energy data for inclusion in model input.
  • weather_source (eemeter.weather.WeatherSourceBase) – The source of weather data.
Returns:

input_df – Predictably formatted input data. This data should be directly usable as input to applicable model.fit() methods.

Return type:

pandas.DataFrame

daily_trace_data(trace)[source]

Transforms a trace for this formatter to a daily series

get_input_data_mask(input_data)[source]

Boolean list of missing/not missing values: True => missing, False => not missing

hourly_trace_data(trace)[source]

Transforms a trace for this formatter to an hourly series

serialize_demand_fixture(demand_fixture_data)[source]

Serialize demand fixture data

serialize_input(input_data)[source]

Serialize input data

class eemeter.modeling.formatters.ModelDataBillingFormatter[source]

Formatter for model data of unknown or unpredictable frequency. Basic usage:

>>> formatter = ModelDataBillingFormatter()
>>> energy_trace = EnergyTrace(
        "ELECTRICITY_CONSUMPTION_SUPPLIED",
        pd.DataFrame(
            {
                "value": [1, 1, 1, 1, np.nan],
                "estimated": [False, False, True, False, False]
            },
            index=[
                datetime(2011, 1, 1, tzinfo=pytz.UTC),
                datetime(2011, 2, 1, tzinfo=pytz.UTC),
                datetime(2011, 3, 2, tzinfo=pytz.UTC),
                datetime(2011, 4, 3, tzinfo=pytz.UTC),
                datetime(2011, 4, 29, tzinfo=pytz.UTC),
            ],
            columns=["value", "estimated"]
        ),
        unit="KWH")
>>> trace_data, temp_data = formatter.create_input(energy_trace, weather_source)
>>> trace_data
2011-01-01 00:00:00+00:00    1.0
2011-02-01 00:00:00+00:00    1.0
2011-03-02 00:00:00+00:00    2.0
2011-04-29 00:00:00+00:00    NaN
dtype: float64
>>> temp_data
period                    hourly
2011-01-01 00:00:00+00:00 2011-01-01 00:00:00+00:00  32.0
                          2011-01-01 01:00:00+00:00  32.0
                          2011-01-01 02:00:00+00:00  32.0
...                                                   ...
2011-03-02 00:00:00+00:00 2011-04-28 21:00:00+00:00  32.0
                          2011-04-28 22:00:00+00:00  32.0
                          2011-04-28 23:00:00+00:00  32.0
>>> index = pd.date_range('2013-01-01', periods=365, freq='D')
>>> formatter.create_input(index, weather_source)
                           tempF
2013-01-01 00:00:00+00:00   28.3
2013-01-02 00:00:00+00:00   31.0
2013-01-03 00:00:00+00:00   34.1
                            ...
2013-12-29 00:00:00+00:00   12.3
2013-12-30 00:00:00+00:00   26.0
2013-12-31 00:00:00+00:00   24.1
create_demand_fixture(index, weather_source)[source]

Creates a DatetimeIndex ed dataframe containing formatted demand fixture data.

Parameters:
  • index (pandas.DatetimeIndex) – The desired index for demand fixture data.
  • weather_source (eemeter.weather.WeatherSourceBase) – The source of weather fixture data.
Returns:

input_df – Predictably formatted input data. This data should be directly usable as input to applicable model.predict() methods.

Return type:

pandas.DataFrame

create_input(trace, weather_source)[source]

Creates two DatetimeIndexed dataframes containing formatted model input data, as described below.

Parameters:
  • trace (eemeter.structures.EnergyTrace) – The source of energy data for inclusion in model input.
  • weather_source (eemeter.weather.WeatherSourceBase) – The source of weather data.
Returns:

  • trace_data (pandas.DataFrame) – Predictably formatted trace data with estimated data removed. This data should be directly usable as input to applicable model.fit() methods.

  • temperature_data (pandas.DataFrame) – Predictably formatted temperature data with a pandas MultiIndex. The MultiIndex contains two levels - ‘period’, which corresponds directly to the trace_data index, and ‘hourly’ or ‘daily’, which contains, respectively, hourly or daily temperature data. This is intended for use like the following:

    >>> temperature_data.groupby(level='period')
    

    This data should be directly usable as input to applicable model.fit() methods.
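
    For instance, a minimal illustration (not part of the reference itself) of that intended usage is to average the temperatures within each period:

    >>> period_mean_temps = temperature_data.groupby(level='period').mean()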

daily_trace_data(trace)[source]

Transforms a trace for this formatter to a daily series

get_input_data_mask(input_data)[source]

Boolean list of missing/not-missing values: True means missing; False means not missing.

hourly_trace_data(trace)[source]

Transforms a trace for this formatter to an hourly series

eemeter.modeling.models
class eemeter.modeling.models.seasonal.SeasonalElasticNetCVModel(cooling_base_temp=65, heating_base_temp=65, n_bootstrap=100, modeling_period_interpretation='baseline')[source]

Linear regression using daily frequency data to build a model of formatted energy trace data that takes into account HDD, CDD, day of week, month, and holiday effects, with elastic net regularization.

Parameters:
  • cooling_base_temp (float) – Base temperature (degrees F) used in calculating cooling degree days.
  • heating_base_temp (float) – Base temperature (degrees F) used in calculating heating degree days.
  • n_bootstrap (int) – Number of points to exclude during bootstrap error estimation.
class eemeter.modeling.models.billing.BillingElasticNetCVModel(cooling_base_temp=65, heating_base_temp=65, n_bootstrap=100, modeling_period_interpretation='baseline')[source]

Linear regression of energy values against CDD/HDD with elastic net regularization.

Parameters:
  • cooling_base_temp (float) – Base temperature (degrees F) used in calculating cooling degree days.
  • heating_base_temp (float) – Base temperature (degrees F) used in calculating heating degree days.
  • n_bootstrap (int) – Number of points to exclude during bootstrap error estimation.
class eemeter.modeling.models.caltrack.CaltrackMonthlyModel(fit_cdd=True, grid_search=False, min_contiguous_baseline_months=12, min_contiguous_reporting_months=12, modeling_period_interpretation='baseline', weighted=False, **kwargs)[source]

This class implements the two-stage modeling routine agreed upon as part of the Caltrack beta test.

If fit_cdd is True, then all four candidate models (HDD+CDD, CDD-only, HDD-only, and Intercept-only) are used in stage 1 estimation. If it is False, then only the HDD-only and Intercept-only candidate models are used.

If grid_search is set to True, the balance point temperatures are determined by maximizing R^2 across the range 50-85 degF. Otherwise, 70 and 60 degF are used for cooling and heating, respectively.

min_contiguous_baseline_months and min_contiguous_reporting_months set the number of contiguous months of data required at the end of the baseline period and at the beginning of the reporting period, respectively, in order for the weather normalization to be valid.

billing_to_monthly_avg(trace_and_temp)[source]

Helper function to handle monthly billing or other irregular data.

daily_to_monthly_avg(df)[source]

Convert from daily usage and temperature to monthly usage per day and average HDD/CDD.

predict(demand_fixture_data, params=None, summed=True)[source]

Predicts across index using fitted model params

Parameters:
  • demand_fixture_data (pandas.DataFrame) – Formatted input data as returned by CaltrackFormatter.create_demand_fixture()
  • params (dict, default None) –

    Parameters found during model fit. If None, .fit() must be called before this method can be used.

    • X_design_matrix: patsy design matrix used in formatting design matrix.
    • formula: patsy formula used in creating design matrix.
    • coefficients: ElasticNetCV coefficients.
    • intercept: ElasticNetCV intercept.
Returns:

output – Dataframe of energy values as given by the fitted model across the index given in demand_fixture_data.

Return type:

pandas.DataFrame
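
To make the relationship between formatters and models concrete, here is a minimal, hedged sketch (not taken from this reference) that fits a billing model and predicts usage over a typical-weather year. It assumes energy_trace, weather_source, and weather_normal_source have been constructed as in the examples above, and it relies on the formatter documentation's statement that create_input() and create_demand_fixture() outputs are directly usable by model.fit() and model.predict(), respectively.

import pandas as pd
import pytz

from eemeter.modeling.formatters import ModelDataBillingFormatter
from eemeter.modeling.models.billing import BillingElasticNetCVModel

formatter = ModelDataBillingFormatter()
model = BillingElasticNetCVModel(cooling_base_temp=65, heating_base_temp=65)

# Fit the model to the observed trace and weather data.
input_data = formatter.create_input(energy_trace, weather_source)
model.fit(input_data)

# Predict usage over a typical-weather year using a demand fixture.
index = pd.date_range('2015-01-01', periods=365, freq='D', tz=pytz.UTC)
demand_fixture = formatter.create_demand_fixture(index, weather_normal_source)
predicted = model.predict(demand_fixture)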

eemeter.processors
eemeter.processors.dispatchers
eemeter.processors.dispatchers.get_energy_modeling_dispatches(modeling_period_set, trace_set)[source]

Dispatches a set of applicable models and formatters for each pairing of modeling period sets and trace sets given.

Parameters:
  • modeling_period_set (eemeter.structures.ModelingPeriodSet) – Modeling periods for which to dispatch models and formatters.
  • trace_set (eemeter.structures.EnergyTraceSet) – Energy traces for which to dispatch models and formatters.
eemeter.processors.interventions
eemeter.processors.interventions.get_modeling_period_set(interventions)[source]

Creates an applicable modeling period set given a list of interventions.

Parameters:interventions (list of eemeter.structures.Intervention) – Interventions for which to build ModelingPeriodSet.
eemeter.processors.location
eemeter.processors.location.get_weather_normal_source(site, use_cz2010=False)[source]

Finds most relevant WeatherSource given project site.

Parameters:
  • site (eemeter.structures.ZIPCodeSite) – Site to match to weather source data.
  • use_cz2010 (boolean, default False) – Indicates whether or not to use CZ2010 mapping.
Returns:

weather_normal_source – Closest data-validated TMY3 weather normal source in the same climate zone as project ZIP code, if available. If use_cz2010 is True, returns the corresponding CZ2010WeatherSource. If no station can be found, returns None.

Return type:

eemeter.weather.TMY3WeatherSource or eemeter.weather.CZ2010WeatherSource or None

eemeter.processors.location.get_weather_source(site, use_cz2010=False)[source]

Finds most relevant WeatherSource given project site.

Parameters:
  • site (eemeter.structures.ZIPCodeSite) – Site to match to weather source data.
  • use_cz2010 (boolean, default False) – Indicates whether or not to use CZ2010 mapping.
Returns:

weather_source – Closest data-validated weather source in the same climate zone as project ZIP code, if available. If use_cz2010 is set, returns the ISDWeatherSource corresponding with the cz2010 station mapping. If no station can be found, returns None.

Return type:

eemeter.weather.ISDWeatherSource or None
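
To show how these processors fit together, here is a minimal, hedged sketch (not from the reference; interventions, trace_set, and the ZIP code are hypothetical inputs) that builds a modeling period set, matches weather sources to a project site, and dispatches models and formatters:

from eemeter.processors.dispatchers import get_energy_modeling_dispatches
from eemeter.processors.interventions import get_modeling_period_set
from eemeter.processors.location import (
    get_weather_normal_source,
    get_weather_source,
)
from eemeter.structures import ZIPCodeSite

# interventions: list of eemeter.structures.Intervention
# trace_set: eemeter.structures.EnergyTraceSet
modeling_period_set = get_modeling_period_set(interventions)

site = ZIPCodeSite("91104")
weather_source = get_weather_source(site)                # observed weather, or None
weather_normal_source = get_weather_normal_source(site)  # typical-year weather, or None

dispatches = get_energy_modeling_dispatches(modeling_period_set, trace_set)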

eemeter.structures
class eemeter.structures.EnergyTrace(interpretation, data=None, records=None, unit=None, placeholder=False, serializer=None, trace_id=None, interval=None)[source]

Container for time series energy data.

Parameters:
  • interpretation (str) –

    The way this energy time series in the data attribute should be interpreted. The complete list of supported options is as follows:

    • ELECTRICITY_CONSUMPTION_SUPPLIED: Represents the amount of utility-supplied electrical energy consumed on-site, as metered at a single usage point, such as a utility-owned electricity meter. Specifically does not include consumption of electricity generated on site, such as by locally installed solar photovoltaic panels.
    • ELECTRICITY_CONSUMPTION_TOTAL: Represents the amount of electrical energy consumed on-site, including both utility-supplied and on-site generated electrical energy. Equivalent, for a single electricity meter, to ELECTRICITY_CONSUMPTION_SUPPLIED + ELECTRICITY_ON_SITE_GENERATION_CONSUMED.
    • ELECTRICITY_CONSUMPTION_NET: Represents the amount of utility-supplied electrical energy consumed on-site minus the amount of unconsumed electrical energy generated on site and fed back into the grid at a single usage point, such as a utility-owned electricity meter. Equivalent, for a single electricity meter, to ELECTRICITY_CONSUMPTION_SUPPLIED - ELECTRICITY_ON_SITE_GENERATION_UNCONSUMED.
    • ELECTRICITY_ON_SITE_GENERATION_TOTAL: Represents the amount of locally generated electrical energy consumed on-site plus the amount of locally generated electrical energy returned to the grid, as metered at a single usage point. Equivalent, for a single electricity meter, to ELECTRICITY_ON_SITE_GENERATION_CONSUMED + ELECTRICITY_ON_SITE_GENERATION_UNCONSUMED.
    • ELECTRICITY_ON_SITE_GENERATION_CONSUMED: Represents the amount of locally generated electrical energy consumed on-site, such as energy generated by solar photovoltaic panels.
    • ELECTRICITY_ON_SITE_GENERATION_UNCONSUMED: Represents the amount of excess locally generated energy which, instead of being consumed on-site, is fed back into the grid or sold back to a utility.
    • NATURAL_GAS_CONSUMPTION_SUPPLIED: Represents the amount of energy supplied by a utility in the form of natural gas and used on site, as metered at a single usage point. Though under the labeling scheme used for electricity interpretations the labels NATURAL_GAS_CONSUMPTION_TOTAL and NATURAL_GAS_CONSUMPTION_NET would be equivalent for natural gas, NATURAL_GAS_CONSUMPTION_SUPPLIED is preferred for its greater specificity.
  • data (pandas.DataFrame, default None) –

    A pandas DataFrame with two columns and a timezone-aware DatetimeIndex. Timestamps in the index are assumed to refer to the start of each period, and the period ends are assumed to coincide with the start of the following period. Thus, the value of the last datetime should always be NaN, since its purpose is only to cap the end of the last period, and not to represent a time period over which energy was consumed. The DatetimeIndex need not have a uniform frequency, such as the frequencies specified in pandas using the freq attribute.

    • value: Amount of energy between this index and the next.
    • estimated: Whether or not the value was estimated. Particularly relevant for monthly billing data.

    If serializer instance is provided, this should instead be records in the format expected by the serializer.

  • unit (str) –

    The name of the unit in which the energy time series is given. These names are normalized to either 'KWH' or 'THERM' as follows:

    • 'kwh' becomes 'KWH' with no unit conversion multiplier.
    • 'kWh' becomes 'KWH' with no unit conversion multiplier.
    • 'KWH' becomes 'KWH' with no unit conversion multiplier.
    • 'therm' becomes 'THERM' with no unit conversion multiplier.
    • 'therms' becomes 'THERM' with no unit conversion multiplier.
    • 'thm' becomes 'THERM' with no unit conversion multiplier.
    • 'THERM' becomes 'THERM' with no unit conversion multiplier.
    • 'THERMS' becomes 'THERM' with no unit conversion multiplier.
    • 'THM' becomes 'THERM' with no unit conversion multiplier.
    • 'wh' becomes 'KWH' with a unit conversion multiplier of 0.001.
    • 'Wh' becomes 'KWH' with a unit conversion multiplier of 0.001.
    • 'WH' becomes 'KWH' with a unit conversion multiplier of 0.001.
  • placeholder (bool) – Indicates that this instance is a placeholder: although the data associated with it is unavailable for some reason, its existence is still important when considering the site as a whole.
  • serializer (consumption.BaseSerializer) – Serializer instance to be used to deserialize records into a pandas dataframe. Must supply the to_dataframe(records) method.
class eemeter.structures.EnergyTraceSet(traces, labels=None)[source]

A container for energy traces which ensures that each is labeled.

Parameters:
  • traces (list or dict of eemeter.structures.EnergyTrace objects) – EnergyTrace objects to be included in this list.
  • labels (list of str) – Unique labels for traces, used only if traces is not a dictionary.
itertraces()[source]

Iterates over traces, yielding (label, trace) pairs.

class eemeter.structures.Intervention(start_date, end_date=None)[source]

Represents an intervention with a start date and, optionally, an end date. Multiple interventions can be composed within a project.

Parameters:
  • start_date (datetime.datetime) – Must be timezone aware
  • end_date (datetime.datetime or None, default None) – Must be timezone aware. If None, intervention is assumed to be ongoing.
class eemeter.structures.ModelingPeriod(interpretation, start_date=None, end_date=None)[source]

Represents a period of time over which to select data from a Trace for contiguous modeling. Carries an “interpretation”, for which there are two options, “BASELINE” and “REPORTING”. The period is defined by a single optional start date and a single optional end date. If the start date is not given, the start date is considered to be negative infinity; if the end date is not given, the end date is considered to be positive infinity.

A ModelingPeriod is a time period, defined by start and end dates, over which the process behind a trace can be expected, for modeling purposes, to have roughly the same energy response to end use demand. Note that this criterion might not be particularly well specified without reference to a particular intervention and set of modeling conditions.

Parameters:
  • interpretation (str, {"BASELINE", "REPORTING"}) –

    The way this ModelingPeriod should be interpreted.

    • ”BASELINE” means that this modeling period represents the time before an intervention or set of interventions.
    • ”REPORTING” means that this modeling period represents the time after an intervention or set of interventions.
  • start_date (datetime.datetime or None) – The date marking the earliest date of the ModelingPeriod. None indicates a start_date of negative infinity. If interpretation is “REPORTING”, start_date cannot be None.
  • end_date (datetime.datetime or None) – The date marking the latest date of the ModelingPeriod. None indicates an end_date of positive infinity. If interpretation is “BASELINE”, end_date cannot be None.
class eemeter.structures.ModelingPeriodSet(modeling_periods, groupings)[source]

Represents a set of labeled modeling periods of interest, grouped into meaningful comparison sets. Labels can be arbitrary.

Basic usage:

>>> modeling_periods = {
...     "modeling_period_1": ModelingPeriod(
...         "BASELINE",
...         end_date=datetime(2000, 1, 1, tzinfo=pytz.UTC),
...     ),
...     "modeling_period_2": ModelingPeriod(
...         "REPORTING",
...         start_date=datetime(2000, 2, 1, tzinfo=pytz.UTC),
...     ),
...     "modeling_period_3": ModelingPeriod(
...         "REPORTING",
...         start_date=datetime(2000, 2, 1, tzinfo=pytz.UTC),
...     ),
... }
...
>>> grouping = [
...     ("modeling_period_1", "modeling_period_2"),
...     ("modeling_period_1", "modeling_period_3"),
... ]
...
>>> mps = ModelingPeriodSet(modeling_periods, grouping)
class eemeter.structures.Project(energy_trace_set, interventions, site, project_id=None)[source]

Container for storing project data.

Parameters:
  • energy_trace_set (eemeter.structures.EnergyTraceSet) – Complete set of energy traces for this project. For a project site that has, for example, two electricity meters, each with two traces (supplied electricity kWh and solar-generated kWh), and one natural gas meter with one trace (consumed natural gas therms), the energy_trace_set should contain 5 traces, regardless of the availability of that data. Traces which are unavailable should be represented as ‘placeholder’ traces.
  • interventions (list of eemeter.structures.Intervention) – Complete set of interventions, planned, ongoing, or completed, that have taken or will take place at this site as part of this project.
  • site (eemeter.structures.Site) – The site of this project.
class eemeter.structures.ZIPCodeSite(zipcode)[source]

ZIP-code-based site location descriptor.

Parameters:zipcode (str) – A five-digit zipcode identifier.
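
As a hedged illustration (hypothetical dates and ZIP code; energy_trace as constructed in the formatter example above), the structures documented in this section can be assembled into a Project as follows:

from datetime import datetime

import pytz

from eemeter.structures import (
    EnergyTraceSet,
    Intervention,
    Project,
    ZIPCodeSite,
)

energy_trace_set = EnergyTraceSet([energy_trace], labels=["trace_1"])
interventions = [Intervention(start_date=datetime(2014, 1, 1, tzinfo=pytz.UTC))]
site = ZIPCodeSite("91104")

project = Project(
    energy_trace_set=energy_trace_set,
    interventions=interventions,
    site=site,
    project_id="EXAMPLE_PROJECT",
)
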
eemeter.weather
GSODWeatherSource
class eemeter.weather.GSODWeatherSource(station, cache_url=None)[source]

The GSODWeatherSource draws weather data from the NOAA Global Summary of the Day FTP site. It stores fetched data locally by default in a SQLite database at ~/eemeter/cache/weather_cache.db, unless you set the EEMETER_WEATHER_CACHE_URL environment variable to another SQLAlchemy-compatible database URL.

Basic usage is as follows:

>>> from eemeter.weather import GSODWeatherSource
>>> ws = GSODWeatherSource("722880")  # or another 6-digit USAF station

This object can be used to fetch weather data as follows, using a daily-frequency, time-zone-aware pandas DatetimeIndex covering any stretch of time.

>>> import pandas as pd
>>> import pytz
>>> index = pd.date_range('2015-01-01', periods=365,
...     freq='D', tz=pytz.UTC)
>>> ws.indexed_temperatures(index, "degF")
2015-01-01 00:00:00+00:00    43.6
2015-01-02 00:00:00+00:00    45.0
2015-01-03 00:00:00+00:00    47.3
                             ...
2015-12-29 00:00:00+00:00    48.0
2015-12-30 00:00:00+00:00    46.4
2015-12-31 00:00:00+00:00    47.6
Freq: D, dtype: float64
add_year(year, force_fetch=False)

Adds temperature data to internal pandas timeseries

Note

This method is called automatically internally to keep data updated in response to calls to .indexed_temperatures()

Parameters:
  • year ({int, string}) – The year for which data should be fetched, e.g. “2010”.
  • force_fetch (bool, default=False) – If True, forces the fetch; if False, checks to see if locally available before actually fetching.
add_year_range(start_year, end_year, force_fetch=False)

Adds temperature data to internal pandas timeseries across a range of years.

Note

This method is called automatically internally to keep data updated in response to calls to .indexed_temperatures()

Parameters:
  • start_year ({int, string}) – The earliest year for which data should be fetched, e.g. “2010”.
  • end_year ({int, string}) – The latest year for which data should be fetched, e.g. “2013”.
  • force_fetch (bool, default=False) – If True, forces the fetch; if False, checks to see if the year has been added before actually fetching.
indexed_temperatures(index, unit, allow_mixed_frequency=False)

Return average temperatures over the given index.

Parameters:
  • index (pandas.DatetimeIndex) – Index over which to supply average temperatures. The index should be given as either an hourly (‘H’) or daily (‘D’) frequency.
  • unit (str, {"degF", "degC"}) – Target temperature unit for returned temperature series.
Returns:

temperatures – Average temperatures over series indexed by index.

Return type:

pandas.Series with DatetimeIndex

ISDWeatherSource
class eemeter.weather.ISDWeatherSource(station, cache_url=None)[source]

The ISDWeatherSource draws weather data from the NOAA Integrated Surface Database (ISD) FTP site. It stores fetched hourly data locally by default in a SQLite database at ~/eemeter/cache/weather_cache.db, unless you set the following environment variable to something different:

$ export EEMETER_WEATHER_CACHE_DIRECTORY=/path/to/custom/directory

Basic usage is as follows:

>>> from eemeter.weather import ISDWeatherSource
>>> ws = ISDWeatherSource("722880")  # or another 6-digit USAF station

This object can be used to fetch weather data as follows, using an hourly- or daily-frequency, time-zone-aware pandas DatetimeIndex covering any stretch of time.

>>> import pandas as pd
>>> import pytz
>>> daily_index = pd.date_range('2015-01-01', periods=365,
...     freq='D', tz=pytz.UTC)
>>> ws.indexed_temperatures(daily_index, "degF")
2015-01-01 00:00:00+00:00    43.550000
2015-01-02 00:00:00+00:00    45.042500
2015-01-03 00:00:00+00:00    47.307500
                               ...
2015-12-29 00:00:00+00:00    47.982500
2015-12-30 00:00:00+00:00    46.415000
2015-12-31 00:00:00+00:00    47.645000
Freq: D, dtype: float64
>>> hourly_index = pd.date_range('2015-01-01', periods=365*24,
...     freq='H', tz=pytz.UTC)
>>> ws.indexed_temperatures(hourly_index, "degF")
2015-01-01 00:00:00+00:00    51.98
2015-01-01 01:00:00+00:00    50.00
2015-01-01 02:00:00+00:00    48.02
                             ...
2015-12-31 21:00:00+00:00    62.06
2015-12-31 22:00:00+00:00    62.06
2015-12-31 23:00:00+00:00    62.06
Freq: H, dtype: float64
add_year(year, force_fetch=False)

Adds temperature data to internal pandas timeseries

Note

This method is called automatically internally to keep data updated in response to calls to .indexed_temperatures()

Parameters:
  • year ({int, string}) – The year for which data should be fetched, e.g. “2010”.
  • force_fetch (bool, default=False) – If True, forces the fetch; if False, checks to see if locally available before actually fetching.
add_year_range(start_year, end_year, force_fetch=False)

Adds temperature data to internal pandas timeseries across a range of years.

Note

This method is called automatically internally to keep data updated in response to calls to .indexed_temperatures()

Parameters:
  • start_year ({int, string}) – The earliest year for which data should be fetched, e.g. “2010”.
  • end_year ({int, string}) – The latest year for which data should be fetched, e.g. “2013”.
  • force_fetch (bool, default=False) – If True, forces the fetch; if False, checks to see if the year has been added before actually fetching.
indexed_temperatures(index, unit, allow_mixed_frequency=False)

Return average temperatures over the given index.

Parameters:
  • index (pandas.DatetimeIndex) – Index over which to supply average temperatures. The index should be given as either an hourly (‘H’) or daily (‘D’) frequency.
  • unit (str, {"degF", "degC"}) – Target temperature unit for returned temperature series.
Returns:

temperatures – Average temperatures over series indexed by index.

Return type:

pandas.Series with DatetimeIndex

TMY3WeatherSource
class eemeter.weather.TMY3WeatherSource(station, cache_url=None, preload=True)[source]

The TMY3WeatherSource draws weather data from NREL’s Typical Meteorological Year 3 database. It stores fetched data locally by default in a SQLite database at ~/.eemeter/cache/weather_cache.db, unless you set the EEMETER_WEATHER_CACHE_URL environment variable to another SQLAlchemy-compatible database URL.

Basic usage is as follows:

>>> from eemeter.weather import TMY3WeatherSource
>>> ws = TMY3WeatherSource("724830")  # or another 6-digit USAF station

This object can be used to fetch weather data as follows, using a daily- or hourly-frequency, time-zone-aware pandas DatetimeIndex covering any stretch of time.

>>> import pandas as pd
>>> import pytz
>>> daily_index = pd.date_range('2015-01-01', periods=365,
...     freq='D', tz=pytz.UTC)
>>> ws.indexed_temperatures(daily_index, "degF")
2015-01-01 00:00:00+00:00    38.6450
2015-01-02 00:00:00+00:00    40.4900
2015-01-03 00:00:00+00:00    43.9175
                              ...
2015-12-29 00:00:00+00:00    43.7750
2015-12-30 00:00:00+00:00    43.6250
2015-12-31 00:00:00+00:00    46.9250
Freq: D, dtype: float64
>>> hourly_index = pd.date_range('2015-01-01', periods=365*24,
...     freq='H', tz=pytz.UTC)
>>> ws.indexed_temperatures(hourly_index, "degF")
2015-01-01 00:00:00+00:00    51.80
2015-01-01 01:00:00+00:00    50.00
2015-01-01 02:00:00+00:00    50.00
                             ...
2015-12-31 21:00:00+00:00    53.60
2015-12-31 22:00:00+00:00    55.40
2015-12-31 23:00:00+00:00    55.40
Freq: H, dtype: float64
indexed_temperatures(index, unit)

Return average temperatures over the given index.

Parameters:
  • index (pandas.DatetimeIndex) – Index over which to supply average temperatures. The index should be given as either an hourly (‘H’) or daily (‘D’) frequency.
  • unit (str, {"degF", "degC"}) – Target temperature unit for returned temperature series.
Returns:

temperatures – Average temperatures over series indexed by index.

Return type:

pandas.Series with DatetimeIndex

Location
eemeter.weather.location.climate_zone_is_supported(climate_zone)[source]

True if given Climate Zone is supported.

Parameters:climate_zone (str) – String representing a climate_zone.
Returns:supported – True if supported, otherwise False.
Return type:bool
eemeter.weather.location.climate_zone_to_tmy3_stations(climate_zone)[source]

Return TMY3 weather stations falling within the given climate zone.

Parameters:climate_zone (str) – String representing a climate zone.
Returns:stations – Strings representing TMY3 station ids.
Return type:list of str
eemeter.weather.location.climate_zone_to_usaf_stations(climate_zone)[source]

Return USAF weather stations falling within the given climate zone.

Parameters:climate_zone (str) – String representing a climate zone.
Returns:stations – Strings representing USAF station ids.
Return type:list of str
eemeter.weather.location.climate_zone_to_zipcodes(climate_zone)[source]

Return ZIP codes with centroids in the given climate zone.

Parameters:climate_zone (str) – String representing a climate zone.
Returns:zipcodes – Strings representing USPS ZIP codes.
Return type:list of str
eemeter.weather.location.cz2010_station_is_supported(station)[source]

True if the given CZ2010 weather station (USAF ID) is supported.

Parameters:station (str) – 6-digit string representing a weather station.
Returns:supported – True if supported, otherwise False.
Return type:bool
eemeter.weather.location.haversine(lat1, lng1, lat2, lng2)[source]

Calculate the great circle distance between two points on the earth (specified in decimal degrees)

Parameters:
  • lat1 (float) – Latitude coordinate of first point.
  • lng1 (float) – Longitude coordinate of first point.
  • lat2 (float) – Latitude coordinate of second point.
  • lng2 (float) – Longitude coordinate of second point.
Returns:

distance – Kilometers between the two lat/lng coordinates.

Return type:

float

eemeter.weather.location.lat_lng_to_climate_zone(lat, lng)[source]

Return the climate zone containing the given latitude and longitude coordinates.

Parameters:
  • lat (float) – Latitude coordinate.
  • lng (float) – Longitude coordinate.
Returns:

climate_zone – String representing a climate zone.

Return type:

str, None

eemeter.weather.location.lat_lng_to_tmy3_station(lat, lng)[source]

Return the closest TMY3 station ID using latitude and longitude coordinates.

Parameters:
  • lat (float) – Latitude coordinate.
  • lng (float) – Longitude coordinate.
Returns:

station – String representing a TMY3 weather station ID or None, if none was found.

Return type:

str, None

eemeter.weather.location.lat_lng_to_usaf_station(lat, lng)[source]

Return the closest USAF station ID using latitude and longitude coordinates.

Parameters:
  • lat (float) – Latitude coordinate.
  • lng (float) – Longitude coordinate.
Returns:

station – String representing a USAF weather station ID or None, if none was found.

Return type:

str, None

eemeter.weather.location.lat_lng_to_zipcode(lat, lng)[source]

Return the closest ZIP code using latitude and longitude coordinates.

Parameters:
  • lat (float) – Latitude coordinate.
  • lng (float) – Longitude coordinate.
Returns:

zipcode – String representing a USPS ZIP code, or None, if none was found.

Return type:

str, None

eemeter.weather.location.tmy3_station_is_supported(station)[source]

True if the given TMY3 weather station (USAF ID) is supported.

Parameters:station (str) – 6-digit string representing a weather station.
Returns:supported – True if supported, otherwise False.
Return type:bool
eemeter.weather.location.tmy3_station_to_climate_zone(station)[source]

Return the climate zone of the station.

Parameters:station (str) – String representing a USAF Weather station ID
Returns:climate_zone – String representing a climate zone.
Return type:str
eemeter.weather.location.tmy3_station_to_lat_lng(station)[source]

Return the latitude and longitude coordinates of the given station.

Parameters:station (str) – String representing a TMY3 USAF Weather station ID
Returns:lat_lng – Latitude and longitude coordinates.
Return type:tuple of float
eemeter.weather.location.tmy3_station_to_zipcodes(station)[source]

Return the zipcodes that map to this station.

Parameters:station (str) – String representing a USAF Weather station ID
Returns:zipcodes – Strings representing USPS ZIP codes that map to this station.
Return type:list of str
eemeter.weather.location.usaf_station_is_supported(station)[source]

True if the given USAF weather station is supported.

Parameters:station (str) – 6-digit string representing a weather station.
Returns:supported – True if supported, otherwise False.
Return type:bool
eemeter.weather.location.usaf_station_to_climate_zone(station)[source]

Return the climate zone of the station.

Parameters:station (str) – String representing a USAF Weather station ID
Returns:climate_zone – String representing a climate zone
Return type:str
eemeter.weather.location.usaf_station_to_lat_lng(station)[source]

Return the latitude and longitude coordinates of the given USAF station.

Parameters:station (str) – String representing a USAF Weather station ID
Returns:lat_lng – Latitude and longitude coordinates.
Return type:tuple of float
eemeter.weather.location.usaf_station_to_zipcodes(station)[source]

Return the zipcodes that map to this USAF station.

Parameters:station (str) – String representing a USAF Weather station ID
Returns:zipcodes – Strings representing USPS ZIP codes that map to this station.
Return type:list of str
eemeter.weather.location.zipcode_is_supported(zipcode)[source]

True if the given ZIP code is supported (ZCTA only).

Parameters:zipcode (str) – 5-digit string representing a zipcode.
Returns:supported – True if supported, otherwise False.
Return type:bool
eemeter.weather.location.zipcode_to_climate_zone(zipcode)[source]

Return the climate zone of the ZIP code (by latitude and longitude centroid of ZIP code).

Parameters:zipcode (str) – String representing a USPS ZIP code.
Returns:climate_zone – String representing a climate zone
Return type:str
eemeter.weather.location.zipcode_to_cz2010_station(zipcode)[source]

Return the nearest CZ2010 station (by latitude and longitude centroid) of the ZIP code.

Parameters:zipcode (str) – String representing a USPS ZIP code.
Returns:

station – String representing a CZ2010 weather station ID

Return type:

str

eemeter.weather.location.zipcode_to_lat_lng(zipcode)[source]

Return the latitude and longitude centroid of a particular ZIP code.

Parameters:zipcode (str) – String representing a USPS ZIP code.
Returns:lat_lng – Latitude and longitude coordinates.
Return type:tuple of float
eemeter.weather.location.zipcode_to_tmy3_station(zipcode)[source]

Return the nearest TMY3 station (by latitude and longitude centroid) of the ZIP code.

Parameters:zipcode (str) – String representing a USPS ZIP code.
Returns:station – String representing a TMY3 Weather station (USAF ID).
Return type:str
eemeter.weather.location.zipcode_to_usaf_station(zipcode)[source]

Return the nearest USAF station (by latitude and longitude centroid) of the ZIP code.

Parameters:zipcode (str) – String representing a USPS ZIP code.
Returns:station – String representing a USAF weather station ID
Return type:str
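
A short, hedged example (hypothetical ZIP code) chaining several of the lookup helpers above:

from eemeter.weather.location import (
    lat_lng_to_tmy3_station,
    zipcode_to_climate_zone,
    zipcode_to_lat_lng,
    zipcode_to_usaf_station,
)

zipcode = "91104"
climate_zone = zipcode_to_climate_zone(zipcode)   # climate zone of the ZIP code centroid
usaf_station = zipcode_to_usaf_station(zipcode)   # nearest USAF station ID
lat, lng = zipcode_to_lat_lng(zipcode)            # ZIP code centroid coordinates
tmy3_station = lat_lng_to_tmy3_station(lat, lng)  # nearest TMY3 station ID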

Development

Testing

This library uses the py.test framework. To develop locally, clone the repo, and in a virtual environment execute the following commands:

$ git clone https://github.com/openeemeter/eemeter
$ cd eemeter
$ mkvirtualenv eemeter
$ pip install -r dev_requirements.txt
$ pip install -e .
$ tox
Building Documentation

Documentation is built using the sphinx package. To build documentation, make sure that dev requirements are installed:

$ pip install -r dev_requirements.txt

You will also need to install pandoc (http://pandoc.org/installing.html) to build docs locally.

Then run the following from the project root directory:

$ make -C docs html

To clean the build directory, run the following:

$ make -C docs clean

datastore

The datastore is an application for housing energy and project data which provides a REST API for loading data, computing energy savings, and inspecting results. Like the eemeter library, the datastore is open source and available on github under an MIT license.

The datastore uses the django web framework with a PostgreSQL database.

Development Setup

Clone the repo and change directories
git clone git@github.com:openeemeter/datastore.git
cd datastore
Install required python packages

We recommend using virtualenv (or virtualenvwrapper) to manage python packages

mkvirtualenv datastore
pip install -r requirements.txt
pip install -r dev-requirements.txt
Define the necessary environment variables
# django
export DJANGO_SETTINGS_MODULE=oeem_energy_datastore.settings
export SECRET_KEY=<django-secret-key>  # random string

# postgres
export DATABASE_URL=postgres://user:password@host:5432/dbname

# for API docs - should reflect the IP or DNS name where datastore will be deployed
export SERVER_NAME=0.0.0.0:8000
export PROTOCOL=http  # or https

# For development only
export DEBUG=true

# For celery background tasks
export CELERY_ALWAYS_EAGER=true

  or

export BROKER_TRANSPORT=redis
export BROKER_URL=redis://user:password@host:9549

If developing on the datastore, you might consider adding these to your virtualenv postactivate script:

vim /path/to/virtualenvs/datastore/bin/postactivate

# Refresh environment
workon datastore
Run database migrations
python manage.py migrate
Seed the database
python manage.py dev_seed
Start a development server
python manage.py runserver

Topics

Basic Usage: datastore application

The datastore is a tool that automates and helps to scale some of the most frequent tasks accomplished with the eemeter. These tasks include data loading and storage, meter running, and result storage and warehousing. It puts a REST API in front of the eemeter and uses a postgres backend.

This tutorial is also available as a jupyter notebook.

Note:

This tutorial assumes you have a working datastore instance. If you do not, please follow the datastore development setup instructions or contact Open EE about setting up a dedicated production deployment.

Note:

For both small and large datasets, the ETL toolkit exists to ease and speed up the process of loading your data.

This tutorial does not cover ETL toolkit usage. For more information on the ETL toolkit, see its API documentation.

Setup
In [1]:
# library imports
import pandas as pd
import requests
import pytz

If you followed the datastore development setup instructions, you will already have run the command to create a superuser and access credentials.

python manage.py dev_seed

If you haven’t already done so, do so now. The dev_seed command creates a demo admin user and a sample project.

  • username: demo,
  • password: demo-password,
  • API access token: tokstr.

Ensure that your development server is running locally on port 8000 before continuing.

python manage.py runserver

Each request will include an Authorization header

Authorization: Bearer tokstr
In [2]:
base_url = "http://0.0.0.0:8000"
token = "tokstr"
headers = {"Authorization": "Bearer {}".format(token)}
Using the API to get loaded data

We can use the API to inspect the data that is loaded into the datastore. (The API can also be used for loading data, but that is not covered here. See the ETL tutorial for more information on loading data.)

Note:

We will use the requests python package for making requests, but you could just as easily use a tool like cURL or Postman.

If you have the eemeter package installed, you will also have the requests package installed, but if not, you can install it with:

$ pip install requests

A request using the requests library looks like this:

import requests
url = "https://example.com"
data = {
    "first_name": "John",
    "last_name": "Doe"
}
requests.post(url + "/api/users/", json=data)

which is equivalent to:

POST /api/users/ HTTP/1.1
Host: example.com
{
    "first_name": "John",
    "last_name": "Doe"
}

Since the dev_seed command creates a sample project, this will return a response showing that project. Projects all have a unique “project_id”, which can be set to whatever is most appropriate (note: it is not used as primary key; that’s the ‘id’ field).

In [3]:
url = base_url + "/api/v1/projects/"
projects = requests.get(url, headers=headers).json()
In [4]:
projects
Out[4]:
[{'baseline_period_end': '2012-01-01T00:00:00Z',
  'baseline_period_start': None,
  'id': 1,
  'project_id': 'DEV_SEED_PROJECT',
  'project_owner_id': 1,
  'reporting_period_end': None,
  'reporting_period_start': '2012-02-01T00:00:00Z',
  'zipcode': '91104'}]

Energy trace data will be associated with this project by foreign key through a many-to-many table. This means that projects can have 0 to n associated traces, and that traces can have 0 to n associated projects.

Like projects and the project_id field, traces are identified by a unique ‘trace_id’ field, which can also be set to whatever is most appropriate.

There are API endpoints used to fetch trace data:

  1. /api/v1/traces/: This stores trace ids, unit, and interpretation.
  2. /api/v1/trace_records/: This stores time-series records associated with each trace.

These records are stored by record start timestamp, with the implicit assumption that the start timestamp of the next temporal record is the end of the current record. The value of the last record is ignored; its start timestamp serves only as the final end timestamp (so its value is usually set to null).

In [5]:
url = base_url + "/api/v1/traces/?projects={}".format(projects[0]['id'])
traces = requests.get(url, headers=headers).json()
In [6]:
traces
Out[6]:
[{'id': 1,
  'interpretation': 'NATURAL_GAS_CONSUMPTION_SUPPLIED',
  'trace_id': 'DEV_SEED_TRACE_NATURAL_GAS_MONTHLY',
  'unit': 'THERM'},
 {'id': 2,
  'interpretation': 'NATURAL_GAS_CONSUMPTION_SUPPLIED',
  'trace_id': 'DEV_SEED_TRACE_NATURAL_GAS_DAILY',
  'unit': 'THERM'},
 {'id': 3,
  'interpretation': 'ELECTRICITY_CONSUMPTION_SUPPLIED',
  'trace_id': 'DEV_SEED_TRACE_ELECTRICITY_15MIN',
  'unit': 'KWH'},
 {'id': 4,
  'interpretation': 'ELECTRICITY_CONSUMPTION_SUPPLIED',
  'trace_id': 'DEV_SEED_TRACE_ELECTRICITY_HOURLY',
  'unit': 'KWH'},
 {'id': 5,
  'interpretation': 'ELECTRICITY_ON_SITE_GENERATION_UNCONSUMED',
  'trace_id': 'DEV_SEED_TRACE_SOLAR_HOURLY',
  'unit': 'KWH'},
 {'id': 6,
  'interpretation': 'ELECTRICITY_ON_SITE_GENERATION_UNCONSUMED',
  'trace_id': 'DEV_SEED_TRACE_SOLAR_30MIN',
  'unit': 'KWH'}]

We can also query for trace records by trace primary key.

In [7]:
url = base_url + "/api/v1/trace_records/?trace={}".format(traces[0]['id'])
trace_records = requests.get(url, headers=headers).json()
In [8]:
trace_records[:3]  # first 3 records
Out[8]:
[{'estimated': False,
  'id': 1,
  'start': '2010-01-01T00:00:00Z',
  'trace_id': 1,
  'value': None},
 {'estimated': False,
  'id': 2,
  'start': '2010-02-01T00:00:00Z',
  'trace_id': 1,
  'value': 1.0},
 {'estimated': False,
  'id': 3,
  'start': '2010-03-01T00:00:00Z',
  'trace_id': 1,
  'value': 1.0}]
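
As a small convenience sketch (not part of the tutorial itself), the records returned above can be loaded into a pandas DataFrame indexed by record start, mirroring the value/estimated layout used by EnergyTrace:

import pandas as pd

records_df = pd.DataFrame(trace_records)
records_df["start"] = pd.to_datetime(records_df["start"], utc=True)
records_df = records_df.set_index("start")[["value", "estimated"]]
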
Running meters

Running a meter means pulling trace data, matching it with relevant project data, and evaluating its energy efficiency performance. This is the central task performed by the datastore, so if the specifics are unfamiliar, there is a bit more background information worthy of review in the Methods Overview section of the guides.

To run a meter, make a request to create a “meter run”. This request will start a job that runs a meter and saves its results. The result of a meter run is called a “meter result”.

In [9]:
from collections import OrderedDict
import json
Scheduling a single meter run

The primary component of this request is a trace primary key.

The project data associated with the trace will be pulled in automatically.

In [10]:
created_meter_run = requests.post(
    base_url + "/api/v1/meter_runs/",
    json={
        "trace": traces[0]['id']  # single trace primary key
    },
    headers=headers
).json(object_pairs_hook=OrderedDict)  # retains order of keys
In [11]:
print(json.dumps(created_meter_run, indent=2))
{
  "id": 1,
  "trace": 1,
  "project": 1,
  "meter_result": 1,
  "meter_input": null,
  "status": "PENDING",
  "failure_message": null,
  "traceback": null,
  "model_class": null,
  "model_kwargs": null,
  "formatter_class": null,
  "formatter_kwargs": null,
  "added": "2016-11-18T02:16:36.078334Z",
  "updated": "2016-11-18T02:16:36.078375Z"
}

This is a summary of the task to run the meter on the indicated project.

The response shows us the complete specification of the meter run behavior, which is as follows:

  1. project: the project primary key (determined implicitly from the trace).
  2. trace: the trace primary key (given in the API request).
  3. status: the task status code (in this case "PENDING"); other options are:
  • "PENDING": the task is scheduled but not yet running or completed.
  • "RUNNING": the task is currently running.
  • "SUCCESS": successful completion.
  • "FAILED": failed due to some sort of error.
  4. meter_result: the primary key of the meter result.
  5. meter_input: has not yet been created (this is the complete serialized input to the meter, as required by the eemeter).
  6. model_class and model_kwargs: the model class and arguments used in meter fitting.
  • If these are left blank, default values will be used.
  7. formatter_class and formatter_kwargs: the formatter class and arguments used in meter fitting.
  • If these are left blank, default values will be used.

If you wish, you can also specify many of these properties explicitly and we will do so in a following section.

Let’s make another call to inspect the state of this meter run

In [12]:
meter_run = requests.get(
    base_url + "/api/v1/meter_runs/{}/".format(created_meter_run['id']),
    headers=headers
).json(object_pairs_hook=OrderedDict)
In [13]:
print(json.dumps(meter_run, indent=2))
{
  "id": 1,
  "trace": 1,
  "project": 1,
  "meter_result": 1,
  "meter_input": "https://storage.googleapis.com/my-storage-bucket/datastore/meter_run_inputs/5fa24b58-444b-4c72-a8c9-bb0327b23118.json",
  "status": "SUCCESS",
  "failure_message": null,
  "traceback": null,
  "model_class": null,
  "model_kwargs": null,
  "formatter_class": null,
  "formatter_kwargs": null,
  "added": "2016-11-18T02:16:36.078334Z",
  "updated": "2016-11-18T02:17:44.356211Z"
}

The associated meter result is also available now. It carries the values shown on the meter run and, additionally:

  1. meter_output: serialized output of the meter run.
  2. eemeter_version and datastore_version: software versions of the eemeter library and the datastore application.
In [14]:
meter_result = requests.get(
    base_url + "/api/v1/meter_results/{}/".format(created_meter_run['meter_result']),
    headers=headers
).json(object_pairs_hook=OrderedDict)
In [15]:
print(json.dumps(meter_result, indent=2))
{
  "id": 1,
  "trace": 1,
  "project": 1,
  "meter_run": 1,
  "meter_output": "https://storage.googleapis.com/my-storage-bucket/datastore/meter_run_outputs/e1896b44-0b89-49ac-93e1-8eb6e44987bd.json",
  "status": "SUCCESS",
  "eemeter_version": "0.4.12",
  "datastore_version": "0.2.3",
  "model_class": "BillingElasticNetCVModel",
  "model_kwargs": {
    "heating_base_temp": 65,
    "cooling_base_temp": 65
  },
  "formatter_class": "ModelDataBillingFormatter",
  "formatter_kwargs": {},
  "added": "2016-11-18T02:17:44.203325Z",
  "updated": "2016-11-18T02:17:44.223200Z"
}
Customizing meter runs

Meter runs can also be customized by specifying various attributes explicitly, such as custom arguments for the model class.

In [16]:
custom_meter_run = requests.post(
    base_url + "/api/v1/meter_runs/",
    json={
        "trace": 2,
        "project": 1,
        "model_kwargs": {
            "heating_base_temp": 64,  # different temperature
            "cooling_base_temp": 64,
        },
    },
    headers=headers
).json(object_pairs_hook=OrderedDict)
In [17]:
print(json.dumps(custom_meter_run, indent=2))
{
  "id": 2,
  "trace": 2,
  "project": 1,
  "meter_result": 2,
  "meter_input": null,
  "status": "PENDING",
  "failure_message": null,
  "traceback": null,
  "model_class": null,
  "model_kwargs": {
    "heating_base_temp": 64,
    "cooling_base_temp": 64
  },
  "formatter_class": null,
  "formatter_kwargs": null,
  "added": "2016-11-18T02:17:44.681341Z",
  "updated": "2016-11-18T02:17:44.681374Z"
}

Or, if you leave out the project and trace attributes, you can specify the exact serialized input. This means that if serialized meter inputs are available, you need not explicitly load traces and projects through ETL.

Please download a preformatted input file for this step.

In [18]:
with open('meter_input_example.json', 'r') as f:
    meter_input = f.read()  # loaded as a serialized string
    meter_input_meter_run = requests.post(
        base_url + "/api/v1/meter_runs/",
        json={
            "meter_input": meter_input,
        },
        headers=headers
    ).json(object_pairs_hook=OrderedDict)
In [19]:
print(json.dumps(meter_input_meter_run, indent=2))
{
  "id": 3,
  "trace": null,
  "project": null,
  "meter_result": 3,
  "meter_input": "https://storage.googleapis.com/my-storage-bucket/datastore/meter_run_inputs/bf0629db-0c81-4ded-8dcc-adbd0ddbf3f3.json",
  "status": "PENDING",
  "failure_message": null,
  "traceback": null,
  "model_class": null,
  "model_kwargs": null,
  "formatter_class": null,
  "formatter_kwargs": null,
  "added": "2016-11-18T02:19:23.155268Z",
  "updated": "2016-11-18T02:19:23.155857Z"
}
In [20]:
meter_run = requests.get(
    base_url + "/api/v1/meter_runs/{}/".format(meter_input_meter_run['id']),
    headers=headers
).json(object_pairs_hook=OrderedDict)
print(json.dumps(meter_run, indent=2))
{
  "id": 3,
  "trace": null,
  "project": null,
  "meter_result": 3,
  "meter_input": "https://storage.googleapis.com/my-storage-bucket/datastore/meter_run_inputs/bf0629db-0c81-4ded-8dcc-adbd0ddbf3f3.json",
  "status": "SUCCESS",
  "failure_message": null,
  "traceback": null,
  "model_class": null,
  "model_kwargs": null,
  "formatter_class": null,
  "formatter_kwargs": null,
  "added": "2016-11-18T02:19:23.155268Z",
  "updated": "2016-11-18T02:22:13.679220Z"
}
In [21]:
meter_result = requests.get(
    base_url + "/api/v1/meter_results/{}/".format(meter_input_meter_run['meter_result']),
    headers=headers
).json(object_pairs_hook=OrderedDict)
print(json.dumps(meter_result, indent=2))
{
  "id": 3,
  "trace": null,
  "project": null,
  "meter_run": 3,
  "meter_output": "https://storage.googleapis.com/my-storage-bucket/datastore/meter_run_outputs/7ab2bd31-a723-4c75-afa6-424d560ab284.json",
  "status": "SUCCESS",
  "eemeter_version": "0.4.12",
  "datastore_version": "0.2.3",
  "model_class": "SeasonalElasticNetCVModel",
  "model_kwargs": {
    "heating_base_temp": 65,
    "cooling_base_temp": 65
  },
  "formatter_class": "ModelDataFormatter",
  "formatter_kwargs": {
    "freq_str": "D"
  },
  "added": "2016-11-18T02:22:13.566866Z",
  "updated": "2016-11-18T02:22:13.583138Z"
}

Meters can also be triggered in bulk; the next section covers this.

Bulk-triggering meter runs

Often it is more convenient to trigger many meter runs at once than to do it trace-by-trace. This can be done either through the API or through a datastore management command.

Through the API

The following sends a list of “targets” to the datastore for triggering. Here, we’re triggering a set of meter runs for one project, which will trigger meter runs for all associated traces.

Warning:

The following may take a few minutes to complete. If you have enabled celery workers, it will execute more quickly and computation will continue in the background. If this is the case for you, you should wait until that computation has completed before continuing.

For more information on background worker setup, see datastore setup instructions.

To follow progress, watch the datastore logs or use the meter_progress command. In a development environment, these are printed in the python manage.py runserver output.

In [22]:
bulk_created_meter_runs = requests.post(
    base_url + "/api/v1/meter_runs/bulk/",  # note: different url!
    json={
        "targets": [  # a list of targets can be provided
            {
                "project": projects[0]['id']
            },
        ]
    },
    headers=headers
).json(object_pairs_hook=OrderedDict)
In [23]:
print(json.dumps(bulk_created_meter_runs, indent=2))
[
  [
    {
      "id": 4,
      "trace": 3,
      "project": 1,
      "meter_result": 4,
      "meter_input": null,
      "status": "PENDING",
      "failure_message": null,
      "traceback": null,
      "model_class": null,
      "model_kwargs": null,
      "formatter_class": null,
      "formatter_kwargs": null,
      "added": "2016-11-18T02:22:14.088620Z",
      "updated": "2016-11-18T02:22:14.088658Z"
    },
    {
      "id": 5,
      "trace": 4,
      "project": 1,
      "meter_result": 5,
      "meter_input": null,
      "status": "PENDING",
      "failure_message": null,
      "traceback": null,
      "model_class": null,
      "model_kwargs": null,
      "formatter_class": null,
      "formatter_kwargs": null,
      "added": "2016-11-18T02:26:15.683349Z",
      "updated": "2016-11-18T02:26:15.683442Z"
    },
    {
      "id": 6,
      "trace": 5,
      "project": 1,
      "meter_result": 6,
      "meter_input": null,
      "status": "PENDING",
      "failure_message": null,
      "traceback": null,
      "model_class": null,
      "model_kwargs": null,
      "formatter_class": null,
      "formatter_kwargs": null,
      "added": "2016-11-18T02:28:00.757629Z",
      "updated": "2016-11-18T02:28:00.757666Z"
    },
    {
      "id": 7,
      "trace": 6,
      "project": 1,
      "meter_result": 7,
      "meter_input": null,
      "status": "PENDING",
      "failure_message": null,
      "traceback": null,
      "model_class": null,
      "model_kwargs": null,
      "formatter_class": null,
      "formatter_kwargs": null,
      "added": "2016-11-18T02:29:09.066736Z",
      "updated": "2016-11-18T02:29:09.066777Z"
    },
    {
      "id": 8,
      "trace": 1,
      "project": 1,
      "meter_result": 8,
      "meter_input": null,
      "status": "PENDING",
      "failure_message": null,
      "traceback": null,
      "model_class": null,
      "model_kwargs": null,
      "formatter_class": null,
      "formatter_kwargs": null,
      "added": "2016-11-18T02:32:05.062196Z",
      "updated": "2016-11-18T02:32:05.062238Z"
    },
    {
      "id": 9,
      "trace": 2,
      "project": 1,
      "meter_result": 9,
      "meter_input": null,
      "status": "PENDING",
      "failure_message": null,
      "traceback": null,
      "model_class": null,
      "model_kwargs": null,
      "formatter_class": null,
      "formatter_kwargs": null,
      "added": "2016-11-18T02:32:30.471808Z",
      "updated": "2016-11-18T02:32:30.471864Z"
    }
  ]
]

Note that results are returned grouped by target (as a list).

If model or formatter class or kwarg arguments are supplied, they will be applied to all meter_runs.

Through a management command

The other way to bulk-trigger meter runs is through a management command.

python manage.py run_meters --all-traces

You can monitor the progress of these commands with:

python manage.py meter_progress --all-meters --poll-until-complete
Meter result warehouse tables

For easy access to summarized meter result data, it may be helpful to use the meter result “mart”, which is part of the data warehouse that can be created in the postgres database.

Data warehouse tables make it easier to query into results by summarizing the most relevant information.

To create warehouse tables, use the following management command:

$ python manage.py meterresultmart recreate

This is equivalent to running

$ python manage.py meterresultmart destroy
$ python manage.py meterresultmart create

Running the create command without first running destroy will produce duplicate rows.

Using the warehouse_meterresultmart table

The easiest way to access the results of the warehouse is to connect an analytics service which can read from the database directly.

If that is not available to you, you can also query directly with postgres. Assuming you have a database set up called “datastore” (yours may be named differently, depending on how you set it up), you can connect as follows:

$ psql datastore
psql (9.4.1)
Type "help" for help.

datastore=# SELECT
  trace_id
  , differential_lower_bound as savings_lower_bound
  , differential_value as savings
  , differential_upper_bound as savings_upper_bound
FROM
  warehouse_meterresultmart
WHERE
  project_id='DEV_SEED_PROJECT'
AND
  derivative_interpretation='gross_predicted'
ORDER BY
  project_id
  , trace_id
  , derivative_interpretation;

              trace_id              | savings_lower_bound |      savings      | savings_upper_bound
------------------------------------+---------------------+-------------------+---------------------
 DEV_SEED_TRACE_ELECTRICITY_15MIN   |    2.21781163934072 |  4.93938974304001 |     7.6609678467393
 DEV_SEED_TRACE_ELECTRICITY_HOURLY  |      10.47300113325 |  12.9734969290002 |    15.4739927247505
 DEV_SEED_TRACE_NATURAL_GAS_DAILY   |   -12.4594891213768 | -6.03261538803008 |   0.394258345316612
 DEV_SEED_TRACE_NATURAL_GAS_MONTHLY |   -774.348987437802 | -580.576019960851 |     -386.8030524839
 DEV_SEED_TRACE_SOLAR_30MIN         |   0.848394785466816 |  3.81981394853938 |    6.79123311161194
 DEV_SEED_TRACE_SOLAR_HOURLY        |                     |                   |
(6 rows)
datastore=#
Aggregations and groups

Traces can be aggregated by putting them into groups and triggering aggregation runs.

Groups must be named, and are defined by combinations of filters over project_id, trace_id, or arbitrary project metadata.

Filters are created with the following attributes, as either a “filter” or a “filter_boolean”, which is a combination of two filters.

Filter types:

"filter":

  • "target", can be:
    • “project_id”
    • “trace_id”
    • “project_metadata|NAME_OF_ATTRIBUTE”
  • "comparison", can be:
    • “>”, “>=”, “<”, “<=”, “==”, “!=”
    • “in”, “not in”
  • "value", can be:
    • int, float, str (for comparisons “>”, “>=”, “<”, “<=”, “==”, “!=”)
    • list of values (for comparisons “in”, “not in”)

"filter_boolean":

  • "boolean", can be:
    • “and”, “or”
  • "filter_a", can be:
    • filter, filter_boolean
  • "filter_b", can be:
    • filter, filter_boolean

Example filter specification creation:

In [24]:
filter_specification = {
    "filter": {
        "target": "project_id",
        "comparison": "==",
        "value": projects[0]["project_id"],
    }
}
In [25]:
trace_group = requests.post(
    base_url + "/api/v1/trace_groups/",  # note: different url!
    json={
        "name": "project_group",
        "filter_specification": filter_specification,
    },
    headers=headers
).json(object_pairs_hook=OrderedDict)
In [26]:
print(json.dumps(trace_group, indent=2))
{
  "id": 3,
  "name": "project_group",
  "filter_specification": {
    "filter": {
      "comparison": "==",
      "target": "project_id",
      "value": "DEV_SEED_PROJECT"
    }
  }
}
In [27]:
aggregation_run = requests.post(
    base_url + "/api/v1/aggregation_runs/",
    json={
        "group": trace_group['id'],
        "trace_interpretation": "ELECTRICITY_CONSUMPTION_SUPPLIED",
        "derivative_interpretation": "annualized_weather_normal",
    },
    headers=headers
).json(object_pairs_hook=OrderedDict)
In [28]:
print(json.dumps(aggregation_run, indent=2))
{
  "id": 7,
  "group": 3,
  "aggregation_result": 1,
  "aggregation_input": null,
  "status": "PENDING",
  "traceback": null,
  "failure_message": null,
  "trace_interpretation": "ELECTRICITY_CONSUMPTION_SUPPLIED",
  "derivative_interpretation": "annualized_weather_normal",
  "aggregation_interpretation": "SUM",
  "added": "2016-11-18T03:04:00.425945Z",
  "updated": "2016-11-18T03:04:00.426601Z"
}
In [29]:
aggregation_run = requests.get(
    base_url + "/api/v1/aggregation_runs/{}/".format(aggregation_run["id"]),
    headers=headers
).json(object_pairs_hook=OrderedDict)
print(json.dumps(aggregation_run, indent=2))
{
  "id": 7,
  "group": 3,
  "aggregation_result": 1,
  "aggregation_input": "https://storage.googleapis.com/my-storage-bucket/datastore/aggregation_inputs/2b986708-1076-4a16-a2dd-31a78f93d817.json",
  "status": "SUCCESS",
  "traceback": null,
  "failure_message": null,
  "trace_interpretation": "ELECTRICITY_CONSUMPTION_SUPPLIED",
  "derivative_interpretation": "annualized_weather_normal",
  "aggregation_interpretation": "SUM",
  "added": "2016-11-18T03:04:00.425945Z",
  "updated": "2016-11-18T03:04:03.806500Z"
}
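The run above completed quickly; when a run has not yet finished, the same endpoint can simply be polled until the status reaches a terminal state. A minimal sketch, assuming requests, base_url, and headers are defined as in the earlier cells (the 5-second interval is an arbitrary choice):

import time

run_url = base_url + "/api/v1/aggregation_runs/{}/".format(aggregation_run["id"])
while True:
    # Re-fetch the run and stop once it has succeeded or failed.
    aggregation_run = requests.get(run_url, headers=headers).json()
    if aggregation_run["status"] in ("SUCCESS", "FAILURE"):
        break
    time.sleep(5)
print(aggregation_run["status"])
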
In [30]:
aggregation_result = requests.get(
    base_url + "/api/v1/aggregation_results/{}/".format(aggregation_run["aggregation_result"]),
    headers=headers
).json(object_pairs_hook=OrderedDict)
print(json.dumps(aggregation_result, indent=2))
{
  "id": 1,
  "aggregation_run": 7,
  "trace_interpretation": "ELECTRICITY_CONSUMPTION_SUPPLIED",
  "derivative_interpretation": "annualized_weather_normal",
  "aggregation_interpretation": "SUM",
  "aggregation_output": "https://storage.googleapis.com/my-storage-bucket/datastore/aggregation_outputs/4295fd00-4ecd-494e-9eeb-884ef620ce14.json",
  "derivatives": [
    7,
    9
  ],
  "unit": "KWH",
  "baseline_value": 4863.15574521486,
  "baseline_lower": 1.5110421742195,
  "baseline_upper": 1.5110421742195,
  "baseline_n": 730.0,
  "reporting_value": 4860.88336194519,
  "reporting_lower": 0.545198012058528,
  "reporting_upper": 0.545198012058528,
  "reporting_n": 730.0,
  "differential_direction": "BASELINE_MINUS_REPORTING",
  "differential_value": 2.27238326967017,
  "differential_lower": 1.60639015330105,
  "differential_upper": 1.60639015330105,
  "differential_n": 1460.0,
  "eemeter_version": "0.4.12",
  "datastore_version": "0.2.3",
  "added": "2016-11-18T03:04:03.701863Z",
  "updated": "2016-11-18T03:04:03.701911Z"
}
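The _lower and _upper fields in this result are offsets rather than absolute bounds: subtracting and adding them to the corresponding value yields the lower and upper bounds of the 95% confidence interval. A small sketch using the result fetched above:

# Reconstruct 95% confidence intervals from the aggregation result;
# the lower/upper fields are offsets from the corresponding value.
def confidence_interval(result, prefix):
    value = result["{}_value".format(prefix)]
    lower = value - result["{}_lower".format(prefix)]
    upper = value + result["{}_upper".format(prefix)]
    return lower, value, upper

for prefix in ("baseline", "reporting", "differential"):
    lower, value, upper = confidence_interval(aggregation_result, prefix)
    print("{}: {:.2f} {} (95% CI {:.2f} to {:.2f})".format(
        prefix, value, aggregation_result["unit"], lower, upper))
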
Additional filter examples:

All traces:

None  # leave blank

Traces with project cost less than or equal to 10000:

{
    "filter": {
        "target": "project_metadata|project_cost",
        "comparison": "<=",
        "value": 10000,
    }
}

Traces with project_id in particular set:

{
    "filter": {
        "target": "project_id",
        "comparison": "in",
        "value": [
            "PROJECT_101",
            "PROJECT_102"
        ]
    }
}

Traces with project_id in particular set or with project cost greater than or equal to 5000:

{
    "filter_boolean": {
        "boolean": "or",
        "filter_a": {
            "filter": {
                "target": "project_id",
                "comparison": "in",
                "value": [
                    "PROJECT_101",
                    "PROJECT_102"
                ]
            }
        },
        "filter_b": {
            "filter": {
                "target": "project_metadata|project_cost",
                "comparison": ">=",
                "value": 5000,
            }
        }
    }
}

Deeply nested filter:

{
    "filter_boolean": {
        "boolean": "and",
        "filter_a": {
            "filter_boolean": {
                "boolean": "and",
                "filter_a": {
                    "filter": {
                        "target": "project_metadata|contractor",
                        "comparison": "==",
                        "value": "AAA CONTRACTING",
                    }
                },
                "filter_b": {
                    "filter": {
                        "target": "project_metadata|project_type",
                        "comparison": "!=",
                        "value": "SOLAR"
                    }
                },
            }
        },
        "filter_b": {
            "filter": {
                "target": "project_metadata|project_cost",
                "comparison": ">=",
                "value": 5000,
            }
        }
    }
}
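Because filter_boolean entries nest arbitrarily, deeply nested specifications like the one above can be tedious to write by hand. The helpers below are not part of the datastore API; they are just a sketch of how such a specification could be assembled programmatically:

def make_filter(target, comparison, value):
    # Build a single "filter" entry.
    return {"filter": {"target": target, "comparison": comparison, "value": value}}

def combine(boolean, filter_a, filter_b):
    # Combine two filters (or filter_booleans) with "and"/"or".
    return {"filter_boolean": {"boolean": boolean, "filter_a": filter_a, "filter_b": filter_b}}

filter_specification = combine(
    "and",
    combine(
        "and",
        make_filter("project_metadata|contractor", "==", "AAA CONTRACTING"),
        make_filter("project_metadata|project_type", "!=", "SOLAR"),
    ),
    make_filter("project_metadata|project_cost", ">=", 5000),
)
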
Group statistics warehouse tables

For easy access to summarized aggregation data, it may be helpful to use the group statistics mart, which is part of the data warehouse that can be created in the PostgreSQL database.

This supplements the meter result mart by providing summarized group statistics.

Just as with the meter result mart, the group statistics mart can also be created with a management command:

$ python manage.py groupstatisticsmart recreate
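Once created, the mart can be queried directly from PostgreSQL. A minimal sketch, assuming psycopg2 and placeholder connection parameters; the column names follow the warehouse_groupstatisticsmart table described below:

import psycopg2

# Connection parameters are placeholders for your datastore's database.
conn = psycopg2.connect("dbname=datastore user=postgres")
with conn, conn.cursor() as cursor:
    cursor.execute("""
        SELECT group_name, statistic_unit,
               baseline_value, reporting_value, differential_value
        FROM warehouse_groupstatisticsmart
        ORDER BY group_name
    """)
    for row in cursor.fetchall():
        print(row)
conn.close()
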
PostgreSQL tables

A data dictionary describing available datastore database tables.

Core project and trace data (i.e., data loaded through ETL)
Name of Table Name of Column Description
datastore_project    
  id Primary key
  project_id Unique project identifier provided by the user
  baseline_period_start [null]
  baseline_period_end Populated through ETL from project data
  reporting_period_start Populated through ETL from project data
  reporting_period_end [null]
  zipcode Populated through ETL from project data
  project_owner_id Optional foreign key to datastore_projectowner table
  added Date Added
  updated Date updated
datastore_trace   Refers to an energy trace (a time series of data from a meter)
  id Primary key
  trace_id Unique identifier for trace
  interpretation Type of energy data.
  unit Unit of measure
  added Date that the data was added to the database
  updated Timestamp for last updated
datastore_tracerecord   Single point in trace timeseries
  id Primary key
  trace_id Foreign key to datastore_trace table
  value Value from start of this record to start of the next record
  estimated True/False
  start Start time of interval; end is given by next record (as ordered by start timestamp).
datastore_project_traces   Many-to-many table linking projects and traces
  project_id Foreign key to datastore_project table
  trace_id Foreign key to datastore_trace table
datastore_projectmetadata   Project metadata
  project_id Foreign key to datastore_project table
  key String identifying metadata type
  value Value of metadata
datastore_tracegroup   Grouping of traces defined by a filter
  name Name of group
  filter_specification JSON specification of filter defining group
Meter run and meter result data

Metering tables

Name of Table Name of Column Description
metering_meterderivative   Table of predictive and descriptive summaries of savings
  id Primary key
  interpretation Interpretation of derivative (e.g., gross_predicted/annualized_weather_normal)
  unit Unit of values, upper and lower bounds.
  baseline_value Modeled counterfactual baseline value
  baseline_lower Amount to be subtracted from baseline_value to obtain lower bound on 95% confidence interval
  baseline_upper Amount to be added to baseline_value to obtain upper bound on 95% confidence interval
  baseline_n Number of points in baseline demand fixture
  reporting_value Modeled reporting period value
  reporting_lower Amount to be subtracted from reporting_value to obtain lower bound on 95% confidence interval
  reporting_upper Amount to be added to reporting_value to obtain upper bound on 95% confidence interval
  reporting_n Number of points in reporting demand fixture
  added Date added
  updated Date updated
  meter_result_id Primary key of meter result this derivative was extracted from
  modeling_period_group_id Primary key of modeling period group describing baseline and reporting period details
  trace_id Primary key of trace this derivative applies to
metering_meterresult   Table of meter run results
  id Primary key
  meter_output Filename of JSON serialization of meter output
  status SUCCESS/FAILURE
  eemeter_version Version of eemeter library used to calculate this result
  datastore_version Version of datastore application used to calculate this result
  model_class Name of model class
  model_kwargs Keyword arguments to model class
  formatter_class Name of formatter class
  formatter_kwargs Keyword arguments to formatter class
  added Date added
  updated Date updated
  meter_run_id Primary key of meter run
  project_id Primary key of project data
  trace_id Primary key of trace
metering_meterrun   Table of meter runs
  id Primary key
  meter_input Filename of JSON serialization of meter input
  status PENDING/RUNNING/SUCCESS/FAILURE
  failure_message Failure message, if any
  traceback Traceback text, if error occurred
  model_class Name of model class supplied, if any
  model_kwargs Model class keyword arguments supplied, if any
  formatter_class Name of formatter class supplied, if any
  formatter_kwargs Formatter class keyword arguments supplied, if any
  added Date added
  updated Date updated
  project_id Primary key of project data
  trace_id Primary key of trace
metering_modelingperiod   Table describing a modeling period
  id Primary key
  label Label to distinguish from other baseline/reporting periods in the same meter result
  interpretation BASELINE/REPORTING
  start Date of modeling period start, if any (can be blank for baseline)
  end Date of modeling period end, if any (can be blank for reporting)
  meter_result_id Primary key of containing meter result
metering_modelingperiodgroup   Table describing a pair of modeling periods (baseline + reporting)
  id Primary key
  baseline_id Primary key of baseline modeling period
  meter_result_id Primary key of containing meter result
  reporting_id Primary key of reporting modeling period
metering_modelresult   Table storing results from modeling
  id Primary key
  status SUCCESS/FAILURE
  traceback Traceback, if any
  start_date Start date of data used in modeling
  end_date End date of data used in modeling
  n_rows Number of rows supplied as input to modeling
  r2 R-squared model fit
  cvrmse Coefficient of variation of root mean squared error (RMSE normalized by the mean)
  rmse Root mean squared error
  lower Value to be subtracted from any individual predicted point to obtain lower bound on 95% confidence interval
  upper Value to be added to any individual predicted point to obtain upper bound on 95% confidence interval
  added Date added
  updated Date updated
  meter_result_id Primary key of meter result
  modeling_period_id Primary key of modeling period
  trace_id Primary key of trace
Aggregation tables
Name of Table Name of Column Description
metering_aggregationrun   Aggregation task
  id Primary key
  aggregation_input Serialized aggregation input
  status PENDING/RUNNING/SUCCESS/FAILURE
  failure_message Failure message, if any
  traceback Traceback text, if error occurred
  trace_interpretation Type of trace in this aggregation
  derivative_interpretation Type of derivative in this aggregation
  aggregation_interpretation Type of aggregation to be performed
  group_id Foreign key to datastore_tracegroup table
  added Date added
  updated Date updated
metering_aggregationresult   Aggregation task result
  id Primary key
  aggregation_output Serialized aggregation output
  trace_interpretation Type of trace in this aggregation
  derivative_interpretation Type of derivative in this aggregation
  aggregation_interpretation Type of aggregation to be performed
  eemeter_version Version of eemeter library used to calculate this result
  datastore_version Version of datastore application used to calculate this result
  unit Unit of measure
  baseline_value Modeled counterfactual baseline value
  baseline_lower Amount to be subtracted from baseline_value to obtain lower bound on 95% confidence interval
  baseline_upper Amount to be added to baseline_value to obtain upper bound on 95% confidence interval
  baseline_n Number of points in combined baseline demand fixtures
  reporting_value Modeled reporting period value
  reporting_lower Amount to be subtracted from reporting_value to obtain lower bound on 95% confidence interval
  reporting_upper Amount to be added to reporting_value to obtain upper bound on 95% confidence interval
  reporting_n Number of points in combined reporting demand fixtures
  differential_direction BASELINE_MINUS_REPORTING/REPORTING_MINUS_BASELINE
  differential_value Differential (savings) value
  differential_lower Amount to be subtracted from differential_value to obtain lower bound on 95% confidence interval
  differential_upper Amount to be added to differential_value to obtain upper bound on 95% confidence interval
  differential_n Number of points in combined differential demand fixture
  added Date added
  updated Date updated
  aggregation_run_id Foreign key to metering_aggregationrun table
metering_aggregationderivativestatus   Status of inclusion in aggregation
  id Primary key
  status ACCEPTED/REJECTED
  baseline_status Baseline result ACCEPTED or REJECTED
  reporting_status Reporting result ACCEPTED or REJECTED
  aggregation_result_id Foreign key to metering_aggregationresult table
  derivative_id Foreign key to metering_meterderivative table
Warehouse tables
Name of Table Name of Column Description
warehouse_meterresultmart   Summarized meter results
  id Primary key
  trace_id Trace identifying string
  trace_pk Primary key of trace
  trace_interpretation Type of trace
  trace_unit Unit of measure of trace
  project_id Project identifying string
  project_pk Primary key of project
  serialized_input_url Cloud storage location of serialized input
  serialized_output_url Cloud storage location of serialized output
  meter_result_pk Primary key of meter result
  meter_result_status Meter result status
  meter_result_eemeter_version eemeter library software version
  meter_result_datastore_version datastore library software version
  meter_result_model_class Model class used in model fitting
  meter_result_model_kwargs Keyword arguments used in model class initialization
  meter_result_formatter_class Formatter class used in model data formatting
  meter_result_formatter_kwargs Keyword arguments used in formatter class initialization
  meter_result_added Date meter result added
  meter_result_updated Date meter result updated
  meter_run_pk Primary key of meter run
  meter_run_status Meter run status
  meter_run_failure_message Failure message (if any)
  meter_run_traceback Traceback (if any)
  meter_run_added Date meter run added
  meter_run_updated Date meter run updated
  modeling_period_group_pk Primary key of modeling period group
  derivative_pk Primary key of derivative
  derivative_interpretation Type of derivative
  derivative_unit Unit of measure of derivative
  baseline_period_pk Primary key of baseline period
  baseline_period_label Label of baseline period
  baseline_period_start Start date of baseline period (if any)
  baseline_period_end End date of baseline period
  baseline_model_result_pk Primary key of baseline model result
  baseline_model_result_status Status of baseline model result
  baseline_model_result_traceback Traceback if failed
  baseline_model_result_r2 R squared
  baseline_model_result_cvrmse Coefficient of variation of root mean squared error
  baseline_model_result_n_rows Number of rows in input
  baseline_model_result_rmse Root mean squared error
  baseline_derivative_value Baseline derivative value
  baseline_derivative_lower_bound 95 percent confidence lower bound on baseline derivative value
  baseline_derivative_upper_bound 95 percent confidence upper bound on baseline derivative value
  reporting_period_pk Primary key of reporting period
  reporting_period_label Label of reporting period
  reporting_period_start Start date of reporting period (if any)
  reporting_period_end End date of reporting period
  reporting_model_result_pk Primary key of reporting model result
  reporting_model_result_status Status of reporting model result
  reporting_model_result_traceback Traceback if failed
  reporting_model_result_r2 R squared
  reporting_model_result_cvrmse Coefficient of variation of root mean squared error
  reporting_model_result_n_rows Number of rows in input
  reporting_model_result_rmse Root mean squared error
  reporting_derivative_value Reporting derivative value
  reporting_derivative_lower_bound 95 percent confidence lower bound on reporting derivative value
  reporting_derivative_upper_bound 95 percent confidence upper bound on reporting derivative value
  differential_value Savings value
  differential_direction BASELINE_MINUS_REPORTING/REPORTING_MINUS_BASELINE
  differential_lower_bound 95 percent confidence lower bound on savings value
  differential_upper_bound 95 percent confidence upper bound on savings value
warehouse_groupstatisticsmart   Summarized group statistics
  id Primary key
  group_name Name of group
  group_pk Primary key of group
  serialized_input_url Cloud storage location of serialized input
  serialized_output_url Cloud storage location of serialized output
  aggregation_run_pk Primary key of aggregation run
  aggregation_run_status Status of aggregation run
  aggregation_run_failure_message Failure message (if any)
  aggregation_run_traceback Traceback (if any)
  aggregation_run_added Date added
  aggregation_run_updated Date updated
  aggregation_result_pk Primary key of aggregation result
  n_derivatives Number of derivatives in group
  aggregation_result_added Date added
  aggregation_result_updated Date updated
  aggregation_result_eemeter_version eemeter library software version
  aggregetion_result_datastore_version datastore application software version
  trace_interpretation Type of trace included in aggregation
  derivative_interpretation Type of derivative included in aggregation
  statistic_interpretation Type of aggregation done
  statistic_unit Unit of measure
  baseline_value Aggregated baseline value
  baseline_lower_bound 95 percent confidence lower bound
  baseline_upper_bound 95 percent confidence upper bound
  reporting_value Aggregated reporting value
  reporting_lower_bound 95 percent confidence lower bound
  reporting_upper_bound 95 percent confidence upper bound
  differential_value Aggregated differential value
  differential_direction BASELINE_MINUS_REPORTING/REPORTING_MINUS_BASELINE
  differential_lower_bound 95 percent confidence lower bound
  differential_upper_bound 95 percent confidence upper bound
  n_derivatives_accepted Number of derivatives in group accepted
  n_derivatives_accepted_baseline Number of derivatives in group with accepted baseline result
  n_derivatives_accepted_reporting Number of derivatives in group with accepted reporting result
  n_derivatives_rejected Number of derivatives in group rejected
  n_derivatives_rejected_baseline Number of derivatives in group with rejected baseline result
  n_derivatives_rejected_reporting Number of derivatives in group with rejected reporting result
Management commands

The following management commands are available for use with the datastore.

dev_seed

Creates an admin user:

  • username: demo
  • password: demo-password
  • access token: tokstr

Creates a sample project with the id DEV_SEED_PROJECT with the following traces:

  • DEV_SEED_TRACE_NATURAL_GAS_MONTHLY
  • DEV_SEED_TRACE_NATURAL_GAS_DAILY
  • DEV_SEED_TRACE_ELECTRICITY_15MIN
  • DEV_SEED_TRACE_ELECTRICITY_HOURLY
  • DEV_SEED_TRACE_SOLAR_HOURLY
  • DEV_SEED_TRACE_SOLAR_30MIN

Example usage:

python manage.py dev_seed
prod_seed

Creates an admin user with generated password and access token:

  • username: admin
  • password: <generated password>
  • access token: <generated token>

The generated password and access token will be shown in the output:

Admin password: <generated password>
Admin token: <generated token>

Example usage:

python manage.py prod_seed
trace_record_indexes

Creates and destroys indexes as part of loading TraceRecords.

Loading raw data is significantly faster if indexes and foreign key constraints are dropped before importing and rebuilt afterward.

This command inspects the current indexes and constraints, dropping all but the primary key indexes.

If new indexes are added, they should be added here (not in model classes) so that they are properly rebuilt during imports.

The results of this command can be inspected through psql:

=> \d datastore_tracerecord

With indexes, the description will look something like this:

Indexes:
    "datastore_tracerecord_pkey" PRIMARY KEY, btree (id)
    "datastore_tracerecord_ffe73c23" btree (trace_id)
Foreign-key constraints:
    "datast_trace_id_53e4466e_fk_datastore_trace_id"
    FOREIGN KEY (trace_id) REFERENCES datastore_trace(id)
    DEFERRABLE INITIALLY DEFERRED

Without indexes, it will look something like this:

Indexes:
    "datastore_tracerecord_pkey" PRIMARY KEY, btree (id)

Example usage:

To destroy trace record indexes (before ETL):

python manage.py trace_record_indexes destroy

To recreate trace record indexes (after ETL):

python manage.py trace_record_indexes create
run_meters

Triggers meter runs for specified projects or traces.

Example usage:

python manage.py run_meters --all-traces

Optional arguments:

--projects PROJECTS [PROJECTS ...]
                      Project ids to run
--traces TRACES [TRACES ...]
                      Trace ids to run
--all-projects        Run meters for all projects, overrides --projects
--all-traces          Run meters for all traces, overrides --traces
--use-project-id      Use project_id, not id, for any projects to run
--use-trace-id        Use trace_id, not id, for any traces to run
--purge-queue         Purges celery queue before adding meter runs
--detailed-output     Provides more detailed project and trace level output
                      re: meter ids
--delete-previous-meters
                      Delete old meter runs associated with these ids
meter_progress

Check progress of one or more meter runs.

Example usage:

python manage.py meter_progress --all-meters

Optional arguments:

--meters METERS [METERS ...]
                      Meter ids to check
--all-meters          Check progress for all meters
--poll-until-complete
                      Repeatedly check progress until all meters complete
--poll-interval POLL_INTERVAL
                      Seconds to wait between checks if --poll-until-
                      complete
--poll-max POLL_MAX   Max number of seconds to poll if --poll-until-complete
                      before exiting
delete_meters

Delete meter runs.

Example usage:

python manage.py delete_meters

Optional arguments:

--meters METERS [METERS ...]
                      Meter ids to delete
--traces TRACES [TRACES ...]
                      Trace ids to delete associated meters
--projects PROJECTS [PROJECTS ...]
                      Project ids to delete associated meters
run_aggregations

Run aggregations of meter results by group.

Example usage:

python manage.py run_aggregations --all-groups

Optional arguments:

--group-names GROUP_NAMES [GROUP_NAMES ...]
                      Groups against which to run aggregations
--all-groups          Run aggregations for all groups; overrides
                      --group_names
meterresultmart

Create and destroy the data warehouse mart for meter results.

The warehouse table is warehouse_meterresultmart

Example usage:

python manage.py meterresultmart create
python manage.py meterresultmart destroy
modelresultmart

Create and destroy the data warehouse mart for model results.

The warehouse table is warehouse_modelresultmart

Example usage:

python manage.py modelresultmart create
python manage.py modelresultmart destroy
projectsummarymart

Create and destroy a data mart for metering results organized by project for a charting frontend.

The warehouse table is warehouse_projectsummarymart

Example usage:

python manage.py projectsummarymart create
python manage.py projectsummarymart destroy
tracesummarymart

Create and destroy a data mart that summarizes traces and their records.

The warehouse table is warehouse_tracesummarymart

Example usage:

python manage.py tracesummarymart create
python manage.py tracesummarymart destroy
geoinfo

Create and destroy two tables of geographical information.

The warehouse tables are warehouse_zctainfo and warehouse_countyinfo

Example usage:

python manage.py geoinfo create
python manage.py geoinfo destroy

API

ETL Toolkit

The ETL toolkit is provided to assist in moving data from its source into the datastore.

“ETL” stands for Extract-Transform-Load. These three steps, which the toolkit is designed to assist with, are as follows:

  • Extract: obtain data from an external (non-datastore) source.
  • Transform: convert that data into a form usable by the datastore.
  • Load: move the transformed data into the datastore.

The ETL library is not run directly. Rather, its components are used to build ETL pipelines that are specific to a datastore instance.
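As a rough illustration of that pattern, the sketch below organizes a custom pipeline around plain extract, transform, and load functions. It does not use the etl package's own classes, and the file layout, field names, and load step are hypothetical; a real pipeline would load the transformed records into the datastore instance (for example through its REST API or a bulk import).

import csv

def extract(path):
    # Extract: read rows from an external source, here a hypothetical CSV export.
    with open(path) as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Transform: reshape each row into the structure the datastore expects
    # (the source field names here are illustrative, not an actual schema).
    return [
        {
            "project_id": row["site_id"],
            "zipcode": row["zip"],
            "baseline_period_end": row["work_start_date"],
            "reporting_period_start": row["work_end_date"],
        }
        for row in rows
    ]

def load(records):
    # Load: hand the transformed records to the datastore instance.
    for record in records:
        print(record)  # placeholder for the actual upload step

if __name__ == "__main__":
    load(transform(extract("projects_export.csv")))
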

Installation

To install the ETL library, run the following:

$ git clone https://github.com/openeemeter/etl
$ cd etl
$ pip install -r requirements.txt

For more information, see the project repository on GitHub.

API

License

MIT