Warning
The eemeter package is under rapid development; we are working quickly toward a stable release. In the meantime, please proceed to use the package, but as you do so, recognize that the API is in flux and the docs might not be up-to-date. Feel free to contribute changes or open issues on GitHub to report bugs, request features, or make suggestions.
The Open Energy Efficiency Meter¶
This package holds the core methods of the Open Energy
Efficiency metering stack. Specifically, the eemeter
package abstracts the process of building and evaluating models of energy
consumption or generation, and of using those models to evaluate the effect of energy
efficiency interventions at a particular site associated with a particular
project.
The eemeter
package is only one part of the larger Open Energy Efficiency
technology stack. Briefly, the architecture of the stack is as follows:
eemeter
: Given project and energy data, the eemeter package is responsible for creating models of energy usage under different project conditions, and for using those models to evaluate energy efficiency projects.

datastore
: The datastore application is responsible for validating and storing project data and associated energy data, for using the eemeter to evaluate the effectiveness of these projects using the data it stores, and for storing and serving those results. It exposes a REST API for handling these functions.

etl
: The etl package provides tooling which helps to extract data from various formats, transform that data into the format accepted by the datastore, and load that transformed data into the appropriate datastore instance. ETL stands for Extract, Transform, Load.
Usage¶
Guides¶
Introduction¶
The OpenEEmeter is an open source software package that uses metered energy data to manage aggregate demand capacity across a portfolio of retail customer accounts. The software package consists of three main parts:
- an Extract-Transform-Load (ETL) toolkit for processing project, energy, and building data (https://github.com/openeemeter/etl/);
- a core calculation library (this package) that implements standardized methods (https://github.com/openeemeter/eemeter/); and
- a datastore application for storing post-ETL inputs and computed outputs (https://github.com/openeemeter/datastore/).
More information about this architecture can be found in Architecture Overview.
Core use cases¶
The OpenEEmeter has been designed specifically to provide weather-normalized energy savings measurements for a portfolio of projects using monthly billing data or interval smart meter data. The main project- and portfolio-level outputs for this core use case are:
- Gross Energy Savings
- Annualized Energy Savings
- Realization Rate (when savings predictions are available)
More information about these methods can be found in Methods Overview.
Other potential use cases¶
The OpenEEmeter can also be configured to manage energy resources across a portfolio of buildings, including potentially:
- Analytics of raw energy data
- Portfolio management
- Demand side resource management
Data requirements¶
The EEmeter requires a combination of trace data, project data, and weather data to calculate weather-normalized savings. At a minimum, the EEmeter requires a trace of consumption data along with project data indicating the completion date and location of the project.
The completion of a project demarcates the shift between a baseline modeling period and a reporting modeling period. For more information on this, see Methods Overview.
The EEmeter is configured to manage project and trace data. Trace data can be electricity, natural gas, or solar photovoltaic data of any frequency - from monthly billing data to high-frequency sensor data (see 1) Meters and Smart Meters - where does energy data come from?).
Where project and trace data originate from different database sources, a common key must be available to link projects with their respective traces.
Project data¶
Project data is typically a set of attributes that can be used for advanced savings analytics, but at minimum must contain a date to demarcate start and end of intervention periods.
Each project must have, at minimum:
- a unique project id
- start and end dates of known interventions
- a ZIP code (for gathering associated weather data)
- a set of associated traces
Other data can also be associated with projects, including (but not limited to):
- savings predictions
- square footage
- cost
Trace data¶
Each trace must have, at minimum:
- a link to a project id
- a unique id of its own
- an interpretation
- a set of records
Each record within a trace must have:
- a time period (start and end dates)
- a value and associated units
- a boolean “estimated” flag
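For concreteness, here is what a single record might look like in the serialized form used later in this guide (the field names match the meter input example in the Basic Usage tutorial; the unit and interpretation live at the trace level rather than on each record):

# A single trace record, as it appears in serialized meter input.
record = {
    "start": "2011-01-01T00:00:00+00:00",  # timezone-aware ISO 8601 period start
    "value": 57.8,                         # consumption during the period
    "estimated": False,                    # True if the value was estimated rather than measured
}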
The EEmeter will reject traces not meeting built-in data sufficiency requirements.
Loading data¶
The eemeter python package is a calculation engine; it is not designed for data storage. Instead, project and trace data are stored in the datastore alongside outputs from the eemeter.
To load data into the datastore, the EEmeter stack comes bundled with an ETL toolkit. If you are deploying the open source software, you will need to write or customize a parser to load your data into the ETL pipeline. We rely on a python module called luigi to manage the bulk importation of data.
More on this architecture can be found in Architecture Overview.
External analysis¶
You may decide that you want to use EEmeter results to analyze project data that does not get parsed and uploaded into the datastore. We have made it easy to export your EEmeter results through an API or through a web interface. Other options include a direct database connection to a BI tool like Tableau or Salesforce.
Background¶
1) Meters and Smart Meters - where does energy data come from?¶
Energy data is generated by hardware devices that measure electricity and natural gas flow. A device like this is generally referred to as a “meter” (though this is distinct from the software-based “EEmeter” - see Methods Overview). The most common and ubiquitous measuring device is a utility-owned meter used for determining billing. Some utilities have upgraded their meters to provide hourly or 15-minute interval measurements. These so-called “smart meters” use Advanced Metering Infrastructure (AMI) to transmit data back to utilities for processing in near-real time. Other devices that generate energy data include sub-meters, external sensors, and embedded sensors.
Note
The “smart” in smart meter can be a bit of a misnomer. Despite higher measurement frequency and wireless data transmission, these smart meters collect essentially the same data that electricity meters did in the 1950s. Each meter datapoint consists of a timestamp and an incremental value of consumption. We call this string of data characterized by paired sets of timestamps and meter readings a trace. Traces form the basis of the energy modeling in the EEmeter.
Just like the odometer in your car doesn’t tell you how fast you are traveling, the meter on your house doesn’t tell you how much energy you have consumed. Consumption must be calculated. In the past, energy companies simply determined your rate of consumption by taking monthly meter readings and calculating the difference. With smart meters, these datapoints can be captured more frequently and with greater precision, allowing for more sophisticated forms of billing.
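As a toy illustration of this difference-taking (hypothetical readings, not eemeter code):

# Hypothetical cumulative meter readings (kWh), one per month.
readings = [10000.0, 10450.0, 10920.0, 11300.0]

# Consumption for each billing period is the difference between successive readings.
consumption = [later - earlier for earlier, later in zip(readings, readings[1:])]
# [450.0, 470.0, 380.0]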
2) Measuring Energy Savings and the Transition to Demand Side Management¶
The OpenEEmeter replaces traditional approaches to program-related energy measurement. Utilizing newly available smart meter data, the OpenEEmeter solves the problem of measuring energy savings and opens new doors for managing demand side programs.
Historically, energy savings have been measured in one of three ways. The first (and least costly) approach is to take laboratory measurements of different energy-consuming devices (e.g., light bulbs) and calculate the difference in consumption from one to the next, then estimate the savings over a given period of time, taking into consideration typical usage patterns. This first approach is limited by the accuracy and availability of physical models.
The second (and most costly) approach samples consumption data prior to and following an intervention of some sort (e.g., an energy efficiency retrofit), and estimates savings after controlling for building-specific factors like occupancy, temperature, energy intensity, etc. This second approach is limited by low availability of data describing these building-specific factors (thus making it very costly).
A third (post-hoc) approach has recently emerged that takes a population-level sample of similar buildings and compares with a treatment group of buildings that have received an energy efficiency upgrade (or other intervention). This approach assumes that all buildings will be affected equally by exogenous factors, leaving only endogenous factors (i.e., the efficiency upgrade) to account for the energy consumption difference.
In the analog era of traditional meters and monthly bills, efforts to improve energy efficiency emphasized fairly static and permanent changes in consumption. A whole-home retrofit, for example, would reduce energy demand without requiring any additional behavioral or lifestyle changes. A one-time intervention would provide years of benefit, and our metering technology at the time provided a way to measure the performance of these measures.
With the introduction of smart meters, utilities have transitioned from simple efficiency programs to a suite of programs under the umbrella of demand side management (DSM). These new measures fall into three broad categories including time of day, demand, and net metering. The OpenEEmeter expands the programmatic interface of energy efficiency to engage with emergent technologies and market based demand side engagement programs.
3) How the OpenEEmeter is valuable: Baselining, Normalization, and Modeling Energy Use¶
Smart meter data allows for more complexity in statistical models. Rather than relying on simple regression experiments to normalize energy consumption, analysts can parse the impact of exogenous and endogenous factors independently and iteratively. The notion of baseload energy use can even be disaggregated into multiple demand states. For example, a home will use very little energy when empty, a bit more when occupied, and a large amount when appliances and heating or cooling systems are operating. These demand states can be measured against various sorts of interventions, thus enabling both traditional energy efficiency savings measurements, but also leveraging modern load balancing tools.
The OpenEEmeter calculates energy savings in real time by selecting a sample of consumption data prior to an intervention, weather-normalizing it to establish a baseline, and calculating the difference between projected energy usage and actual energy usage following the intervention. This method combines the cost-effectiveness of the naive predicted savings approach with the real-world integrity of the building efficiency approach, without the time delay of the post-hoc control group approach.
Architecture Overview¶
The complete eemeter architecture consists primarily of a datastore application (see datastore), which houses energy and project data, and a data pipeline toolkit (see ETL Toolkit) that helps get data into the datastore.
These two work in tandem to take raw energy data in whatever form it exists and compute energy savings using the eemeter package. The methods and models used within the datastore for computing energy savings are kept in a library package called eemeter, which can also be used independent of the datastore application (see eemeter).
Each of these components is open sourced under an MIT License and can be found on GitHub.
The core calculation engine is separated from the datastore in order to allow easier development of and evaluation of its methods, but this architecture also makes it possible to embed the calculation engine or any of its useful modules (such as the weather module) in other applications.
The data structures in each - the eemeter and the datastore - mirror each other. This simplifies data transfer and eases interpretation of results.
Methods Overview¶
The EEmeter provides multiple methods for calculating energy savings. All of these methods compare energy demand from a modeled counterfactual pre-intervention baseline to post-intervention energy demand. Some of these methods, including the most conventional, weather normalize energy demand.
These basic methods [1] rely on a modeled relationship between weather patterns and energy demand. The particular models used by the EEmeter are described more precisely in Modeling Overview.
Modeling periods¶
For any savings calculation, we term the period of time prior to the start of any interventions taking place as part of a project the baseline period. This period is used to establish models of the relationship between energy demand and a set of factors that represent or contribute to end use demand (such as weather, time of day, or day of week) for a particular building prior to an intervention. The baseline becomes a reference point from which to make comparisons to post-intervention energy performance. The baseline period is one of two types of modeling period frequently occurring in the EEmeter.
The second half of the savings calculation concerns what happens after an intervention. Any post-intervention period for which energy savings is calculated is called a reporting period because it is the period of time over which energy savings is reported. A project generally has only one baseline period, but it might have multiple reporting periods. These are the second of the two types of modeling period frequently occurring in the EEmeter.
The extent of these periods will, in most cases, be determined by the start and end dates of the interventions in a project. However, in some cases, the intervention dates are not known, or are ongoing, and must be modeled because they cannot be stated explicitly. We refer to models which account for the latter scenario as structural change models; these are covered in greater detail in Modeling Overview.
EEmeter structures which capture this logic can be found in the API documentation for eemeter.structures.
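In serialized meter input (shown in full in the Basic Usage tutorial below), a baseline/reporting period pair looks like the following Python structure, where None stands for an open-ended boundary:

modeling_period_group = {
    "baseline_period": {
        "start": None,                         # open-ended: all data before the end date
        "end": "2013-06-01T00:00:00+00:00",    # start of the intervention
    },
    "reporting_period": {
        "start": "2013-07-01T00:00:00+00:00",  # end of the intervention
        "end": None,                           # open-ended: all data after the start date
    },
}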

Figure: Pre-intervention baseline period and post-intervention reporting periods on a project timeline.
Trace modeling¶
The relationship between energy demand and various external factors can differ drastically from building to building, and (usually!) changes after an intervention. Modeling these relationships properly with statistical confidence is a core strength of the EEmeter.
As noted in the background, we term a set of energy data points a trace, and a building or project might be associated with any number of traces. In order to calculate savings models, each of these traces must be modeled.
Before modeling, traces are segmented into components which overlap each baseline and reporting period of interest, then are modeled separately. [2] This creates up to \(n * m\) models for a project with \(n\) traces and \(m\) modeling periods.
Each of these models attempts to establish the relationship between energy demand and external factors as it existed during the particular modeling period of interest. However, since the extent to which a model successfully describes these relationships varies significantly, model estimates must be considered only in conjunction with model error and goodness-of-fit metrics (see Modeling Overview). Any estimate of energy demand given by any model fitted by the EEmeter is associated with variance and confidence bounds.
In practice the number of models fitted for any particular project might be fewer than \(n * m\) due to missing or insufficient data (see Data sufficiency). The EEmeter takes these failures into account and considers them when building summaries of savings.

Figure: An example of trace segmenting with two traces, one baseline period and one reporting period. Trace 1 is segmented into just one component - the baseline component - because data for the reporting period is missing. Trace 2 is segmented into one baseline component and one reporting component. The segments of Trace 1 and Trace 2 have different lengths, but models of their energy demand behavior can still be built.
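To make segmentation concrete, here is a minimal sketch using pandas DatetimeIndex slicing; the dataframe and period bounds are hypothetical stand-ins for a real trace and a project's modeling periods:

import pandas as pd

# A toy trace with a few records (stand-in for real trace data, as in
# energy_trace.data from the Basic Usage tutorial).
trace_df = pd.DataFrame(
    {"value": [57.8, 64.8, 49.5]},
    index=pd.to_datetime(["2013-05-30", "2013-06-15", "2013-07-02"], utc=True),
)

# Hypothetical period bounds taken from the project's modeling period group.
baseline_end = pd.Timestamp("2013-06-01", tz="UTC")
reporting_start = pd.Timestamp("2013-07-01", tz="UTC")

baseline_segment = trace_df[:baseline_end]      # records up to the baseline end
reporting_segment = trace_df[reporting_start:]  # records from the reporting start on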
Weather normalization¶
Once we have created a model, we can apply that model to determine an estimate of energy demand during arbitrary weather scenarios. The two most common weather scenarios for which the EEmeter will estimate demand are the “normal” weather year and the observed reporting period weather year. This is generally necessary because the data observed in the baseline and reporting periods occurred during different time periods with different weather – and valid comparisons between them must account for this. Estimating energy performance during the “normal” weather attempts to reduce bias in the savings estimate by accounting for the peculiarity (as compared to other years or seasons) of the relevant observed weather.
In an attempt to reduce the number of arbitrary factors influencing results, we only ever compare model estimates or data that occurred over the same weather scenario and time period. This helps (in the aggregate) to ensure equivalency of end use demand pre- and post-intervention.
Savings¶
If the data and models show that energy demand is reduced relative to equivalent end use demand following an intervention, we say that there have been energy savings, or equivalently, that energy performance has increased.
Energy savings is necessarily a difference; however, this difference must be taken carefully, given missing data and model error, and is only taken after the necessary aggregation steps.
The equation for savings is always:
\(S_\text{total} = E_\text{b} - E_\text{r}\)
or
\(S_\text{percent} = \frac{E_\text{b} - E_\text{r}}{E_\text{b}}\)
where
- \(S_\text{total}\) is aggregate total savings
- \(S_\text{percent}\) is aggregate percent savings
- \(E_\text{b}\) is aggregate energy demand under baseline period conditions
- \(E_\text{r}\) is aggregate energy demand under reporting period conditions
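As a minimal sketch of these two formulas in plain Python (assuming the aggregate values are already in hand):

def total_savings(e_baseline, e_reporting):
    # S_total = E_b - E_r
    return e_baseline - e_reporting

def percent_savings(e_baseline, e_reporting):
    # S_percent = (E_b - E_r) / E_b
    return (e_baseline - e_reporting) / e_baseline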
Depending on the type of energy savings desired, the values \(E_\text{b}\) and \(E_\text{r}\) may be calculated differently. The following types of savings are supported:
Annualized weather normal¶
The annualized weather normal estimates savings as it may have occurred during a “normal” weather year. It does this by building models of both the baseline and reporting energy demand and using each to weather-normalize the energy values.
\(E_\text{b} = \text{M}_\text{b}\left(\text{X}_\text{normal}\right)\)
\(E_\text{r} = \text{M}_\text{r}\left(\text{X}_\text{normal}\right)\)
where
- \(\text{M}_\text{b}\) is the model of energy demand as built using trace data segmented from the baseline period.
- \(\text{M}_\text{r}\) is the model of energy demand as built using trace data segmented from the reporting period.
- \(\text{X}_\text{normal}\) are temperature and other covariate values for the weather normal year.
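In terms of the fitted model objects described in Trace modeling, this corresponds roughly to the following sketch. The model functions and the fixture values here are toy stand-ins, not eemeter code; see the predict() methods in the API documentation for the actual interface.

# Hypothetical stand-ins for fitted baseline and reporting models.
def model_b_predict(temps):
    return sum(0.5 + 0.02 * t for t in temps)   # toy baseline model, M_b

def model_r_predict(temps):
    return sum(0.4 + 0.018 * t for t in temps)  # toy reporting model, M_r

normal_year_fixture = [50.0, 65.0, 72.0]  # toy normal-year temperatures (degF), X_normal

e_b = model_b_predict(normal_year_fixture)  # E_b = M_b(X_normal)
e_r = model_r_predict(normal_year_fixture)  # E_r = M_r(X_normal)
annualized_savings = e_b - e_r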
Gross predicted¶
The gross predicted method estimates savings that have occurred from the completion of the project interventions up to the date of the meter run.
\(E_\text{b} = \text{M}_\text{b}\left(\text{X}_\text{r}\right)\)
\(E_\text{r} = \text{M}_\text{r}\left(\text{X}_\text{r}\right)\)
where
- \(\text{M}_\text{b}\) is the model of energy demand as built using trace data segmented from the baseline period.
- \(\text{M}_\text{r}\) is the model of energy demand as built using trace data segmented from the reporting period.
- \(\text{X}_\text{r}\) are temperature and other covariate values for reporting period.
Gross observed¶
The gross observed method estimates savings that have occurred from the completion of the project interventions up to the date of the meter run.
\(E_\text{b} = \text{M}_\text{b}\left(\text{X}_\text{r}\right)\)
\(E_\text{r} = \text{A}_\text{r}\)
where
- \(\text{M}_\text{b}\) is the model of energy demand as built using trace data segmented from the baseline period.
- \(\text{A}_\text{r}\) are the actual observed energy demand values from the trace data segmented from the reporting period. If the actual data has missing values, these are interpolated using gross predicted values (i.e., \(\text{M}_\text{r}\left(\text{X}_\text{r}\right)\)).
- \(\text{X}_\text{r}\) are temperature and other covariate values for reporting period.
Aggregation rules¶
Because even an individual project may have multiple traces describing its energy demand, we must be able to aggregate trace-level results before we can obtain project-level or portfolio-level savings. Ideally, this aggregation is a simple sum of trace-level values. However, trace-level results are often messy and must be handled with care; some may be missing data, have bad model fits, or come from entirely failed model builds. The EEmeter must successfully handle each of these cases, or risk invalidating results for entire portfolios.
The aggregation steps are as follows:
1. Select scope (project, portfolio) and gather all trace data available in that scope.
2. Select baseline and reporting periods. For portfolio-level aggregations in which baseline and reporting periods may not align, select a reporting period type and use the default baseline period for each project.
3. Group traces by interpretation.
4. Compute \(E_\text{b}\) and \(E_\text{r}\):
   - Compute (or retrieve) \(E_\text{t,b}\) and \(E_\text{t,r}\) for each trace \(\text{t}\).
   - Determine, for each \(E_\text{t,b}\) and \(E_\text{t,r}\), whether or not it meets the criteria for inclusion in aggregation (see Inclusion criteria).
   - Discard both \(E_\text{t,b}\) and \(E_\text{t,r}\) for any trace for which either has been discarded.
   - Compute \(E_\text{b} = \sum_{\text{t}}E_\text{t,b}\) and \(E_\text{r} = \sum_{\text{t}}E_\text{t,r}\) for the remaining traces. Errors are propagated according to the principles in Error propagation.
5. Compute savings from \(E_\text{b}\) and \(E_\text{r}\) as usual, as in the sketch below.
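A rough sketch of steps 4 and 5, assuming trace-level results arrive as (value, variance) pairs and that errors are independent, so variances add (see Error propagation):

def aggregate_savings(trace_results):
    # Each item: {"baseline": (value, variance) or None,
    #             "reporting": (value, variance) or None}
    e_b = e_r = var_b = var_r = 0.0
    for result in trace_results:
        # Discard the whole trace if either side is missing or failed.
        if result["baseline"] is None or result["reporting"] is None:
            continue
        value_b, variance_b = result["baseline"]
        value_r, variance_r = result["reporting"]
        e_b += value_b
        e_r += value_r
        var_b += variance_b  # independence assumption: variances add
        var_r += variance_r
    total_savings = e_b - e_r
    return total_savings, var_b + var_r  # variance of a difference of independent sums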
Inclusion criteria¶
For inclusion in aggregates, \(E_\text{t,b}\) and \(E_\text{t,r}\) must meet the following criteria:
- The model has been successfully built.
- If ELECTRICITY_ON_SITE_GENERATION_UNCONSUMED, which represents solar generation, is available, and if solar panels were installed as one of the project interventions, a blank \(E_\text{t,b}\) should be replaced with 0.
Error propagation¶
Errors are propagated as if they followed \(\chi^2\) distributions.
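Under an independence assumption, this amounts to summing trace-level variances when values are summed:
\(\sigma^2_{E_\text{b}} = \sum_{\text{t}}\sigma^2_{E_\text{t,b}}, \qquad \sigma^2_{E_\text{r}} = \sum_{\text{t}}\sigma^2_{E_\text{t,r}}\)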
Weather data matching¶
Since weather and temperature data are so central to the activity of the EEmeter, the particulars of how weather data is obtained for a project are often of interest. Weather data sources are determined automatically within the EEmeter using an internal mapping [3] between ZIP codes [4] and weather stations. The source of the weather normal data may differ from the source of the observed weather data.
A jupyter notebook outlining the process of constructing this weather data mapping is available here.
[1] Additional information on why this method is used in preference to other methods is described in the Introduction.
[2] This is not quite true for structural change models. This is covered in more detail in Modeling Overview.
[3] Available on GitHub.
[4] The ZIP codes used in this mapping aren't strictly ZIP codes; they're actually ZCTAs.
Glossary¶
- annualized weather normal: an estimate of annual energy demand under a weather normal.
- baseline: a pre-intervention reference point or starting point from which to compare post-intervention energy demand.
- baseline period: a time period before a retrofit of interest for which to model, observe, or estimate energy performance. Generally used in reference to a reporting period or set of reporting periods.
- building performance: see energy performance.
- demand capacity: the extent to which energy performance increases from a baseline for a reporting period following an intervention.
- demand response project: a set of interventions designed to shift the time of day or day of week of energy demand, generally toward off-peak hours.
- end use: an energy-consuming service such as lighting, space cooling, space heating, refrigeration, or water heating, particularly as provided by a building or set of buildings.
- end use demand: the extent to which an end use is needed. May vary by season, occupancy, time of day, day of week, or purpose of building.
- energy demand: the amount of energy needed to satisfy end use demand.
- energy efficiency project: a set of interventions designed to reduce overall energy demand relative to equivalent end use demand.
- energy model: a mathematical description of energy demand, particularly in response to end use demand scenarios.
- energy savings: an increase in energy performance indicating lower energy demand for equivalent end use demand.
- energy performance: the extent to which end use demand causes energy demand. Higher performance indicates lower energy demand for equivalent end use demand. Sometimes referred to as building performance.
- energy trace: see trace
- gross observed: an estimate of energy demand over the reporting period as given by baseline models and observed values from the reporting period.
- gross predicted: an estimate of energy demand as given by the baseline and reporting models evaluated over the reporting period.
- intervention: a set of upgrades or performance improvements on physical infrastructure of an existing building (see retrofit), or of behavior of individuals living in an existing building.
- modeling period: a period of time over which an energy model is to be created for a particular trace. This is a generalization of baseline and reporting periods. Modeling periods generally fall into one of those two categories.
- projected baseline energy demand: a counterfactual estimate of energy demand as it might have been under a particular end use demand scenario had an intervention not occurred.
- project: an intervention or retrofit for which there is an expected change in energy demand.
- reporting period: a time period after a retrofit of interest over which to model, observe, or estimate energy performance. Generally used in reference to a baseline period.
- retrofit: a set of interventions taking place at a particular building or site which modify pre-existing structures, installations or appliances.
- structural change model: a model which tries to determine the most probable extents of baseline and reporting periods for a project given its trace data.
- trace: a single time series of measured values associated with units at a particular (not necessarily fixed) frequency.
- trace interpretation: the meaning of the trace data. Possible interpretations are outlined in eemeter.structures
- Typical Meteorological Year 3 (TMY3): A set of publicly available weather normals designed by the National Renewable Energy Laboratory (NREL). Used by EEMeter for weather normalization.
- weather normalization: a technique to account for differences in end use demand due to variations in weather patterns, which uses a model of weather-dependent energy demand to determine a counterfactual energy demand under weather conditions described by a weather normal.
- weather normal: a set of (not necessarily observed) weather data designed to reflect a “typical” weather scenario. Often covers a time period of 1 year. Used in weather normalization. See TMY3.
- ZIP Code Tabulation Area (ZCTA): a set of geographical areas based on US Postal Service (USPS) ZIP codes, necessitated by the fact that ZIP codes do not map easily onto geographies. Built and maintained by the US Census Bureau. Contains only about three quarters of valid ZIP codes. ZIP code and ZCTA do not always match. More information.
Why open source?¶
All of our savings algorithms are free and open source. We don’t believe that standard weights and measures should be the private property of any particular entity. It’s much better for everyone, from contractors to program administrators, if the measurement tools are equally available to everyone.
eemeter¶
Installation¶
Note
If you are installing python for the first time, we recommend using Anaconda, a free python distribution with builds for Windows, macOS, and Linux.
To get started with the eemeter, use pip:
$ pip install eemeter
Make sure you have the latest version:
>>> import eemeter; eemeter.get_version()
'1.3.3'
The eemeter package itself does not use C extensions. However, some eemeter dependencies do, and these can be a bit trickier to install. If issues arise when pip installing eemeter, verify that the packages with C extensions are properly installing. Specifically, verify that these installation commands complete without errors:
pip install lxml
pip install numpy
If they fail, please follow the installation instructions for those packages (lxml, numpy).
The statsmodels installation requires numpy to already be present. If you run into errors with the statsmodels installation, be sure numpy is installed before attempting to install statsmodels. Once statsmodels is installed correctly, install eemeter.
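If in doubt, installing in this order usually sidesteps the dependency issues described above (a conservative sequence; not strictly required on every platform):
$ pip install numpy
$ pip install lxml
$ pip install statsmodels
$ pip install eemeter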
Topics¶
Basic Usage: eemeter package¶
This tutorial is also available as a jupyter notebook
Note:
Most users of the EEmeter stack do not directly use the eemeter
package for loading their data. Instead, they use the datastore
application, which uses the eemeter internally. To learn to use the
datastore, head over to the datastore basic usage tutorial.
Running a meter¶
Please download a preformatted input file.
We can load this input file into memory with the following:
In [1]:
import json
with open('meter_input_example.json', 'r') as f:  # modify to point to your downloaded input file.
    meter_input = json.load(f)
The file has a single trace of daily electricity consumption data and some associated project data. Its contents look like this:
In [2]:
!head -15 meter_input_example.json
{
"type": "SINGLE_TRACE_SIMPLE_PROJECT",
"trace": {
"type": "ARBITRARY_START",
"interpretation": "ELECTRICITY_CONSUMPTION_SUPPLIED",
"unit": "KWH",
"trace_id": "TRACE_ID_123",
"interval": "daily",
"records": [
{
"start": "2011-01-01T00:00:00+00:00",
"value": 57.8,
"estimated": false
},
{
In [3]:
!tail -25 meter_input_example.json
"estimated": false
},
{
"start": "2015-01-01T00:00:00+00:00",
"value": null,
"estimated": false
}
]
},
"project": {
"type": "PROJECT_WITH_SINGLE_MODELING_PERIOD_GROUP",
"zipcode": "50321",
"project_id": "PROJECT_ID_ABC",
"modeling_period_group": {
"baseline_period": {
"start": null,
"end": "2013-06-01T00:00:00+00:00"
},
"reporting_period": {
"start": "2013-07-01T00:00:00+00:00",
"end": null
}
}
}
}
Next, we can create a meter, model and formatter. These work in tandem to create a model of energy usage.
The meter
coordinates loading the input data, matching it with
appropriate weather data, and passing it to the formatter and model. It
then uses these to calculate a set of outputs, including energy savings
estimates such as annualized weather normalized usage.
The formatter
formats the trace and project data for use within the
model.
The model
fits a model of energy usage to this formatted data which
can be used, given covariate weather data, to predict or model energy
usage over an arbitrary period of time.
In [4]:
from eemeter.ee.meter import EnergyEfficiencyMeter
from eemeter.modeling.models import CaltrackMonthlyModel
from eemeter.modeling.formatters import ModelDataFormatter
meter = EnergyEfficiencyMeter()
model = (CaltrackMonthlyModel, {"fit_cdd": False, "grid_search": True})
formatter = (ModelDataFormatter, {"freq_str": "D"})
The meter we created is an instance of the EnergyEfficiencyMeter class, which operates on single energy traces.
The model we created is a tuple of (model class, model keyword arguments), not an instantiation of the model. We do it this way to allow easy creation of multiple instances of the model class.
The formatter is, like the model, a tuple of (formatter class, formatter keyword arguments), for the same reason - we want to make multiple instances of the formatter class.
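Conceptually, the meter can then build fresh instances from these tuples whenever it needs them; this is a sketch of the convention, not eemeter internals:

model_class, model_kwargs = model
model_instance = model_class(**model_kwargs)  # e.g. CaltrackMonthlyModel(fit_cdd=False, grid_search=True)

formatter_class, formatter_kwargs = formatter
formatter_instance = formatter_class(**formatter_kwargs)  # e.g. ModelDataFormatter(freq_str="D")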
These can be used directly to “evaluate” the meter on the meter input.
We’ll store the output in meter_output.
In [5]:
meter_output = meter.evaluate(meter_input, model=model, formatter=formatter)
This meter_output is quite verbose, so we’ll export it to a json file, which is a bit more readable.
In [6]:
with open('meter_output_example.json', 'w') as f:  # change this path if desired.
    json.dump(meter_output, f, indent=2)
The content of this file will look something like this:
In [7]:
!head -40 meter_output_example.json
{
"status": "SUCCESS",
"failure_message": null,
"logs": [
"Using weather_source ISDWeatherSource(\"725460\")",
"Using weather_normal_source TMY3WeatherSource(\"725460\")"
],
"eemeter_version": "0.5.3",
"model_class": "CaltrackMonthlyModel",
"model_kwargs": {
"fit_cdd": false,
"grid_search": true
},
"formatter_class": "ModelDataFormatter",
"formatter_kwargs": {
"freq_str": "D"
},
"weather_source_station": "725460",
"weather_normal_source_station": "725460",
"derivatives": [
{
"modeling_period_group": [
"baseline",
"reporting"
],
"series": "Cumulative baseline model minus reporting model, normal year",
"description": "Total predicted usage according to the baseline model over the normal weather year, minus the total predicted usage according to the reporting model over the normal weather year. Days for which normal year weather data does not exist are removed.",
"orderable": [
null
],
"value": [
2479.015638036155
],
"variance": [
7354.084609086982
]
},
{
"modeling_period_group": [
"baseline",
Note how this file is organized: it contains a summary of the operations done during meter execution, including everything necessary to recreate the meter run, like the model class and keyword arguments used to initialize it, and the weather data (degrees F, called “demand_fixture”) that was used in model building.
Not everyone has data ready to go, so if you are in that bucket, the next section covers how you can get started with data of your own.
Data preparation¶
All we’ll be doing in this section is creating a data structure that has the same format as the meter_input_example.json file above. We will use the eemeter EnergyTrace helper structure.
Of course, this is not the only way to get data into the necessary format; use this for inspiration, but make changes as necessary to accommodate the particulars of your dataset.
In [8]:
# library imports
from eemeter.structures import EnergyTrace
from eemeter.io.serializers import ArbitraryStartSerializer
from eemeter.ee.meter import EnergyEfficiencyMeter
import pandas as pd
import pytz
First, we import the energy data from the sample CSV and transform it into records
In [9]:
energy_data = pd.read_csv('sample-energy-data_project-ABC_zipcode-50321.csv',
                          parse_dates=['date'], dtype={'zipcode': str})
records = [{
    "start": pytz.UTC.localize(row.date.to_datetime()),
    "value": row.value,
    "estimated": row.estimated,
} for _, row in energy_data.iterrows()]
The records we created look like this:
In [10]:
records[:3] # the first three records
Out[10]:
[{'estimated': False,
'start': datetime.datetime(2011, 1, 1, 0, 0, tzinfo=<UTC>),
'value': 57.8},
{'estimated': False,
'start': datetime.datetime(2011, 1, 2, 0, 0, tzinfo=<UTC>),
'value': 64.8},
{'estimated': False,
'start': datetime.datetime(2011, 1, 3, 0, 0, tzinfo=<UTC>),
'value': 49.5}]
Next, we load our records into an EnergyTrace. We give it units "KWH" and interpretation "ELECTRICITY_CONSUMPTION_SUPPLIED", which means that this is electricity consumed by the building and supplied by a utility (rather than by solar panels or other on-site generation). We also pass in an instance of the record serializer ArbitraryStartSerializer to show it how to interpret the records.
In [11]:
energy_trace = EnergyTrace(
    records=records,
    unit="KWH",
    interpretation="ELECTRICITY_CONSUMPTION_SUPPLIED",
    serializer=ArbitraryStartSerializer(),
    trace_id='TRACE_ID_123',
    interval='daily'
)
The energy trace data we created looks like this:
In [12]:
energy_trace.data[:3] # first three records
Out[12]:
                           value  estimated
2011-01-01 00:00:00+00:00   57.8      False
2011-01-02 00:00:00+00:00   64.8      False
2011-01-03 00:00:00+00:00   49.5      False
Now we load the rest of the project data from the sample project data CSV. This CSV includes the project_id (we don’t use it in this tutorial, but this is how you might identify the saved meter results), the ZIP code of the building, and the dates retrofit work for this project started and completed.
In [13]:
project_data = pd.read_csv('sample-project-data.csv',
                           parse_dates=['retrofit_start_date', 'retrofit_end_date']).iloc[0]
Here’s what our project data looks like.
In [14]:
project_data
Out[14]:
project_id ABC
zipcode 50321
retrofit_start_date 2013-06-01 00:00:00
retrofit_end_date 2013-07-01 00:00:00
Name: 0, dtype: object
In [15]:
zipcode = "{:05d}".format(project_data.zipcode)
retrofit_start_date = pytz.UTC.localize(project_data.retrofit_start_date)
retrofit_end_date = pytz.UTC.localize(project_data.retrofit_end_date)
Here’s an example of how to get this data into the format the meter expects (exactly the format of the meter_input_example.json from above).
In [16]:
from collections import OrderedDict
def serialize_meter_input(trace, zipcode, retrofit_start_date, retrofit_end_date):
    data = OrderedDict([
        ("type", "SINGLE_TRACE_SIMPLE_PROJECT"),
        ("trace", trace_serializer(trace)),
        ("project", project_serializer(zipcode, retrofit_start_date, retrofit_end_date)),
    ])
    return data

def trace_serializer(trace):
    data = OrderedDict([
        ("type", "ARBITRARY_START"),
        ("interpretation", trace.interpretation),
        ("unit", trace.unit),
        ("trace_id", trace.trace_id),
        ("interval", trace.interval),
        ("records", [
            OrderedDict([
                ("start", start.isoformat()),
                ("value", record.value if pd.notnull(record.value) else None),
                ("estimated", bool(record.estimated)),
            ])
            for start, record in trace.data.iterrows()
        ]),
    ])
    return data

def project_serializer(zipcode, retrofit_start_date, retrofit_end_date):
    data = OrderedDict([
        ("type", "PROJECT_WITH_SINGLE_MODELING_PERIOD_GROUP"),
        ("zipcode", zipcode),
        ("project_id", 'PROJECT_ID_ABC'),
        ("modeling_period_group", OrderedDict([
            ("baseline_period", OrderedDict([
                ("start", None),
                ("end", retrofit_start_date.isoformat()),
            ])),
            ("reporting_period", OrderedDict([
                ("start", retrofit_end_date.isoformat()),
                ("end", None),
            ]))
        ]))
    ])
    return data
In [17]:
my_meter_input = serialize_meter_input(
    energy_trace, zipcode, retrofit_start_date, retrofit_end_date)
In [18]:
with open('my_meter_input.json', 'w') as f:
    json.dump(my_meter_input, f, indent=2)
In [19]:
!head -15 my_meter_input.json
{
"type": "SINGLE_TRACE_SIMPLE_PROJECT",
"trace": {
"type": "ARBITRARY_START",
"interpretation": "ELECTRICITY_CONSUMPTION_SUPPLIED",
"unit": "KWH",
"trace_id": "TRACE_ID_123",
"interval": "daily",
"records": [
{
"start": "2011-01-01T00:00:00+00:00",
"value": 57.8,
"estimated": false
},
{
In [20]:
!tail -25 my_meter_input.json
"estimated": false
},
{
"start": "2015-01-01T00:00:00+00:00",
"value": null,
"estimated": false
}
]
},
"project": {
"type": "PROJECT_WITH_SINGLE_MODELING_PERIOD_GROUP",
"zipcode": "50321",
"project_id": "PROJECT_ID_ABC",
"modeling_period_group": {
"baseline_period": {
"start": null,
"end": "2013-06-01T00:00:00+00:00"
},
"reporting_period": {
"start": "2013-07-01T00:00:00+00:00",
"end": null
}
}
}
}
Now we can run this through the meter exactly the same way we did before:
In [21]:
my_meter_output = meter.evaluate(my_meter_input, model=model, formatter=formatter)
Inspecting results¶
Now that we have some results at our fingertips, let’s inspect them. We’ll be using the meter output from the first example trace.
The output is mostly made up of a set of “derivatives”. These aren’t derivatives in the calculus sense - they’re just derived from the model output.
Let’s take a look at the first one.
In [22]:
derivative = meter_output["derivatives"][0]
We can take a peek at the contents by looking at the keys of the dict.
In [23]:
[k for k in derivative.keys()]
Out[23]:
['modeling_period_group',
'series',
'description',
'orderable',
'value',
'variance']
Each derivative is a series with a name and a description:
In [24]:
derivative['series'], derivative['description']
Out[24]:
('Cumulative baseline model minus reporting model, normal year',
'Total predicted usage according to the baseline model over the normal weather year, minus the total predicted usage according to the reporting model over the normal weather year. Days for which normal year weather data does not exist are removed.')
The values associated with the derivative are stored in value, their variances are stored in variance, and the orderables act as keys. A single orderable of None indicates (as in this case) that the value and variance are singleton values.
In [25]:
derivative['orderable'], derivative['value'], derivative['variance']
Out[25]:
([None], [2479.015638036155], [7354.0846090869818])
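Since each value is paired with a variance, a rough uncertainty band can be derived from it. For example, treating the estimate as approximately normal (an assumption made here purely for illustration):

import math

value = derivative['value'][0]
stddev = math.sqrt(derivative['variance'][0])
lower, upper = value - 1.96 * stddev, value + 1.96 * stddev  # ~95% interval under normality
print("estimate: {:.1f}, 95% interval: ({:.1f}, {:.1f})".format(value, lower, upper))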
Other derivatives are computed as well:
In [26]:
print(json.dumps([(d['series'], d['description']) for d in sorted(meter_output["derivatives"], key=lambda o: o['series'])], indent=2))
[
[
"Baseline model minus observed, reporting period",
"Predicted usage according to the baseline model minus observed usage over the reporting period."
],
[
"Baseline model minus reporting model, normal year",
"Predicted usage according to the baseline model over the normal weather year, minus the predicted usage according to the reporting model over the normal weather year."
],
[
"Baseline model, baseline period",
"Predicted usage according to the baseline model over the baseline period."
],
[
"Baseline model, normal year",
"Predicted usage according to the baseline model over the normal weather year."
],
[
"Baseline model, reporting period",
"Predicted usage according to the baseline model over the reporting period."
],
[
"Cumulative baseline model minus observed, reporting period",
"Total predicted usage according to the baseline model minus observed usage over the reporting period. Days for which reporting period weather data or usage do not exist are removed."
],
[
"Cumulative baseline model minus reporting model, normal year",
"Total predicted usage according to the baseline model over the normal weather year, minus the total predicted usage according to the reporting model over the normal weather year. Days for which normal year weather data does not exist are removed."
],
[
"Cumulative baseline model, normal year",
"Total predicted usage according to the baseline model over the normal weather year. Days for which normal year weather data does not exist are removed."
],
[
"Cumulative baseline model, reporting period",
"Total predicted usage according to the baseline model over the reporting period. Days for which reporting period weather data does not exist are removed."
],
[
"Cumulative observed, baseline period",
"Total observed usage over the baseline period. Days for which weather data does not exist are NOT removed."
],
[
"Cumulative observed, reporting period",
"Total observed usage over the reporting period. Days for which weather data does not exist are NOT removed."
],
[
"Cumulative reporting model, normal year",
"Total predicted usage according to the reporting model over the reporting period. Days for which normal year weather data does not exist are removed."
],
[
"Inclusion mask, baseline period",
"Mask for baseline period data which is included in model and savings cumulatives."
],
[
"Inclusion mask, reporting period",
"Mask for reporting period data which is included in model and savings cumulatives."
],
[
"Observed, baseline period",
"Observed usage over the baseline period."
],
[
"Observed, project period",
"Observed usage over the project period."
],
[
"Observed, reporting period",
"Observed usage over the reporting period."
],
[
"Reporting model, normal year",
"Predicted usage according to the reporting model over the reporting period."
],
[
"Reporting model, reporting period",
"Predicted usage according to the reporting model over the reporting period."
],
[
"Temperature, baseline period",
"Observed temperature (degF) over the baseline period."
],
[
"Temperature, normal year",
"Observed temperature (degF) over the normal year."
],
[
"Temperature, reporting period",
"Observed temperature (degF) over the reporting period."
]
]
Weather Data Caching¶
In order to avoid putting an unnecessary load on external weather sources, weather data is cached by default as json in the directory ~/.eemeter/cache. The location of this directory can be changed by setting:
$ export EEMETER_WEATHER_CACHE_DIRECTORY=<full path to directory>
API¶
eemeter.ee¶
eemeter.ee.meter¶
class eemeter.ee.meter.EnergyEfficiencyMeter(**kwargs)[source]¶
Meter for determining energy efficiency derivatives for a single trace.
Parameters: default_model_mapping (dict) – mapping from (interpretation, frequency) tuples to default models, used if none is explicitly provided in .evaluate().
evaluate(meter_input, formatter=None, model=None, weather_source=None, weather_normal_source=None)[source]¶
Main entry point to the meter, which models traces and calculates derivatives.
Parameters: - meter_input (dict) – Serialized input containing trace and project data.
- formatter (tuple of (class, dict), default None) – Formatter for trace and weather data. Used to create input for model. If None is provided, will be auto-matched to appropriate default formatter. Class name can be provided as a string (class.__name__) or object.
- model (tuple of (class, dict), default None) – Model to use in modeling. If None is provided, will be auto-matched to appropriate default model. Class can be provided as a string (class.__name__) or class object.
- weather_source (eemeter.weather.WeatherSource) – Weather source to be used for this meter. Overrides the weather source found using project.site. Useful for test mocking.
- weather_normal_source (eemeter.weather.WeatherSource) – Weather normal source to be used for this meter. Overrides the weather normal source found using project.site. Useful for test mocking.
Returns: results – Dictionary of results with the following keys:
- "status": SUCCESS/FAILURE
- "failure_message": if FAILURE, message indicates reason for failure, may include traceback
- "logs": list of collected log messages
- "model_class": name of model class
- "model_kwargs": dict of model keyword arguments (settings)
- "formatter_class": name of formatter class
- "formatter_kwargs": dict of formatter keyword arguments (settings)
- "eemeter_version": version of the eemeter package
- "modeled_energy_trace": modeled energy trace
- "derivatives": derivatives for each interpretation
- "weather_source_station": matched weather source station
- "weather_normal_source_station": matched weather normal source station
Return type: dict
eemeter.io¶
eemeter.io.serializers¶
class eemeter.io.serializers.ArbitrarySerializer(parse_dates=False)[source]¶
Arbitrary data at arbitrary non-overlapping intervals. Often used for monthly billing data. Records must all have the "start" key and the "end" key. Overlaps are not allowed and gaps will be filled with NaN.
For example:
>>> records = [
...     {
...         "start": datetime(2013, 12, 30, tzinfo=pytz.utc),
...         "end": datetime(2014, 1, 28, tzinfo=pytz.utc),
...         "value": 1180,
...     },
...     {
...         "start": datetime(2014, 1, 28, tzinfo=pytz.utc),
...         "end": datetime(2014, 2, 27, tzinfo=pytz.utc),
...         "value": 1211,
...         "estimated": True,
...     },
...     {
...         "start": datetime(2014, 2, 28, tzinfo=pytz.utc),
...         "end": datetime(2014, 3, 30, tzinfo=pytz.utc),
...         "value": 985,
...     },
... ]
>>> serializer = ArbitrarySerializer()
>>> df = serializer.to_dataframe(records)
>>> df
                            value  estimated
2013-12-30 00:00:00+00:00  1180.0      False
2014-01-28 00:00:00+00:00  1211.0       True
2014-02-27 00:00:00+00:00     NaN      False
2014-02-28 00:00:00+00:00   985.0      False
2014-03-30 00:00:00+00:00     NaN      False
class eemeter.io.serializers.ArbitraryStartSerializer(parse_dates=False)[source]¶
Arbitrary start data at arbitrary non-overlapping intervals. Records must all have the "start" key. The last data point will be ignored unless an end date is provided for it. This is useful for data dated to future energy use, e.g. billing for delivered fuels.
For example:
>>> records = [
...     {
...         "start": datetime(2013, 12, 30, tzinfo=pytz.utc),
...         "value": 1180,
...     },
...     {
...         "start": datetime(2014, 1, 28, tzinfo=pytz.utc),
...         "value": 1211,
...         "estimated": True,
...     },
...     {
...         "start": datetime(2014, 2, 28, tzinfo=pytz.utc),
...         "value": 985,
...     },
... ]
>>> serializer = ArbitraryStartSerializer()
>>> df = serializer.to_dataframe(records)
>>> df
                            value  estimated
2013-12-30 00:00:00+00:00  1180.0      False
2014-01-28 00:00:00+00:00  1211.0       True
2014-02-28 00:00:00+00:00     NaN      False
class eemeter.io.serializers.ArbitraryEndSerializer(parse_dates=False)[source]¶
Arbitrary end data at arbitrary non-overlapping intervals. Records must all have the "end" key. The first data point will be ignored unless a start date is provided for it. This is useful for data dated to past energy use, e.g. electricity or natural gas bills.
For example:
>>> records = [
...     {
...         "end": datetime(2013, 12, 30, tzinfo=pytz.utc),
...         "value": 1180,
...     },
...     {
...         "end": datetime(2014, 1, 28, tzinfo=pytz.utc),
...         "value": 1211,
...         "estimated": True,
...     },
...     {
...         "end": datetime(2014, 2, 28, tzinfo=pytz.utc),
...         "value": 985,
...     },
... ]
>>> serializer = ArbitraryEndSerializer()
>>> df = serializer.to_dataframe(records)
>>> df
                            value  estimated
2013-12-30 00:00:00+00:00  1211.0       True
2014-01-28 00:00:00+00:00   985.0      False
2014-02-28 00:00:00+00:00     NaN      False
eemeter.io.parsers¶
class eemeter.io.parsers.ESPIUsageParser(xml)[source]¶
Parse ESPI XML files.
Basic usage:
>>> from eemeter.io.parsers import ESPIUsageParser
>>> with open("/path/to/example.xml") as f:
...     parser = ESPIUsageParser(f)
>>> energy_traces = list(parser.get_energy_traces())
Parameters: xml (str, filepath, file buffer) – XML data to parse.
get_energy_traces(service_kind_default='electricity')[source]¶
Retrieve all energy trace records stored as IntervalReading elements in the given ESPI Energy Usage XML.
Energy records are grouped by interpretation and returned in EnergyTrace objects.
Parameters: service_kind_default (str) – Default fuel type to use in the parser if the ReadingType/commodity field is missing.
Yields: energy_trace (eemeter.structures.EnergyTrace) – Energy data traces as described in the xml file.
eemeter.modeling¶
eemeter.modeling.formatters¶
The formatter classes are designed to provide a standard interface to model fit and predict methods. The formatters add weather data to daily or monthly energy data. The interface assumes that the model class will be responsible for applying data sufficiency rules and additional formatting necessary for performing model fits or predictions.
class eemeter.modeling.formatters.ModelDataFormatter(freq_str)[source]¶
Formatter for model data of known or predictable frequency. Basic usage:
>>> formatter = ModelDataFormatter("D")
>>> formatter.create_input(energy_trace, weather_source)
                           energy  tempF
2013-06-01 00:00:00+00:00    3.10   74.3
2013-06-02 00:00:00+00:00    2.42   71.0
2013-06-03 00:00:00+00:00    1.38   73.1
...
2016-05-27 00:00:00+00:00    0.11   71.1
2016-05-28 00:00:00+00:00    0.04   78.1
2016-05-29 00:00:00+00:00    0.21   69.6
>>> index = pd.date_range('2013-01-01', periods=365, freq='D')
>>> formatter.create_demand_fixture(index, weather_source)
                           tempF
2013-01-01 00:00:00+00:00   28.3
2013-01-02 00:00:00+00:00   31.0
2013-01-03 00:00:00+00:00   34.1
...
2013-12-29 00:00:00+00:00   12.3
2013-12-30 00:00:00+00:00   26.0
2013-12-31 00:00:00+00:00   24.1
create_demand_fixture(index, weather_source)[source]¶
Creates a DatetimeIndexed dataframe containing formatted demand fixture data.
Parameters: - index (pandas.DatetimeIndex) – The desired index for demand fixture data.
- weather_source (eemeter.weather.WeatherSourceBase) – The source of weather fixture data.
Returns: input_df – Predictably formatted input data. This data should be directly usable as input to applicable model.predict() methods.
Return type: pandas.DataFrame
create_input(trace, weather_source)[source]¶
Creates a DatetimeIndexed dataframe containing formatted model input data.
Parameters: - trace (eemeter.structures.EnergyTrace) – The source of energy data for inclusion in model input.
- weather_source (eemeter.weather.WeatherSourceBase) – The source of weather data.
Returns: input_df – Predictably formatted input data. This data should be directly usable as input to applicable model.fit() methods.
Return type: pandas.DataFrame
class eemeter.modeling.formatters.ModelDataBillingFormatter[source]¶
Formatter for model data of unknown or unpredictable frequency. Basic usage:
>>> formatter = ModelDataBillingFormatter()
>>> energy_trace = EnergyTrace(
...     "ELECTRICITY_CONSUMPTION_SUPPLIED",
...     pd.DataFrame(
...         {
...             "value": [1, 1, 1, 1, np.nan],
...             "estimated": [False, False, True, False, False]
...         },
...         index=[
...             datetime(2011, 1, 1, tzinfo=pytz.UTC),
...             datetime(2011, 2, 1, tzinfo=pytz.UTC),
...             datetime(2011, 3, 2, tzinfo=pytz.UTC),
...             datetime(2011, 4, 3, tzinfo=pytz.UTC),
...             datetime(2011, 4, 29, tzinfo=pytz.UTC),
...         ],
...         columns=["value", "estimated"]
...     ),
...     unit="KWH")
>>> trace_data, temp_data = formatter.create_input(energy_trace, weather_source)
>>> trace_data
2011-01-01 00:00:00+00:00    1.0
2011-02-01 00:00:00+00:00    1.0
2011-03-02 00:00:00+00:00    2.0
2011-04-29 00:00:00+00:00    NaN
dtype: float64
>>> temp_data
period                     hourly
2011-01-01 00:00:00+00:00  2011-01-01 00:00:00+00:00    32.0
                           2011-01-01 01:00:00+00:00    32.0
                           2011-01-01 02:00:00+00:00    32.0
...                                                      ...
2011-03-02 00:00:00+00:00  2011-04-28 21:00:00+00:00    32.0
                           2011-04-28 22:00:00+00:00    32.0
                           2011-04-28 23:00:00+00:00    32.0
>>> index = pd.date_range('2013-01-01', periods=365, freq='D')
>>> formatter.create_demand_fixture(index, weather_source)
                           tempF
2013-01-01 00:00:00+00:00   28.3
2013-01-02 00:00:00+00:00   31.0
2013-01-03 00:00:00+00:00   34.1
...
2013-12-29 00:00:00+00:00   12.3
2013-12-30 00:00:00+00:00   26.0
2013-12-31 00:00:00+00:00   24.1
create_demand_fixture(index, weather_source)[source]¶
Creates a DatetimeIndexed dataframe containing formatted demand fixture data.
Parameters: - index (pandas.DatetimeIndex) – The desired index for demand fixture data.
- weather_source (eemeter.weather.WeatherSourceBase) – The source of weather fixture data.
Returns: input_df – Predictably formatted input data. This data should be directly usable as input to applicable model.predict() methods.
Return type: pandas.DataFrame
create_input(trace, weather_source)[source]¶
Creates two DatetimeIndexed dataframes containing formatted model input data.
Parameters: - trace (eemeter.structures.EnergyTrace) – The source of energy data for inclusion in model input.
- weather_source (eemeter.weather.WeatherSourceBase) – The source of weather data.
Returns: trace_data (pandas.DataFrame) – Predictably formatted trace data with estimated data removed. This data should be directly usable as input to applicable model.fit() methods.
temperature_data (pandas.DataFrame) – Predictably formatted temperature data with a pandas MultiIndex. The MultiIndex contains two levels - 'period', which corresponds directly to the trace_data index, and 'hourly' or 'daily', which contains, respectively, hourly or daily temperature data. This is intended for use like the following:
>>> temperature_data.groupby(level='period')
This data should be directly usable as input to applicable model.fit() methods.
eemeter.modeling.models¶
class eemeter.modeling.models.seasonal.SeasonalElasticNetCVModel(cooling_base_temp=65, heating_base_temp=65, n_bootstrap=100, modeling_period_interpretation='baseline')[source]¶
Linear regression using daily frequency data to build a model of formatted energy trace data that takes into account HDD, CDD, day of week, month, and holiday effects, with elastic net regularization.
Parameters: - cooling_base_temp (float) – Base temperature (degrees F) used in calculating cooling degree days.
- heating_base_temp (float) – Base temperature (degrees F) used in calculating heating degree days.
- n_bootstrap (int) – Number of points to exclude during bootstrap error estimation.
class eemeter.modeling.models.billing.BillingElasticNetCVModel(cooling_base_temp=65, heating_base_temp=65, n_bootstrap=100, modeling_period_interpretation='baseline')[source]¶
Linear regression of energy values against CDD/HDD with elastic net regularization.
Parameters: - cooling_base_temp (float) – Base temperature (degrees F) used in calculating cooling degree days.
- heating_base_temp (float) – Base temperature (degrees F) used in calculating heating degree days.
- n_bootstrap (int) – Number of points to exclude during bootstrap error estimation.
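Taken together with the formatters above, a minimal sketch of fitting a billing model looks like the following (module paths are inferred from the section headings, and the convention of passing formatter output directly to model.fit() follows the create_input() documentation above; exact return values may vary by version):
# Assumes `energy_trace` and `weather_source` exist as in the
# ModelDataBillingFormatter example above.
from eemeter.modeling.formatters import ModelDataBillingFormatter
from eemeter.modeling.models.billing import BillingElasticNetCVModel

formatter = ModelDataBillingFormatter()
model = BillingElasticNetCVModel(cooling_base_temp=65, heating_base_temp=65)

# create_input() output is described above as directly usable by model.fit().
input_data = formatter.create_input(energy_trace, weather_source)
model.fit(input_data)
The SeasonalElasticNetCVModel above is used in the same way, paired with daily-frequency formatter output.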
class eemeter.modeling.models.caltrack.CaltrackMonthlyModel(fit_cdd=True, grid_search=False, min_contiguous_baseline_months=12, min_contiguous_reporting_months=12, modeling_period_interpretation='baseline', weighted=False, **kwargs)[source]¶
This class implements the two-stage modeling routine agreed upon as part of the Caltrack beta test.
If fit_cdd is True, all four candidate models (HDD+CDD, CDD-only, HDD-only, and intercept-only) are used in stage-one estimation. If it is False, only the HDD-only and intercept-only models are used.
If grid_search is True, the balance point temperatures are determined by maximizing R^2 across the range 50-85 degF. Otherwise, 70 degF and 60 degF are used for cooling and heating, respectively.
min_contiguous_baseline_months and min_contiguous_reporting_months set the number of contiguous months of data required at the end of the baseline period and the beginning of the reporting period, respectively, in order for the weather normalization to be valid.
billing_to_monthly_avg(trace_and_temp)[source]¶
Helper function to handle monthly billing or other irregular data.
daily_to_monthly_avg(df)[source]¶
Convert from daily usage and temperature to monthly usage per day and average HDD/CDD.
predict(demand_fixture_data, params=None, summed=True)[source]¶
Predicts across the index using fitted model params.
Parameters:
- demand_fixture_data (pandas.DataFrame) – Formatted input data as returned by CaltrackFormatter.create_demand_fixture()
- params (dict, default None) – Parameters found during model fit. If None, .fit() must be called before this method can be used.
  - X_design_matrix: patsy design matrix used in formatting design matrix.
  - formula: patsy formula used in creating design matrix.
  - coefficients: ElasticNetCV coefficients.
  - intercept: ElasticNetCV intercept.
Returns: output – Dataframe of energy values as given by the fitted model across the index given in demand_fixture_data.
Return type: pandas.DataFrame
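A hedged end-to-end sketch of the fit/predict cycle described above (the CaltrackFormatter import path is an assumption; the formatter-output conventions follow the formatter documentation above):
# Assumes `energy_trace`, `weather_source`, and `weather_normal_source`
# exist as in the surrounding examples.
import pandas as pd
import pytz
from eemeter.modeling.formatters import CaltrackFormatter  # assumed path
from eemeter.modeling.models.caltrack import CaltrackMonthlyModel

formatter = CaltrackFormatter()
model = CaltrackMonthlyModel(fit_cdd=True, grid_search=False)
model.fit(formatter.create_input(energy_trace, weather_source))

# Build a demand fixture over a normal-year index and predict across it.
index = pd.date_range('2013-01-01', periods=365, freq='D', tz=pytz.UTC)
demand_fixture = formatter.create_demand_fixture(index, weather_normal_source)
output = model.predict(demand_fixture, summed=True)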
eemeter.processors¶
eemeter.processors.dispatchers¶
eemeter.processors.dispatchers.get_energy_modeling_dispatches(modeling_period_set, trace_set)[source]¶
Dispatches a set of applicable models and formatters for each pairing of the given modeling periods and traces.
Parameters:
- modeling_period_set (eemeter.structures.ModelingPeriodSet) – ModelingPeriods to dispatch.
- trace_set (eemeter.structures.EnergyTraceSet) – EnergyTraces to dispatch.
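For illustration only (the structure of the returned dispatches is not documented here), a call pairs each modeling period with each trace:
# Assumes `modeling_period_set` and `trace_set` are built from the
# eemeter.structures classes documented below.
from eemeter.processors.dispatchers import get_energy_modeling_dispatches

dispatches = get_energy_modeling_dispatches(modeling_period_set, trace_set)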
eemeter.processors.interventions¶
eemeter.processors.location¶
eemeter.processors.location.get_weather_normal_source(site, use_cz2010=False)[source]¶
Finds the most relevant weather normal source for the given project site.
Parameters:
- site (eemeter.structures.ZIPCodeSite) – Site to match to weather source data.
- use_cz2010 (boolean, default False) – Whether or not to use the CZ2010 mapping.
Returns: weather_normal_source – Closest data-validated TMY3 weather normal source in the same climate zone as the project ZIP code, if available. If use_cz2010 is True, returns the corresponding CZ2010WeatherSource. If no station can be found, returns None.
Return type: eemeter.weather.TMY3WeatherSource or eemeter.weather.CZ2010WeatherSource or None
eemeter.processors.location.get_weather_source(site, use_cz2010=False)[source]¶
Finds the most relevant weather source for the given project site.
Parameters:
- site (eemeter.structures.ZIPCodeSite) – Site to match to weather source data.
- use_cz2010 (boolean, default False) – Whether or not to use the CZ2010 mapping.
Returns: weather_source – Closest data-validated weather source in the same climate zone as the project ZIP code, if available. If use_cz2010 is set, returns the ISDWeatherSource corresponding to the CZ2010 station mapping. If no station can be found, returns None.
Return type: eemeter.weather.ISDWeatherSource or None
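A short usage sketch under the documented signatures (the ZIPCodeSite constructor argument is an assumption; the ZIP code is borrowed from the tutorial below):
from eemeter.processors.location import (
    get_weather_normal_source,
    get_weather_source,
)
from eemeter.structures import ZIPCodeSite

site = ZIPCodeSite("91104")  # assumed: constructed from a ZIP code string

weather_source = get_weather_source(site)                # ISDWeatherSource or None
weather_normal_source = get_weather_normal_source(site)  # TMY3WeatherSource or None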
eemeter.structures¶
class eemeter.structures.EnergyTrace(interpretation, data=None, records=None, unit=None, placeholder=False, serializer=None, trace_id=None, interval=None)[source]¶
Container for time series energy data.
Parameters:
- interpretation (str) – The way the energy time series in the data attribute should be interpreted. The complete list of supported options is as follows:
  - ELECTRICITY_CONSUMPTION_SUPPLIED: Represents the amount of utility-supplied electrical energy consumed on-site, as metered at a single usage point, such as a utility-owned electricity meter. Specifically does not include consumption of electricity generated on site, such as by locally installed solar photovoltaic panels.
  - ELECTRICITY_CONSUMPTION_TOTAL: Represents the amount of electrical energy consumed on-site, including both utility-supplied and on-site generated electrical energy. Equivalent, for a single electricity meter, to ELECTRICITY_CONSUMPTION_SUPPLIED + ELECTRICITY_ON_SITE_GENERATION_CONSUMED.
  - ELECTRICITY_CONSUMPTION_NET: Represents the amount of utility-supplied electrical energy consumed on-site minus the amount of unconsumed electrical energy generated on site and fed back into the grid at a single usage point, such as a utility-owned electricity meter. Equivalent, for a single electricity meter, to ELECTRICITY_CONSUMPTION_SUPPLIED - ELECTRICITY_ON_SITE_GENERATION_UNCONSUMED.
  - ELECTRICITY_ON_SITE_GENERATION_TOTAL: Represents the amount of locally generated electrical energy consumed on-site plus the amount of locally generated electrical energy returned to the grid, as metered at a single usage point. Equivalent, for a single electricity meter, to ELECTRICITY_ON_SITE_GENERATION_CONSUMED + ELECTRICITY_ON_SITE_GENERATION_UNCONSUMED.
  - ELECTRICITY_ON_SITE_GENERATION_CONSUMED: Represents the amount of locally generated electrical energy consumed on-site, such as energy generated by solar photovoltaic panels.
  - ELECTRICITY_ON_SITE_GENERATION_UNCONSUMED: Represents the amount of excess locally generated energy which, instead of being consumed on-site, is fed back into the grid or sold back to a utility.
  - NATURAL_GAS_CONSUMPTION_SUPPLIED: Represents the amount of energy supplied by a utility in the form of natural gas and used on site, as metered at a single usage point. Though under the labeling scheme used for electricity interpretations the labels NATURAL_GAS_CONSUMPTION_TOTAL and NATURAL_GAS_CONSUMPTION_NET would be equivalent for natural gas, NATURAL_GAS_CONSUMPTION_SUPPLIED is preferred for its greater specificity.
- data (pandas.DataFrame, default None) – A pandas DataFrame with two columns and a timezone-aware DatetimeIndex. Timestamps in the index are assumed to refer to the start of each period, and the period ends are assumed to coincide with the start of the following period. Thus, the value of the last datetime should always be NaN, since its purpose is only to cap the end of the last period, and not to represent a time period over which energy was consumed. The DatetimeIndex does not need to have a uniform frequency, such as those specified in pandas using the freq attribute.
  - value: Amount of energy between this index and the next.
  - estimated: Whether or not the value was estimated. Particularly relevant for monthly billing data.
  If a serializer instance is provided, this should instead be records in the format expected by the serializer.
- unit (str) – The name of the unit in which the energy time series is given. These names are normalized to either 'KWH' or 'THERM' as follows:
  - 'kwh', 'kWh', and 'KWH' become 'KWH' with no unit conversion multiplier.
  - 'therm', 'therms', 'thm', 'THERM', 'THERMS', and 'THM' become 'THERM' with no unit conversion multiplier.
  - 'wh', 'Wh', and 'WH' become 'KWH' with a unit conversion multiplier of 0.001.
- placeholder (bool) – Indicates that this instance is a placeholder - that while for some reason the data associated with it is unavailable, its existence is still important in considering a whole site.
- serializer (consumption.BaseSerializer) – Serializer instance to be used to deserialize records into a pandas dataframe. Must supply the to_dataframe(records) method.
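For example, the monthly billing trace used in the formatter example above can be constructed directly:
from datetime import datetime
import numpy as np
import pandas as pd
import pytz
from eemeter.structures import EnergyTrace

# The final NaN-valued row only caps the end of the last billing period.
data = pd.DataFrame(
    {
        "value": [1, 1, 1, 1, np.nan],
        "estimated": [False, False, True, False, False],
    },
    index=[
        datetime(2011, 1, 1, tzinfo=pytz.UTC),
        datetime(2011, 2, 1, tzinfo=pytz.UTC),
        datetime(2011, 3, 2, tzinfo=pytz.UTC),
        datetime(2011, 4, 3, tzinfo=pytz.UTC),
        datetime(2011, 4, 29, tzinfo=pytz.UTC),
    ],
    columns=["value", "estimated"],
)
energy_trace = EnergyTrace(
    "ELECTRICITY_CONSUMPTION_SUPPLIED", data=data, unit="KWH")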
class eemeter.structures.EnergyTraceSet(traces, labels=None)[source]¶
A container for energy traces which ensures that each is labeled.
Parameters:
- traces (list or dict of eemeter.structures.EnergyTrace objects) – EnergyTrace objects to be included in this set.
- labels (list of str) – Unique labels for traces, used only if traces is not a dictionary.
class eemeter.structures.Intervention(start_date, end_date=None)[source]¶
Represents an intervention with a start date and, optionally, an end date. Multiple interventions can be composed within a project.
Parameters:
- start_date (datetime.datetime) – Must be timezone aware.
- end_date (datetime.datetime or None, default None) – Must be timezone aware. If None, the intervention is assumed to be ongoing.
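For example:
from datetime import datetime
import pytz
from eemeter.structures import Intervention

# Dates must be timezone-aware; omit end_date for an ongoing intervention.
intervention = Intervention(
    start_date=datetime(2012, 1, 1, tzinfo=pytz.UTC),
    end_date=datetime(2012, 2, 1, tzinfo=pytz.UTC),
)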
class eemeter.structures.ModelingPeriod(interpretation, start_date=None, end_date=None)[source]¶
Represents a period of time over which to select data from a trace for contiguous modeling. Carries an "interpretation", for which there are two options, "BASELINE" and "REPORTING". The period is defined by a single optional start date and a single optional end date. If the start date is not given, the start date is considered to be negative infinity; if the end date is not given, the end date is considered to be positive infinity.
A ModelingPeriod is a time period, defined by start and end dates, over which the process behind a trace can be expected, for modeling purposes, to have roughly the same energy response to end use demand. Note that this criterion might not be particularly well specified without reference to a particular intervention and set of modeling conditions.
Parameters:
- interpretation (str, {"BASELINE", "REPORTING"}) – The way this ModelingPeriod should be interpreted.
  - "BASELINE" means that this modeling period represents the time before an intervention or set of interventions.
  - "REPORTING" means that this modeling period represents the time after an intervention or set of interventions.
- start_date (datetime.datetime or None) – The date marking the earliest date of the ModelingPeriod. None indicates a start_date of negative infinity. If interpretation is "REPORTING", start_date cannot be None.
- end_date (datetime.datetime or None) – The date marking the latest date of the ModelingPeriod. None indicates an end_date of positive infinity. If interpretation is "BASELINE", end_date cannot be None.
class eemeter.structures.ModelingPeriodSet(modeling_periods, groupings)[source]¶
Represents a set of labeled modeling periods of interest, grouped into meaningful comparison sets. Labels can be arbitrary.
Basic usage:
>>> modeling_periods = {
...     "modeling_period_1": ModelingPeriod(
...         "BASELINE",
...         end_date=datetime(2000, 1, 1, tzinfo=pytz.UTC),
...     ),
...     "modeling_period_2": ModelingPeriod(
...         "REPORTING",
...         start_date=datetime(2000, 2, 1, tzinfo=pytz.UTC),
...     ),
...     "modeling_period_3": ModelingPeriod(
...         "REPORTING",
...         start_date=datetime(2000, 2, 1, tzinfo=pytz.UTC),
...     ),
... }
>>> grouping = [
...     ("modeling_period_1", "modeling_period_2"),
...     ("modeling_period_1", "modeling_period_3"),
... ]
>>> mps = ModelingPeriodSet(modeling_periods, grouping)
class eemeter.structures.Project(energy_trace_set, interventions, site, project_id=None)[source]¶
Container for storing project data.
Parameters:
- energy_trace_set (eemeter.structures.EnergyTraceSet) – Complete set of energy traces for this project. For a project site that has, for example, two electricity meters, each with two traces (supplied electricity kWh, and solar-generated kWh) and one natural gas meter with one trace (consumed natural gas therms), the energy_trace_set should contain 5 traces, regardless of the availability of that data. Traces which are unavailable should be represented as 'placeholder' traces.
- interventions (list of eemeter.structures.Intervention) – Complete set of interventions, planned, ongoing, or completed, that have taken or will take place at this site as part of this project.
- site (eemeter.structures.Site) – The site of this project.
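Putting the structures together (a sketch; the ZIPCodeSite constructor argument is an assumption):
from eemeter.structures import EnergyTraceSet, Project, ZIPCodeSite

# Assumes `energy_trace` and `intervention` from the examples above.
project = Project(
    energy_trace_set=EnergyTraceSet([energy_trace], labels=["trace_1"]),
    interventions=[intervention],
    site=ZIPCodeSite("91104"),
)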
eemeter.weather¶
GSODWeatherSource¶
class eemeter.weather.GSODWeatherSource(station, cache_url=None)[source]¶
The GSODWeatherSource draws weather data from the NOAA Global Summary of the Day FTP site. It stores fetched data locally by default in a SQLite database at ~/.eemeter/cache/weather_cache.db, unless you set the EEMETER_WEATHER_CACHE_URL environment variable to another SQLAlchemy-compatible database URL.
Basic usage is as follows:
>>> from eemeter.weather import GSODWeatherSource
>>> ws = GSODWeatherSource("722880")  # or another 6-digit USAF station
This object can be used to fetch weather data as follows, using a daily-frequency, timezone-aware pandas DatetimeIndex covering any stretch of time.
>>> import pandas as pd
>>> import pytz
>>> index = pd.date_range('2015-01-01', periods=365,
...                       freq='D', tz=pytz.UTC)
>>> ws.indexed_temperatures(index, "degF")
2015-01-01 00:00:00+00:00    43.6
2015-01-02 00:00:00+00:00    45.0
2015-01-03 00:00:00+00:00    47.3
...
2015-12-29 00:00:00+00:00    48.0
2015-12-30 00:00:00+00:00    46.4
2015-12-31 00:00:00+00:00    47.6
Freq: D, dtype: float64
add_year(year, force_fetch=False)¶
Adds temperature data to the internal pandas timeseries.
Note: This method is called automatically internally to keep data updated in response to calls to .indexed_temperatures().
Parameters:
- year ({int, string}) – The year for which data should be fetched, e.g. "2010".
- force_fetch (bool, default=False) – If True, forces the fetch; if False, checks to see if data is locally available before actually fetching.
add_year_range(start_year, end_year, force_fetch=False)¶
Adds temperature data to the internal pandas timeseries across a range of years.
Note: This method is called automatically internally to keep data updated in response to calls to .indexed_temperatures().
Parameters:
- start_year ({int, string}) – The earliest year for which data should be fetched, e.g. "2010".
- end_year ({int, string}) – The latest year for which data should be fetched, e.g. "2013".
- force_fetch (bool, default=False) – If True, forces the fetch; if False, checks to see if the year has been added before actually fetching.
indexed_temperatures(index, unit, allow_mixed_frequency=False)¶
Return average temperatures over the given index.
Parameters:
- index (pandas.DatetimeIndex) – Index over which to supply average temperatures. The index should be given as either an hourly ('H') or daily ('D') frequency.
- unit (str, {"degF", "degC"}) – Target temperature unit for returned temperature series.
Returns: temperatures – Average temperatures over series indexed by index.
Return type: pandas.Series with DatetimeIndex
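Years can also be prefetched explicitly using the ws instance above, though this is normally unnecessary since indexed_temperatures() triggers fetches automatically:
>>> ws.add_year(2015)                    # fetch a single year if not cached
>>> ws.add_year_range(2014, 2016)        # fetch 2014 through 2016
>>> ws.add_year(2015, force_fetch=True)  # re-fetch even if locally available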
ISDWeatherSource¶
class eemeter.weather.ISDWeatherSource(station, cache_url=None)[source]¶
The ISDWeatherSource draws weather data from the NOAA Integrated Surface Database (ISD) FTP site. It stores fetched hourly data locally by default in a SQLite database at ~/.eemeter/cache/weather_cache.db, unless you set the following environment variable to something different:
$ export EEMETER_WEATHER_CACHE_DIRECTORY=/path/to/custom/directory
Basic usage is as follows:
>>> from eemeter.weather import ISDWeatherSource
>>> ws = ISDWeatherSource("722880")  # or another 6-digit USAF station
This object can be used to fetch weather data as follows, using an hourly- or daily-frequency, timezone-aware pandas DatetimeIndex covering any stretch of time.
>>> import pandas as pd
>>> import pytz
>>> daily_index = pd.date_range('2015-01-01', periods=365,
...                             freq='D', tz=pytz.UTC)
>>> ws.indexed_temperatures(daily_index, "degF")
2015-01-01 00:00:00+00:00    43.550000
2015-01-02 00:00:00+00:00    45.042500
2015-01-03 00:00:00+00:00    47.307500
...
2015-12-29 00:00:00+00:00    47.982500
2015-12-30 00:00:00+00:00    46.415000
2015-12-31 00:00:00+00:00    47.645000
Freq: D, dtype: float64
>>> hourly_index = pd.date_range('2015-01-01', periods=365*24,
...                              freq='H', tz=pytz.UTC)
>>> ws.indexed_temperatures(hourly_index, "degF")
2015-01-01 00:00:00+00:00    51.98
2015-01-01 01:00:00+00:00    50.00
2015-01-01 02:00:00+00:00    48.02
...
2015-12-31 21:00:00+00:00    62.06
2015-12-31 22:00:00+00:00    62.06
2015-12-31 23:00:00+00:00    62.06
Freq: H, dtype: float64
add_year(year, force_fetch=False)¶
Adds temperature data to the internal pandas timeseries.
Note: This method is called automatically internally to keep data updated in response to calls to .indexed_temperatures().
Parameters:
- year ({int, string}) – The year for which data should be fetched, e.g. "2010".
- force_fetch (bool, default=False) – If True, forces the fetch; if False, checks to see if data is locally available before actually fetching.
add_year_range(start_year, end_year, force_fetch=False)¶
Adds temperature data to the internal pandas timeseries across a range of years.
Note: This method is called automatically internally to keep data updated in response to calls to .indexed_temperatures().
Parameters:
- start_year ({int, string}) – The earliest year for which data should be fetched, e.g. "2010".
- end_year ({int, string}) – The latest year for which data should be fetched, e.g. "2013".
- force_fetch (bool, default=False) – If True, forces the fetch; if False, checks to see if the year has been added before actually fetching.
indexed_temperatures(index, unit, allow_mixed_frequency=False)¶
Return average temperatures over the given index.
Parameters:
- index (pandas.DatetimeIndex) – Index over which to supply average temperatures. The index should be given as either an hourly ('H') or daily ('D') frequency.
- unit (str, {"degF", "degC"}) – Target temperature unit for returned temperature series.
Returns: temperatures – Average temperatures over series indexed by index.
Return type: pandas.Series with DatetimeIndex
TMY3WeatherSource¶
class eemeter.weather.TMY3WeatherSource(station, cache_url=None, preload=True)[source]¶
The TMY3WeatherSource draws weather data from NREL's Typical Meteorological Year 3 database. It stores fetched data locally by default in a SQLite database at ~/.eemeter/cache/weather_cache.db, unless you set the EEMETER_WEATHER_CACHE_URL environment variable to another SQLAlchemy-compatible database URL.
Basic usage is as follows:
>>> from eemeter.weather import TMY3WeatherSource
>>> ws = TMY3WeatherSource("724830")  # or another 6-digit USAF station
This object can be used to fetch weather data as follows, using a daily- or hourly-frequency, timezone-aware pandas DatetimeIndex covering any stretch of time.
>>> import pandas as pd
>>> import pytz
>>> daily_index = pd.date_range('2015-01-01', periods=365,
...                             freq='D', tz=pytz.UTC)
>>> ws.indexed_temperatures(daily_index, "degF")
2015-01-01 00:00:00+00:00    38.6450
2015-01-02 00:00:00+00:00    40.4900
2015-01-03 00:00:00+00:00    43.9175
...
2015-12-29 00:00:00+00:00    43.7750
2015-12-30 00:00:00+00:00    43.6250
2015-12-31 00:00:00+00:00    46.9250
Freq: D, dtype: float64
>>> hourly_index = pd.date_range('2015-01-01', periods=365*24,
...                              freq='H', tz=pytz.UTC)
>>> ws.indexed_temperatures(hourly_index, "degF")
2015-01-01 00:00:00+00:00    51.80
2015-01-01 01:00:00+00:00    50.00
2015-01-01 02:00:00+00:00    50.00
...
2015-12-31 21:00:00+00:00    53.60
2015-12-31 22:00:00+00:00    55.40
2015-12-31 23:00:00+00:00    55.40
Freq: H, dtype: float64
indexed_temperatures(index, unit)¶
Return average temperatures over the given index.
Parameters:
- index (pandas.DatetimeIndex) – Index over which to supply average temperatures. The index should be given as either an hourly ('H') or daily ('D') frequency.
- unit (str, {"degF", "degC"}) – Target temperature unit for returned temperature series.
Returns: temperatures – Average temperatures over series indexed by index.
Return type: pandas.Series with DatetimeIndex
Location¶
eemeter.weather.location.climate_zone_is_supported(climate_zone)[source]¶
True if the given climate zone is supported.
Parameters: climate_zone (str) – String representing a climate zone.
Returns: supported – True if supported, otherwise False.
Return type: bool
eemeter.weather.location.climate_zone_to_tmy3_stations(climate_zone)[source]¶
Return TMY3 weather stations falling within the given climate zone.
Parameters: climate_zone (str) – String representing a climate zone.
Returns: stations – Strings representing TMY3 station ids.
Return type: list of str
eemeter.weather.location.climate_zone_to_usaf_stations(climate_zone)[source]¶
Return USAF weather stations falling within the given climate zone.
Parameters: climate_zone (str) – String representing a climate zone.
Returns: stations – Strings representing USAF station ids.
Return type: list of str
eemeter.weather.location.climate_zone_to_zipcodes(climate_zone)[source]¶
Return ZIP codes with centroids in the given climate zone.
Parameters: climate_zone (str) – String representing a climate zone.
Returns: zipcodes – Strings representing USPS ZIP codes.
Return type: list of str
eemeter.weather.location.cz2010_station_is_supported(station)[source]¶
True if the given CZ2010 weather station is supported. Stations are identified by USAF ID.
Parameters: station (str) – 6-digit string representing a weather station.
Returns: supported – True if supported, otherwise False.
Return type: bool
eemeter.weather.location.haversine(lat1, lng1, lat2, lng2)[source]¶
Calculate the great circle distance between two points on the earth (specified in decimal degrees).
Parameters:
- lat1 (float) – Latitude coordinate of first point.
- lng1 (float) – Longitude coordinate of first point.
- lat2 (float) – Latitude coordinate of second point.
- lng2 (float) – Longitude coordinate of second point.
Returns: distance – Kilometers between the two lat/lng coordinates.
Return type: float
eemeter.weather.location.lat_lng_to_climate_zone(lat, lng)[source]¶
Return the climate zone of the given latitude and longitude coordinates.
Parameters:
- lat (float) – Latitude coordinate.
- lng (float) – Longitude coordinate.
Returns: climate_zone – String representing a climate zone.
Return type: str, None
eemeter.weather.location.lat_lng_to_tmy3_station(lat, lng)[source]¶
Return the closest TMY3 station ID using latitude and longitude coordinates.
Parameters:
- lat (float) – Latitude coordinate.
- lng (float) – Longitude coordinate.
Returns: station – String representing a TMY3 weather station ID, or None if none was found.
Return type: str, None
eemeter.weather.location.lat_lng_to_usaf_station(lat, lng)[source]¶
Return the closest USAF station ID using latitude and longitude coordinates.
Parameters:
- lat (float) – Latitude coordinate.
- lng (float) – Longitude coordinate.
Returns: station – String representing a USAF weather station ID, or None if none was found.
Return type: str, None
eemeter.weather.location.lat_lng_to_zipcode(lat, lng)[source]¶
Return the closest ZIP code using latitude and longitude coordinates.
Parameters:
- lat (float) – Latitude coordinate.
- lng (float) – Longitude coordinate.
Returns: zipcode – String representing a USPS ZIP code, or None if none was found.
Return type: str, None
eemeter.weather.location.tmy3_station_is_supported(station)[source]¶
True if the given TMY3 weather station is supported. Stations are identified by USAF ID.
Parameters: station (str) – 6-digit string representing a weather station.
Returns: supported – True if supported, otherwise False.
Return type: bool
eemeter.weather.location.tmy3_station_to_climate_zone(station)[source]¶
Return the climate zone of the station.
Parameters: station (str) – String representing a USAF weather station ID.
Returns: climate_zone – String representing a climate zone.
Return type: str
eemeter.weather.location.tmy3_station_to_lat_lng(station)[source]¶
Return the latitude and longitude coordinates of the given station.
Parameters: station (str) – String representing a TMY3 USAF weather station ID.
Returns: lat_lng – Latitude and longitude coordinates.
Return type: tuple of float
eemeter.weather.location.tmy3_station_to_zipcodes(station)[source]¶
Return the ZIP codes that map to this station.
Parameters: station (str) – String representing a USAF weather station ID.
Returns: zipcodes – Strings representing USPS ZIP codes.
Return type: list of str
eemeter.weather.location.usaf_station_is_supported(station)[source]¶
True if the given USAF weather station is supported. Stations are identified by USAF ID.
Parameters: station (str) – 6-digit string representing a weather station.
Returns: supported – True if supported, otherwise False.
Return type: bool
eemeter.weather.location.usaf_station_to_climate_zone(station)[source]¶
Return the climate zone of the station.
Parameters: station (str) – String representing a USAF weather station ID.
Returns: climate_zone – String representing a climate zone.
Return type: str
eemeter.weather.location.usaf_station_to_lat_lng(station)[source]¶
Return the latitude and longitude coordinates of the given USAF station.
Parameters: station (str) – String representing a USAF weather station ID.
Returns: lat_lng – Latitude and longitude coordinates.
Return type: tuple of float
eemeter.weather.location.usaf_station_to_zipcodes(station)[source]¶
Return the ZIP codes that map to this USAF station.
Parameters: station (str) – String representing a USAF weather station ID.
Returns: zipcodes – Strings representing USPS ZIP codes mapped to from this station.
Return type: list of str
eemeter.weather.location.zipcode_is_supported(zipcode)[source]¶
True if the given ZIP code is supported. ZCTAs (ZIP Code Tabulation Areas) only.
Parameters: zipcode (str) – 5-digit string representing a ZIP code.
Returns: supported – True if supported, otherwise False.
Return type: bool
eemeter.weather.location.zipcode_to_climate_zone(zipcode)[source]¶
Return the climate zone of the ZIP code (by latitude and longitude centroid of the ZIP code).
Parameters: zipcode (str) – String representing a USPS ZIP code.
Returns: climate_zone – String representing a climate zone.
Return type: str
eemeter.weather.location.zipcode_to_cz2010_station(zipcode)[source]¶
Return the nearest CZ2010 station (by latitude and longitude centroid) of the ZIP code.
Parameters: zipcode (str) – String representing a USPS ZIP code.
Returns: station – String representing a CZ2010 weather station ID.
Return type: str
eemeter.weather.location.zipcode_to_lat_lng(zipcode)[source]¶
Return the latitude and longitude centroid of a particular ZIP code.
Parameters: zipcode (str) – String representing a USPS ZIP code.
Returns: lat_lng – Latitude and longitude coordinates.
Return type: tuple of float
eemeter.weather.location.zipcode_to_tmy3_station(zipcode)[source]¶
Return the nearest TMY3 station (by latitude and longitude centroid) of the ZIP code.
Parameters: zipcode (str) – String representing a USPS ZIP code.
Returns: station – String representing a TMY3 weather station (USAF ID).
Return type: str
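For example, the mapping utilities above can be chained together (the ZIP code is borrowed from the tutorial below):
from eemeter.weather.location import (
    haversine,
    tmy3_station_to_lat_lng,
    zipcode_to_climate_zone,
    zipcode_to_lat_lng,
    zipcode_to_tmy3_station,
)

lat, lng = zipcode_to_lat_lng("91104")
climate_zone = zipcode_to_climate_zone("91104")
station = zipcode_to_tmy3_station("91104")  # nearest TMY3 station (USAF ID)

# Distance (km) from the ZIP code centroid to the station coordinates.
station_lat, station_lng = tmy3_station_to_lat_lng(station)
distance_km = haversine(lat, lng, station_lat, station_lng)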
Development¶
Testing¶
This library uses the py.test framework. To develop locally, clone the repo, and in a virtual environment execute the following commands:
$ git clone https://github.com/openeemeter/eemeter
$ cd eemeter
$ mkvirtualenv eemeter
$ pip install -r dev_requirements.txt
$ pip install -e .
$ tox
Building Documentation¶
Documentation is built using the sphinx
package.
To build documentation, make sure that dev requirements are installed:
$ pip install -r dev_requirements.txt
You will also need to install pandoc (http://pandoc.org/installing.html) to build docs locally.
Then run the following from the root project directory:
$ make -C docs html
To clean the build directory, run the following:
$ make -C docs clean
datastore¶
The datastore is an application for housing energy and project data which provides a REST API for loading data, computing energy savings, and inspecting results. Like the eemeter library, the datastore is open source and available on github under an MIT license.
The datastore uses the django web framework with a PostgreSQL database.
Development Setup¶
Clone the repo and change directories¶
git clone git@github.com:openeemeter/datastore.git
cd datastore
Install required python packages¶
We recommend using virtualenv (or virtualenvwrapper) to manage python packages
mkvirtualenv datastore
pip install -r requirements.txt
pip install -r dev-requirements.txt
Define the necessary environment variables¶
# django
export DJANGO_SETTINGS_MODULE=oeem_energy_datastore.settings
export SECRET_KEY=<django-secret-key> # random string
# postgres
export DATABASE_URL=postgres://user:password@host:5432/dbname
# for API docs - should reflect the IP or DNS name where datastore will be deployed
export SERVER_NAME=0.0.0.0:8000
export PROTOCOL=http # or https
# For development only
export DEBUG=true
# For celery background tasks
export CELERY_ALWAYS_EAGER=true
or
export BROKER_TRANSPORT=redis
export BROKER_URL=redis://user:password@host:9549
If developing on the datastore, you might consider adding these to your virtualenv postactivate script:
vim /path/to/virtualenvs/datastore/bin/postactivate
# Refresh environment
workon datastore
Run database migrations¶
python manage.py migrate
Seed the database¶
python manage.py dev_seed
Start a development server¶
python manage.py runserver
Topics¶
Basic Usage: datastore application¶
The datastore is a tool for using the eemeter which automates and helps to scale the most frequent eemeter tasks: data loading and storage, meter running, and result storage and warehousing. It puts a REST API in front of the eemeter and uses a postgres backend.
This tutorial is also available as a jupyter notebook.
Note:
This tutorial assumes you have a working datastore instance. If you do not, please follow the datastore development setup instructions or contact Open EE about setting up a dedicated production deployment.
Note:
For small and large datasets, the ETL toolkit exists to ease and speed up the process of loading your data.
This tutorial does not cover ETL toolkit usage. For more information on the ETL toolkit, see its API documentation.
Setup¶
In [1]:
# library imports
import pandas as pd
import requests
import pytz
If you followed the datastore development setup instructions, you will already have run the command to create a superuser and access credentials.
python manage.py dev_seed
If you haven't already done so, do so now. The dev_seed command creates a demo admin user and a sample project:
- username: demo
- password: demo-password
- API access token: tokstr
Ensure that your development server is running locally on port 8000 before continuing.
python manage.py runserver
Each request will include an Authorization header:
Authorization: Bearer tokstr
In [2]:
base_url = "http://0.0.0.0:8000"
token = "tokstr"
headers = {"Authorization": "Bearer {}".format(token)}
Using the API to get loaded data¶
We can use the API to inspect the data that is loaded into the datastore. (The API can also be used for loading data, but that is not covered here. See the ETL tutorial for more information on loading data.)
Note:
We will use the requests python package for making requests, but you could just as easily use a tool like cURL or Postman. If you have the eemeter package installed, you will also have the requests package installed, but if not, you can install it with:
$ pip install requests
A request using the requests library looks like this:
import requests
url = "https://example.com"
data = {
"first_name": "John",
"last_name": "Doe"
}
requests.post(url + "/api/users/", json=data)
which is equivalent to:
POST /api/users/ HTTP/1.1
Host: example.com
{
"first_name": "John",
"last_name": "Doe"
}
Since the dev_seed command creates a sample project, the following request will return a response showing that project. Every project has a unique "project_id", which can be set to whatever is most appropriate (note: it is not used as the primary key; that is the 'id' field).
In [3]:
url = base_url + "/api/v1/projects/"
projects = requests.get(url, headers=headers).json()
In [4]:
projects
Out[4]:
[{'baseline_period_end': '2012-01-01T00:00:00Z',
'baseline_period_start': None,
'id': 1,
'project_id': 'DEV_SEED_PROJECT',
'project_owner_id': 1,
'reporting_period_end': None,
'reporting_period_start': '2012-02-01T00:00:00Z',
'zipcode': '91104'}]
Energy trace data will be associated with this project by foreign key through a many-to-many table. This means that projects can have 0 to n associated traces, and that traces can have 0 to n associated projects.
Like projects and the project_id field, traces are identified by a unique ‘trace_id’ field, which can also be set to whatever is most appropriate.
There are two API endpoints used to fetch trace data:
- /api/v1/traces/: stores trace ids, units, and interpretations.
- /api/v1/trace_records/: stores the time-series records associated with each trace.
Records are stored by record start timestamp, with the implicit assumption that the start timestamp of the next temporal record is the end of the current record. The value of the last record is ignored, and serves only as the final end timestamp (it is usually set to null).
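As an illustration of that convention, a hypothetical records payload for monthly data might look like this (field names follow the trace records shown below):
records = [
    # Each record runs from its "start" to the next record's "start".
    {"start": "2010-01-01T00:00:00Z", "value": 1.0, "estimated": False},
    {"start": "2010-02-01T00:00:00Z", "value": 1.0, "estimated": False},
    # The final record's value is ignored; it only caps the last period.
    {"start": "2010-03-01T00:00:00Z", "value": None, "estimated": False},
]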
In [5]:
url = base_url + "/api/v1/traces/?projects={}".format(projects[0]['id'])
traces = requests.get(url, headers=headers).json()
In [6]:
traces
Out[6]:
[{'id': 1,
'interpretation': 'NATURAL_GAS_CONSUMPTION_SUPPLIED',
'trace_id': 'DEV_SEED_TRACE_NATURAL_GAS_MONTHLY',
'unit': 'THERM'},
{'id': 2,
'interpretation': 'NATURAL_GAS_CONSUMPTION_SUPPLIED',
'trace_id': 'DEV_SEED_TRACE_NATURAL_GAS_DAILY',
'unit': 'THERM'},
{'id': 3,
'interpretation': 'ELECTRICITY_CONSUMPTION_SUPPLIED',
'trace_id': 'DEV_SEED_TRACE_ELECTRICITY_15MIN',
'unit': 'KWH'},
{'id': 4,
'interpretation': 'ELECTRICITY_CONSUMPTION_SUPPLIED',
'trace_id': 'DEV_SEED_TRACE_ELECTRICITY_HOURLY',
'unit': 'KWH'},
{'id': 5,
'interpretation': 'ELECTRICITY_ON_SITE_GENERATION_UNCONSUMED',
'trace_id': 'DEV_SEED_TRACE_SOLAR_HOURLY',
'unit': 'KWH'},
{'id': 6,
'interpretation': 'ELECTRICITY_ON_SITE_GENERATION_UNCONSUMED',
'trace_id': 'DEV_SEED_TRACE_SOLAR_30MIN',
'unit': 'KWH'}]
We can also query for trace records by trace primary key.
In [7]:
url = base_url + "/api/v1/trace_records/?trace={}".format(traces[0]['id'])
trace_records = requests.get(url, headers=headers).json()
In [8]:
trace_records[:3] # first 3 records
Out[8]:
[{'estimated': False,
'id': 1,
'start': '2010-01-01T00:00:00Z',
'trace_id': 1,
'value': None},
{'estimated': False,
'id': 2,
'start': '2010-02-01T00:00:00Z',
'trace_id': 1,
'value': 1.0},
{'estimated': False,
'id': 3,
'start': '2010-03-01T00:00:00Z',
'trace_id': 1,
'value': 1.0}]
Running meters¶
Running a meter means pulling trace data, matching it with relevant project data, and evaluating its energy efficiency performance. This is the central task performed by the datastore, so if the specifics are unfamiliar, there is more background information worth reviewing in the Methods Overview section of the guides.
To run a meter, make a request to create a “meter run”. This request will start a job that runs a meter and saves its results. The result of a meter run is called a “meter result”.
In [9]:
from collections import OrderedDict
import json
Scheduling a single meter run¶
The primary component of this request is a trace primary key.
Any project data associated with the trace will be pulled in automatically.
In [10]:
created_meter_run = requests.post(
base_url + "/api/v1/meter_runs/",
json={
"trace": traces[0]['id'] # single trace primary key
},
headers=headers
).json(object_pairs_hook=OrderedDict) # retains order of keys
In [11]:
print(json.dumps(created_meter_run, indent=2))
{
"id": 1,
"trace": 1,
"project": 1,
"meter_result": 1,
"meter_input": null,
"status": "PENDING",
"failure_message": null,
"traceback": null,
"model_class": null,
"model_kwargs": null,
"formatter_class": null,
"formatter_kwargs": null,
"added": "2016-11-18T02:16:36.078334Z",
"updated": "2016-11-18T02:16:36.078375Z"
}
This is a summary of the task to run the meter on the indicated project.
The response shows us the complete specification of the meter run behavior, which is as follows:
- project: the project primary key (determined implicitly from the trace).
- trace: the trace primary key (given in the API request).
- status: the task status code (in this case "PENDING"). The options are:
  - "PENDING": the task is scheduled but not yet running or completed.
  - "RUNNING": the task is currently running.
  - "SUCCESS": successful completion.
  - "FAILED": failed due to some sort of error.
- meter_result: the primary key of the meter result.
- meter_input: has not yet been created (this is the complete serialized input to the meter, as required by the eemeter).
- model_class and model_kwargs: the model class and arguments used in meter fitting. If these are left blank, default values will be used.
- formatter_class and formatter_kwargs: the formatter class and arguments used in meter fitting. If these are left blank, default values will be used.
If you wish, you can also specify many of these properties explicitly, and we will do so in a following section.
Let's make another call to inspect the state of this meter run.
In [12]:
meter_run = requests.get(
base_url + "/api/v1/meter_runs/{}/".format(created_meter_run['id']),
headers=headers
).json(object_pairs_hook=OrderedDict)
In [13]:
print(json.dumps(meter_run, indent=2))
{
"id": 1,
"trace": 1,
"project": 1,
"meter_result": 1,
"meter_input": "https://storage.googleapis.com/my-storage-bucket/datastore/meter_run_inputs/5fa24b58-444b-4c72-a8c9-bb0327b23118.json",
"status": "SUCCESS",
"failure_message": null,
"traceback": null,
"model_class": null,
"model_kwargs": null,
"formatter_class": null,
"formatter_kwargs": null,
"added": "2016-11-18T02:16:36.078334Z",
"updated": "2016-11-18T02:17:44.356211Z"
}
The associated meter result is also available now and carries a set of outputs that includes:
- meter_output: the serialized output of the meter run.
- eemeter_version and datastore_version: the software versions of the eemeter library and the datastore application used.
In [14]:
meter_result = requests.get(
base_url + "/api/v1/meter_results/{}/".format(created_meter_run['meter_result']),
headers=headers
).json(object_pairs_hook=OrderedDict)
In [15]:
print(json.dumps(meter_result, indent=2))
{
"id": 1,
"trace": 1,
"project": 1,
"meter_run": 1,
"meter_output": "https://storage.googleapis.com/my-storage-bucket/datastore/meter_run_outputs/e1896b44-0b89-49ac-93e1-8eb6e44987bd.json",
"status": "SUCCESS",
"eemeter_version": "0.4.12",
"datastore_version": "0.2.3",
"model_class": "BillingElasticNetCVModel",
"model_kwargs": {
"heating_base_temp": 65,
"cooling_base_temp": 65
},
"formatter_class": "ModelDataBillingFormatter",
"formatter_kwargs": {},
"added": "2016-11-18T02:17:44.203325Z",
"updated": "2016-11-18T02:17:44.223200Z"
}
Customizing meter runs¶
Meter runs can also be customized by specifying various attributes explicitly, such as custom arguments for the model class.
In [16]:
custom_meter_run = requests.post(
base_url + "/api/v1/meter_runs/",
json={
"trace": 2,
"project": 1,
"model_kwargs": {
"heating_base_temp": 64, # different temperature
"cooling_base_temp": 64,
},
},
headers=headers
).json(object_pairs_hook=OrderedDict)
In [17]:
print(json.dumps(custom_meter_run, indent=2))
{
"id": 2,
"trace": 2,
"project": 1,
"meter_result": 2,
"meter_input": null,
"status": "PENDING",
"failure_message": null,
"traceback": null,
"model_class": null,
"model_kwargs": {
"heating_base_temp": 64,
"cooling_base_temp": 64
},
"formatter_class": null,
"formatter_kwargs": null,
"added": "2016-11-18T02:17:44.681341Z",
"updated": "2016-11-18T02:17:44.681374Z"
}
Or, if you leave out the project and trace attributes, you can specify the exact serialized input. This means that if serialized meter inputs are available, you need not explicitly load traces and projects through ETL.
Please download a preformatted input file for this step.
In [18]:
with open('meter_input_example.json', 'r') as f:
meter_input = f.read() # loaded as a serialized string
meter_input_meter_run = requests.post(
base_url + "/api/v1/meter_runs/",
json={
"meter_input": meter_input,
},
headers=headers
).json(object_pairs_hook=OrderedDict)
In [19]:
print(json.dumps(meter_input_meter_run, indent=2))
{
"id": 3,
"trace": null,
"project": null,
"meter_result": 3,
"meter_input": "https://storage.googleapis.com/my-storage-bucket/datastore/meter_run_inputs/bf0629db-0c81-4ded-8dcc-adbd0ddbf3f3.json",
"status": "PENDING",
"failure_message": null,
"traceback": null,
"model_class": null,
"model_kwargs": null,
"formatter_class": null,
"formatter_kwargs": null,
"added": "2016-11-18T02:19:23.155268Z",
"updated": "2016-11-18T02:19:23.155857Z"
}
In [20]:
meter_run = requests.get(
base_url + "/api/v1/meter_runs/{}/".format(meter_input_meter_run['id']),
headers=headers
).json(object_pairs_hook=OrderedDict)
print(json.dumps(meter_run, indent=2))
{
"id": 3,
"trace": null,
"project": null,
"meter_result": 3,
"meter_input": "https://storage.googleapis.com/my-storage-bucket/datastore/meter_run_inputs/bf0629db-0c81-4ded-8dcc-adbd0ddbf3f3.json",
"status": "SUCCESS",
"failure_message": null,
"traceback": null,
"model_class": null,
"model_kwargs": null,
"formatter_class": null,
"formatter_kwargs": null,
"added": "2016-11-18T02:19:23.155268Z",
"updated": "2016-11-18T02:22:13.679220Z"
}
In [21]:
meter_result = requests.get(
base_url + "/api/v1/meter_results/{}/".format(meter_input_meter_run['meter_result']),
headers=headers
).json(object_pairs_hook=OrderedDict)
print(json.dumps(meter_result, indent=2))
{
"id": 3,
"trace": null,
"project": null,
"meter_run": 3,
"meter_output": "https://storage.googleapis.com/my-storage-bucket/datastore/meter_run_outputs/7ab2bd31-a723-4c75-afa6-424d560ab284.json",
"status": "SUCCESS",
"eemeter_version": "0.4.12",
"datastore_version": "0.2.3",
"model_class": "SeasonalElasticNetCVModel",
"model_kwargs": {
"heating_base_temp": 65,
"cooling_base_temp": 65
},
"formatter_class": "ModelDataFormatter",
"formatter_kwargs": {
"freq_str": "D"
},
"added": "2016-11-18T02:22:13.566866Z",
"updated": "2016-11-18T02:22:13.583138Z"
}
Meters can also be triggered in bulk; the next section covers this.
Bulk-triggering meter runs¶
Often it is more convenient to trigger many meter runs at once than to do it trace-by-trace. This can be done either through the API or through a datastore management command.
Through the API¶
The following sends a list of “targets” to the datastore for triggering. Here, we’re triggering a set of meter runs for one project, which will trigger meter runs for all associated traces.
Warning:
The following may take a few minutes to complete. If you have enabled celery workers, it will execute more quickly and computation will continue in the background. If this is the case for you, you should wait until that computation has completed before continuing.
For more information on background worker setup, see datastore setup instructions.
To follow progress, watch the datastore logs or use the meter_progress command. In a development environment, these are printed in the python manage.py runserver output.
In [22]:
bulk_created_meter_runs = requests.post(
base_url + "/api/v1/meter_runs/bulk/", # note: different url!
json={
"targets": [ # a list of targets can be provided
{
"project": projects[0]['id']
},
]
},
headers=headers
).json(object_pairs_hook=OrderedDict)
In [23]:
print(json.dumps(bulk_created_meter_runs, indent=2))
[
[
{
"id": 4,
"trace": 3,
"project": 1,
"meter_result": 4,
"meter_input": null,
"status": "PENDING",
"failure_message": null,
"traceback": null,
"model_class": null,
"model_kwargs": null,
"formatter_class": null,
"formatter_kwargs": null,
"added": "2016-11-18T02:22:14.088620Z",
"updated": "2016-11-18T02:22:14.088658Z"
},
{
"id": 5,
"trace": 4,
"project": 1,
"meter_result": 5,
"meter_input": null,
"status": "PENDING",
"failure_message": null,
"traceback": null,
"model_class": null,
"model_kwargs": null,
"formatter_class": null,
"formatter_kwargs": null,
"added": "2016-11-18T02:26:15.683349Z",
"updated": "2016-11-18T02:26:15.683442Z"
},
{
"id": 6,
"trace": 5,
"project": 1,
"meter_result": 6,
"meter_input": null,
"status": "PENDING",
"failure_message": null,
"traceback": null,
"model_class": null,
"model_kwargs": null,
"formatter_class": null,
"formatter_kwargs": null,
"added": "2016-11-18T02:28:00.757629Z",
"updated": "2016-11-18T02:28:00.757666Z"
},
{
"id": 7,
"trace": 6,
"project": 1,
"meter_result": 7,
"meter_input": null,
"status": "PENDING",
"failure_message": null,
"traceback": null,
"model_class": null,
"model_kwargs": null,
"formatter_class": null,
"formatter_kwargs": null,
"added": "2016-11-18T02:29:09.066736Z",
"updated": "2016-11-18T02:29:09.066777Z"
},
{
"id": 8,
"trace": 1,
"project": 1,
"meter_result": 8,
"meter_input": null,
"status": "PENDING",
"failure_message": null,
"traceback": null,
"model_class": null,
"model_kwargs": null,
"formatter_class": null,
"formatter_kwargs": null,
"added": "2016-11-18T02:32:05.062196Z",
"updated": "2016-11-18T02:32:05.062238Z"
},
{
"id": 9,
"trace": 2,
"project": 1,
"meter_result": 9,
"meter_input": null,
"status": "PENDING",
"failure_message": null,
"traceback": null,
"model_class": null,
"model_kwargs": null,
"formatter_class": null,
"formatter_kwargs": null,
"added": "2016-11-18T02:32:30.471808Z",
"updated": "2016-11-18T02:32:30.471864Z"
}
]
]
Note that results are returned grouped by target (as a list).
If model or formatter class or kwarg arguments are supplied, they will be applied to all meter_runs.
Through a management command¶
The other way to bulk-trigger meter runs is through a management command.
python manage.py run_meters --all-traces
You can monitor the progress of these commands with:
python manage.py meter_progress --all-meters --poll-until-complete
Meter result warehouse tables¶
For easy access to summarized meter result data, it may be helpful to use the meter result “mart”, which is part of the data warehouse that can be created in the postgres database.
Data warehouse tables make it easier to query into results by summarizing the most relevant information.
To create warehouse tables, use the following management command:
$ python manage.py meterresultmart recreate
This is equivalent to running
$ python manage.py meterresultmart destroy
$ python manage.py meterresultmart create
Running the create command without first destroying will give duplicate rows.
Using the warehouse_meterresultmart table¶
The easiest way to access the results of the warehouse is to connect an analytics service which can read from the database directly.
If that is not available to you, you can also query directly with postgres. Assuming you have a database set up called “datastore” (yours may be named differently, depending on how you set it up), you can connect as follows:
$ psql datastore
psql (9.4.1)
Type "help" for help.
datastore=# SELECT
trace_id
, differential_lower_bound as savings_lower_bound
, differential_value as savings
, differential_upper_bound as savings_upper_bound
FROM
warehouse_meterresultmart
WHERE
project_id='DEV_SEED_PROJECT'
AND
derivative_interpretation='gross_predicted'
ORDER BY
project_id
, trace_id
, derivative_interpretation;
trace_id | savings_lower_bound | savings | savings_upper_bound
------------------------------------+---------------------+-------------------+---------------------
DEV_SEED_TRACE_ELECTRICITY_15MIN | 2.21781163934072 | 4.93938974304001 | 7.6609678467393
DEV_SEED_TRACE_ELECTRICITY_HOURLY | 10.47300113325 | 12.9734969290002 | 15.4739927247505
DEV_SEED_TRACE_NATURAL_GAS_DAILY | -12.4594891213768 | -6.03261538803008 | 0.394258345316612
DEV_SEED_TRACE_NATURAL_GAS_MONTHLY | -774.348987437802 | -580.576019960851 | -386.8030524839
DEV_SEED_TRACE_SOLAR_30MIN | 0.848394785466816 | 3.81981394853938 | 6.79123311161194
DEV_SEED_TRACE_SOLAR_HOURLY | | |
(6 rows)
datastore=#
Aggregations and groups¶
Traces can be aggregated by putting them into groups and triggering aggregation runs.
Groups must be named, and are defined by combinations of filters over project_id, trace_id, or arbitrary project metadata.
Filters are created with the following attributes, as either a "filter" or a "filter_boolean", which is a combination of two filters.
Filter types:
"filter":
- "target", can be:
  - "project_id"
  - "trace_id"
  - "project_metadata|NAME_OF_ATTRIBUTE"
- "comparison", can be:
  - ">", ">=", "<", "<=", "==", "!="
  - "in", "not in"
- "value", can be:
  - int, float, str (for comparisons ">", ">=", "<", "<=", "==", "!=")
  - list of values (for comparisons "in", "not in")
"filter_boolean":
- "boolean", can be:
  - "and", "or"
- "filter_a", can be:
  - filter, filter_boolean
- "filter_b", can be:
  - filter, filter_boolean
Example filter specification creation:
In [24]:
filter_specification = {
"filter": {
"target": "project_id",
"comparison": "==",
"value": projects[0]["project_id"],
}
}
In [25]:
trace_group = requests.post(
base_url + "/api/v1/trace_groups/", # note: different url!
json={
"name": "project_group",
"filter_specification": filter_specification,
},
headers=headers
).json(object_pairs_hook=OrderedDict)
In [26]:
print(json.dumps(trace_group, indent=2))
{
"id": 3,
"name": "project_group",
"filter_specification": {
"filter": {
"comparison": "==",
"target": "project_id",
"value": "DEV_SEED_PROJECT"
}
}
}
In [27]:
aggregation_run = requests.post(
base_url + "/api/v1/aggregation_runs/",
json={
"group": trace_group['id'],
"trace_interpretation": "ELECTRICITY_CONSUMPTION_SUPPLIED",
"derivative_interpretation": "annualized_weather_normal",
},
headers=headers
).json(object_pairs_hook=OrderedDict)
In [28]:
print(json.dumps(aggregation_run, indent=2))
{
"id": 7,
"group": 3,
"aggregation_result": 1,
"aggregation_input": null,
"status": "PENDING",
"traceback": null,
"failure_message": null,
"trace_interpretation": "ELECTRICITY_CONSUMPTION_SUPPLIED",
"derivative_interpretation": "annualized_weather_normal",
"aggregation_interpretation": "SUM",
"added": "2016-11-18T03:04:00.425945Z",
"updated": "2016-11-18T03:04:00.426601Z"
}
In [29]:
aggregation_run = requests.get(
base_url + "/api/v1/aggregation_runs/{}/".format(aggregation_run["id"]),
headers=headers
).json(object_pairs_hook=OrderedDict)
print(json.dumps(aggregation_run, indent=2))
{
"id": 7,
"group": 3,
"aggregation_result": 1,
"aggregation_input": "https://storage.googleapis.com/my-storage-bucket/datastore/aggregation_inputs/2b986708-1076-4a16-a2dd-31a78f93d817.json",
"status": "SUCCESS",
"traceback": null,
"failure_message": null,
"trace_interpretation": "ELECTRICITY_CONSUMPTION_SUPPLIED",
"derivative_interpretation": "annualized_weather_normal",
"aggregation_interpretation": "SUM",
"added": "2016-11-18T03:04:00.425945Z",
"updated": "2016-11-18T03:04:03.806500Z"
}
In [30]:
aggregation_result = requests.get(
base_url + "/api/v1/aggregation_results/{}/".format(aggregation_run["aggregation_result"]),
headers=headers
).json(object_pairs_hook=OrderedDict)
print(json.dumps(aggregation_result, indent=2))
{
"id": 1,
"aggregation_run": 7,
"trace_interpretation": "ELECTRICITY_CONSUMPTION_SUPPLIED",
"derivative_interpretation": "annualized_weather_normal",
"aggregation_interpretation": "SUM",
"aggregation_output": "https://storage.googleapis.com/my-storage-bucket/datastore/aggregation_outputs/4295fd00-4ecd-494e-9eeb-884ef620ce14.json",
"derivatives": [
7,
9
],
"unit": "KWH",
"baseline_value": 4863.15574521486,
"baseline_lower": 1.5110421742195,
"baseline_upper": 1.5110421742195,
"baseline_n": 730.0,
"reporting_value": 4860.88336194519,
"reporting_lower": 0.545198012058528,
"reporting_upper": 0.545198012058528,
"reporting_n": 730.0,
"differential_direction": "BASELINE_MINUS_REPORTING",
"differential_value": 2.27238326967017,
"differential_lower": 1.60639015330105,
"differential_upper": 1.60639015330105,
"differential_n": 1460.0,
"eemeter_version": "0.4.12",
"datastore_version": "0.2.3",
"added": "2016-11-18T03:04:03.701863Z",
"updated": "2016-11-18T03:04:03.701911Z"
}
Additional filter examples:¶
All traces:
None # leave blank
Traces with project cost less than or equal to 10000:
{
"filter": {
"target": "project_metadata|project_cost",
"comparison": "<=",
"value": 10000,
}
}
Traces with project_id in particular set:
{
"filter": {
"target": "project_id",
"comparison": "in",
"value": [
"PROJECT_101",
"PROJECT_102"
]
}
}
Traces with project_id in particular set or with project cost greater than or equal to 5000:
{
"filter_boolean": {
"boolean": "or",
"filter_a": {
"filter": {
"target": "project_id",
"comparison": "in",
"value": [
"PROJECT_101",
"PROJECT_102"
]
}
},
"filter_b": {
"filter": {
"target": "project_metadata|project_cost",
"comparison": ">=",
"value": 5000,
}
}
}
}
Deeply nested filter:
{
"filter_boolean": {
"boolean": "and",
"filter_a": {
"filter_boolean": {
"boolean": "and",
"filter_a": {
"filter": {
"target": "project_metadata|contractor",
"comparison": "==",
"value": "AAA CONTRACTING",
}
},
"filter_b": {
"filter": {
"target": "project_metadata|project_type",
"comparison": "!=",
"value": "SOLAR"
}
},
}
},
"filter_b": {
"filter": {
"target": "project_metadata|project_cost",
"comparison": ">=",
"value": 5000,
}
}
}
}
Group statistics warehouse tables¶
For easy access to summarized aggregated data, it may be helpful to use the group statistics mart, which is part of the data warehouse that can be created in the postgres database.
This supplements the meter result mart by providing summarized group statistics.
Just as with the meter result mart, the group statistics mart can also be created with a management command:
$ python manage.py groupstatisticsmart recreate
PostgreSQL tables¶
A data dictionary describing available datastore database tables.
Core project and trace data (i.e., data loaded through ETL)¶
Name of Table | Name of Column | Description of Column
---|---|---
datastore_project | |
 | id | Primary key
 | project_id | Unique project identifier provided by the user
 | baseline_period_start | [null]
 | baseline_period_end | Populated through ETL from project data
 | reporting_period_start | Populated through ETL from project data
 | reporting_period_end | [null]
 | zipcode | Populated through ETL from project data
 | project_owner_id | Optional foreign key to datastore_projectowner table
 | added | Date added
 | updated | Date updated
datastore_trace | | Refers to an energy trace (a time series of data from a meter)
 | id | Primary key
 | trace_id | Unique identifier for trace
 | interpretation | Type of energy data
 | unit | Unit of measure
 | added | Date that the data was added to the database
 | updated | Timestamp of last update
datastore_tracerecord | | Single point in a trace timeseries
 | id | Primary key
 | trace_id | Foreign key to datastore_trace table
 | value | Value from start of this record to start of the next record
 | estimated | True/False
 | start | Start time of interval; end is given by the next record (as ordered by start timestamp)
datastore_project_traces | | Many-to-many table linking projects and traces
 | project_id | Foreign key to datastore_project table
 | trace_id | Foreign key to datastore_trace table
datastore_projectmetadata | | Project metadata
 | project_id | Foreign key to datastore_project table
 | key | String identifying metadata type
 | value | Value of metadata
datastore_tracegroup | | Grouping of traces defined by a filter
 | name | Name of group
 | filter_specification | JSON specification of filter defining group
Meter run and meter result data¶
Metering tables
Name of Table | Name of Column | Description of Column |
---|---|---|
metering_meterderivative | | Table of predictive and descriptive summaries of savings |
 | id | Primary key |
 | interpretation | Interpretation of derivative (e.g., gross_predicted/annualized_weather_normal) |
 | unit | Unit of values and of upper and lower bounds |
 | baseline_value | Modeled counterfactual baseline value |
 | baseline_lower | Amount to be subtracted from baseline_value to obtain lower bound on 95% confidence interval |
 | baseline_upper | Amount to be added to baseline_value to obtain upper bound on 95% confidence interval |
 | baseline_n | Number of points in baseline demand fixture |
 | reporting_value | Modeled reporting period value |
 | reporting_lower | Amount to be subtracted from reporting_value to obtain lower bound on 95% confidence interval |
 | reporting_upper | Amount to be added to reporting_value to obtain upper bound on 95% confidence interval |
 | reporting_n | Number of points in reporting demand fixture |
 | added | Date added |
 | updated | Date updated |
 | meter_result_id | Primary key of meter result this derivative was extracted from |
 | modeling_period_group_id | Primary key of modeling period group describing baseline and reporting period details |
 | trace_id | Primary key of trace this derivative applies to |
metering_meterresult | | Table of meter run results |
 | id | Primary key |
 | meter_output | Filename of JSON serialization of meter output |
 | status | SUCCESS/FAILURE |
 | eemeter_version | Version of eemeter library used to calculate this result |
 | datastore_version | Version of datastore application used to calculate this result |
 | model_class | Name of model class |
 | model_kwargs | Keyword arguments to model class |
 | formatter_class | Name of formatter class |
 | formatter_kwargs | Keyword arguments to formatter class |
 | added | Date added |
 | updated | Date updated |
 | meter_run_id | Primary key of meter run |
 | project_id | Primary key of project data |
 | trace_id | Primary key of trace |
metering_meterrun | | Table of meter runs |
 | id | Primary key |
 | meter_input | Filename of JSON serialization of meter input |
 | status | PENDING/RUNNING/SUCCESS/FAILURE |
 | failure_message | Failure message, if any |
 | traceback | Traceback text, if an error occurred |
 | model_class | Name of model class supplied, if any |
 | model_kwargs | Model class keyword arguments supplied, if any |
 | formatter_class | Name of formatter class supplied, if any |
 | formatter_kwargs | Formatter class keyword arguments supplied, if any |
 | added | Date added |
 | updated | Date updated |
 | project_id | Primary key of project data |
 | trace_id | Primary key of trace |
metering_modelingperiod | | Table describing a modeling period |
 | id | Primary key |
 | label | Label to distinguish from other baseline/reporting periods in the same meter result |
 | interpretation | BASELINE/REPORTING |
 | start | Date of modeling period start, if any (can be blank for baseline) |
 | end | Date of modeling period end, if any (can be blank for reporting) |
 | meter_result_id | Primary key of containing meter result |
metering_modelingperiodgroup | | Table describing a pair of modeling periods (baseline + reporting) |
 | id | Primary key |
 | baseline_id | Primary key of baseline modeling period |
 | meter_result_id | Primary key of containing meter result |
 | reporting_id | Primary key of reporting modeling period |
metering_modelresult | | Table storing results from modeling |
 | id | Primary key |
 | status | SUCCESS/FAILURE |
 | traceback | Traceback, if any |
 | start_date | Start date of data used in modeling |
 | end_date | End date of data used in modeling |
 | n_rows | Number of rows supplied as input to modeling |
 | r2 | R-squared model fit |
 | cvrmse | Coefficient of variation of root mean squared error (RMSE normalized by the mean) |
 | rmse | Root mean squared error |
 | lower | Value to be subtracted from any individual predicted point to obtain lower bound on 95% confidence interval |
 | upper | Value to be added to any individual predicted point to obtain upper bound on 95% confidence interval |
 | added | Date added |
 | updated | Date updated |
 | meter_result_id | Primary key of meter result |
 | modeling_period_id | Primary key of modeling period |
 | trace_id | Primary key of trace |
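Note that the lower and upper columns throughout these tables store offsets, not absolute bounds. A worked example with made-up numbers:
# Hypothetical metering_meterderivative values, in whatever `unit` specifies.
baseline_value, baseline_lower, baseline_upper = 12000.0, 340.0, 360.0

ci_low = baseline_value - baseline_lower   # lower edge of 95% confidence interval
ci_high = baseline_value + baseline_upper  # upper edge of 95% confidence interval
print(ci_low, ci_high)  # 11660.0 12360.0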
Aggregation tables¶
Name of Table | Name of Column | Description of Column |
---|---|---|
metering_aggregationrun | | Aggregation task |
 | id | Primary key |
 | aggregation_input | Serialized aggregation input |
 | status | PENDING/RUNNING/SUCCESS/FAILURE |
 | failure_message | Failure message, if any |
 | traceback | Traceback text, if an error occurred |
 | trace_interpretation | Type of trace in this aggregation |
 | derivative_interpretation | Type of derivative in this aggregation |
 | aggregation_interpretation | Type of aggregation to be performed |
 | group_id | Foreign key to datastore_tracegroup table |
 | added | Date added |
 | updated | Date updated |
metering_aggregationresult | | Aggregation task result |
 | id | Primary key |
 | aggregation_input | Serialized aggregation output |
 | trace_interpretation | Type of trace in this aggregation |
 | derivative_interpretation | Type of derivative in this aggregation |
 | aggregation_interpretation | Type of aggregation performed |
 | eemeter_version | Version of eemeter library used to calculate this result |
 | datastore_version | Version of datastore application used to calculate this result |
 | unit | Unit of measure |
 | baseline_value | Modeled counterfactual baseline value |
 | baseline_lower | Amount to be subtracted from baseline_value to obtain lower bound on 95% confidence interval |
 | baseline_upper | Amount to be added to baseline_value to obtain upper bound on 95% confidence interval |
 | baseline_n | Number of points in combined baseline demand fixtures |
 | reporting_value | Modeled reporting value |
 | reporting_lower | Amount to be subtracted from reporting_value to obtain lower bound on 95% confidence interval |
 | reporting_upper | Amount to be added to reporting_value to obtain upper bound on 95% confidence interval |
 | reporting_n | Number of points in combined reporting demand fixtures |
 | differential_direction | BASELINE_MINUS_REPORTING/REPORTING_MINUS_BASELINE |
 | differential_value | Modeled differential (savings) value |
 | differential_lower | Amount to be subtracted from differential_value to obtain lower bound on 95% confidence interval |
 | differential_upper | Amount to be added to differential_value to obtain upper bound on 95% confidence interval |
 | differential_n | Number of points in combined differential demand fixture |
 | added | Date added |
 | updated | Date updated |
 | aggregation_run_id | Foreign key to metering_aggregationrun table |
metering_aggregationderivativestatus | | Status of inclusion in aggregation |
 | id | Primary key |
 | status | ACCEPTED/REJECTED |
 | baseline_status | Baseline result ACCEPTED or REJECTED |
 | reporting_status | Reporting result ACCEPTED or REJECTED |
 | aggregation_result_id | Foreign key to metering_aggregationresult table |
 | derivative_id | Foreign key to metering_meterderivative table |
Warehouse tables¶
Name of Table | Name of Column | Description of Column |
---|---|---|
warehouse_meterresultmart | | Summarized meter results |
 | id | Primary key |
 | trace_id | Trace identifying string |
 | trace_pk | Primary key of trace |
 | trace_interpretation | Type of trace |
 | trace_unit | Unit of measure of trace |
 | project_id | Project identifying string |
 | project_pk | Primary key of project |
 | serialized_input_url | Cloud storage location of serialized input |
 | serialized_output_url | Cloud storage location of serialized output |
 | meter_result_pk | Primary key of meter result |
 | meter_result_status | Meter result status |
 | meter_result_eemeter_version | eemeter library software version |
 | meter_result_datastore_version | datastore application software version |
 | meter_result_model_class | Model class used in model fitting |
 | meter_result_model_kwargs | Keyword arguments used in model class initialization |
 | meter_result_formatter_class | Formatter class used in model data formatting |
 | meter_result_formatter_kwargs | Keyword arguments used in formatter class initialization |
 | meter_result_added | Date meter result added |
 | meter_result_updated | Date meter result updated |
 | meter_run_pk | Primary key of meter run |
 | meter_run_status | Meter run status |
 | meter_run_failure_message | Failure message (if any) |
 | meter_run_traceback | Traceback (if any) |
 | meter_run_added | Date meter run added |
 | meter_run_updated | Date meter run updated |
 | modeling_period_group_pk | Primary key of modeling period group |
 | derivative_pk | Primary key of derivative |
 | derivative_interpretation | Type of derivative |
 | derivative_unit | Unit of measure of derivative |
 | baseline_period_pk | Primary key of baseline period |
 | baseline_period_label | Label of baseline period |
 | baseline_period_start | Start date of baseline period (if any) |
 | baseline_period_end | End date of baseline period |
 | baseline_model_result_pk | Primary key of baseline model result |
 | baseline_model_result_status | Status of baseline model result |
 | baseline_model_result_traceback | Traceback, if failed |
 | baseline_model_result_r2 | R-squared |
 | baseline_model_result_cvrmse | Coefficient of variation of root mean squared error |
 | baseline_model_result_n_rows | Number of rows in input |
 | baseline_model_result_rmse | Root mean squared error |
 | baseline_derivative_value | Baseline derivative value |
 | baseline_derivative_lower_bound | 95 percent confidence lower bound on baseline derivative value |
 | baseline_derivative_upper_bound | 95 percent confidence upper bound on baseline derivative value |
 | reporting_period_pk | Primary key of reporting period |
 | reporting_period_label | Label of reporting period |
 | reporting_period_start | Start date of reporting period (if any) |
 | reporting_period_end | End date of reporting period |
 | reporting_model_result_pk | Primary key of reporting model result |
 | reporting_model_result_status | Status of reporting model result |
 | reporting_model_result_traceback | Traceback, if failed |
 | reporting_model_result_r2 | R-squared |
 | reporting_model_result_cvrmse | Coefficient of variation of root mean squared error |
 | reporting_model_result_n_rows | Number of rows in input |
 | reporting_model_result_rmse | Root mean squared error |
 | reporting_derivative_value | Reporting derivative value |
 | reporting_derivative_lower_bound | 95 percent confidence lower bound on reporting derivative value |
 | reporting_derivative_upper_bound | 95 percent confidence upper bound on reporting derivative value |
 | differential_value | Savings value |
 | differential_direction | BASELINE_MINUS_REPORTING/REPORTING_MINUS_BASELINE |
 | differential_lower_bound | 95 percent confidence lower bound on savings value |
 | differential_upper_bound | 95 percent confidence upper bound on savings value |
warehouse_groupstatisticsmart | | Summarized group statistics |
 | id | Primary key |
 | group_name | Name of group |
 | group_pk | Primary key of group |
 | serialized_input_url | Cloud storage location of serialized input |
 | serialized_output_url | Cloud storage location of serialized output |
 | aggregation_run_pk | Primary key of aggregation run |
 | aggregation_run_status | Status of aggregation run |
 | aggregation_run_failure_message | Failure message (if any) |
 | aggregation_run_traceback | Traceback (if any) |
 | aggregation_run_added | Date added |
 | aggregation_run_updated | Date updated |
 | aggregation_result_pk | Primary key of aggregation result |
 | n_derivatives | Number of derivatives in group |
 | aggregation_result_added | Date added |
 | aggregation_result_updated | Date updated |
 | aggregation_result_eemeter_version | eemeter library software version |
 | aggregation_result_datastore_version | datastore application software version |
 | trace_interpretation | Type of trace included in aggregation |
 | derivative_interpretation | Type of derivative included in aggregation |
 | statistic_interpretation | Type of aggregation performed |
 | statistic_unit | Unit of measure |
 | baseline_value | Aggregated baseline value |
 | baseline_lower_bound | 95 percent confidence lower bound |
 | baseline_upper_bound | 95 percent confidence upper bound |
 | reporting_value | Aggregated reporting value |
 | reporting_lower_bound | 95 percent confidence lower bound |
 | reporting_upper_bound | 95 percent confidence upper bound |
 | differential_value | Aggregated differential value |
 | differential_direction | BASELINE_MINUS_REPORTING/REPORTING_MINUS_BASELINE |
 | differential_lower_bound | 95 percent confidence lower bound |
 | differential_upper_bound | 95 percent confidence upper bound |
 | n_derivatives_accepted | Number of derivatives in group accepted |
 | n_derivatives_accepted_baseline | Number of derivatives in group with accepted baseline result |
 | n_derivatives_accepted_reporting | Number of derivatives in group with accepted reporting result |
 | n_derivatives_rejected | Number of derivatives in group rejected |
 | n_derivatives_rejected_baseline | Number of derivatives in group with rejected baseline result |
 | n_derivatives_rejected_reporting | Number of derivatives in group with rejected reporting result |
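Because differential_direction determines the sign convention of the savings columns, consumers of these marts should normalize it before reporting. A sketch with hypothetical row values:
# Normalize a mart row so positive numbers always mean
# baseline-minus-reporting savings. Row values are hypothetical.
row = {"differential_value": 1500.0,
       "differential_direction": "REPORTING_MINUS_BASELINE"}

savings = row["differential_value"]
if row["differential_direction"] == "REPORTING_MINUS_BASELINE":
    savings = -savings  # flip to the BASELINE_MINUS_REPORTING convention
print(savings)  # -1500.0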
Management commands¶
The following management commands are available for use with the datastore.
dev_seed¶
Creates an admin user:
- username: demo
- password: demo-password
- access token: tokstr
Creates a sample project with the id DEV_SEED_PROJECT and the following traces:
- DEV_SEED_TRACE_NATURAL_GAS_MONTHLY
- DEV_SEED_TRACE_NATURAL_GAS_DAILY
- DEV_SEED_TRACE_ELECTRICITY_15MIN
- DEV_SEED_TRACE_ELECTRICITY_HOURLY
- DEV_SEED_TRACE_SOLAR_HOURLY
- DEV_SEED_TRACE_SOLAR_30MIN
Example usage:
python manage.py dev_seed
prod_seed¶
Creates an admin user with generated password and access token:
- username: admin
- password: <generated password>
- access token: <generated token>
The generated password and access token will be shown in the output:
Admin password: <generated password>
Admin token: <generated token>
Example usage:
python manage.py prod_seed
trace_record_indexes¶
Creates and destroys indexes as part of loading TraceRecords.
Loading raw data is significantly faster if indexes and foreign key constraints are dropped before importing and rebuilt afterward.
This command inspects the current indexes and constraints, dropping all but the primary key indexes.
If new indexes are added, they should be added here (not in model classes) so that they are properly rebuilt during imports.
The results of this command can be inspected through psql:
=> \d datastore_tracerecord
With indexes, the description will look something like this:
Indexes:
"datastore_tracerecord_pkey" PRIMARY KEY, btree (id)
"datastore_tracerecord_ffe73c23" btree (trace_id)
Foreign-key constraints:
"datast_trace_id_53e4466e_fk_datastore_trace_id"
FOREIGN KEY (trace_id) REFERENCES datastore_trace(id)
DEFERRABLE INITIALLY DEFERRED
Without indexes, it will look something like this:
Indexes:
"datastore_tracerecord_pkey" PRIMARY KEY, btree (id)
Example usage:
To destroy trace record indexes (before ETL):
python manage.py trace_record_indexes destroy
To create trace record indexes (after ETL):
python manage.py trace_record_indexes create
run_meters¶
Triggers meter runs for specified projects or traces.
Example usage:
python manage.py run_meters --all-traces
Optional arguments:
--projects PROJECTS [PROJECTS ...]
Project ids to run
--traces TRACES [TRACES ...]
Trace ids to run
--all-projects Run meters for all projects, overrides --projects
--all-traces Run meters for all traces, overrides --traces
--use-project-id Use project_id, not id, for any projects to run
--use-trace-id Use trace_id, not id, for any traces to run
--purge-queue Purges celery queue before adding meter runs
--detailed-output Provides more detailed project- and trace-level output regarding meter ids
--delete-previous-meters
Delete old meter runs associated with these ids
meter_progress¶
Check progress of one or more meter runs.
Example usage:
python manage.py meter_progress --all-meters
Optional arguments:
--meters METERS [METERS ...]
Meter ids to check
--all-meters Check progress for all meters
--poll-until-complete
Repeatedly check progress until all meters complete
--poll-interval POLL_INTERVAL
 Seconds to wait between checks if --poll-until-complete
--poll-max POLL_MAX Max number of seconds to poll if --poll-until-complete before exiting
delete_meters¶
Delete meter runs.
Example usage:
python manage.py delete_meters
Optional arguments:
--meters METERS [METERS ...]
Meter ids to delete
--traces TRACES [TRACES ...]
Trace ids to delete associated meters
--projects PROJECTS [PROJECTS ...]
Project ids to delete associated meters
run_aggregations¶
Run aggregations of meter results by group.
Example usage:
python manage.py run_aggregations --all-groups
Optional arguments:
--group-names GROUP_NAMES [GROUP_NAMES ...]
Groups against which to run aggregations
--all-groups Run aggregations for all groups; overrides --group-names
meterresultmart¶
Create and destroy the data warehouse mart for meter results.
The warehouse table is warehouse_meterresultmart
Example usage:
python manage.py meterresultmart create
python manage.py meterresultmart destroy
modelresultmart¶
Create and destroy the data warehouse mart for model results.
The warehouse table is warehouse_modelresultmart
Example usage:
python manage.py modelresultmart create
python manage.py modelresultmart destroy
projectsummarymart¶
Create and destroy a data mart for metering results organized by project for a charting frontend.
The warehouse table is warehouse_projectsummarymart
Example usage:
python manage.py projectsummarymart create
python manage.py projectsummarymart destroy
tracesummarymart¶
Create and destroy a data mart that summarizes traces and their records.
The warehouse table is warehouse_tracesummarymart
Example usage:
python manage.py tracesummarymart create
python manage.py tracesummarymart destroy
geoinfo¶
Create and destroy two tables for geographical information.
The warehouse tables are warehouse_zctainfo and warehouse_countyinfo
Example usage:
python manage.py geoinfo create
python manage.py geoinfo destroy
API¶
ETL Toolkit¶
The ETL toolkit is provided to assist moving data from its source into the datastore.
“ETL” stands for Extract-Transform-Load. These three steps describe the actions the ETL toolkit assists with:
- Extract: obtain data from an external (non-datastore) source.
- Transform: convert that data into a form usable by the datastore.
- Load: move the transformed data into the datastore.
The ETL library is not run directly. Rather, its components are used to build ETL pipelines that are specific to a datastore instance.
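To make the three steps concrete, here is a schematic pipeline in Python. The file formats, column names (timestamp, kwh), and function bodies are hypothetical placeholders for illustration; they are not part of the etl package's API.
import csv
import json

def extract(source_path):
    # Extract: read raw rows from an external export (a CSV file here).
    with open(source_path) as f:
        return list(csv.DictReader(f))

def transform(raw_rows):
    # Transform: reshape rows into trace records of the kind the datastore
    # accepts (compare the datastore_tracerecord table above).
    return [{"start": row["timestamp"],
             "value": float(row["kwh"]),
             "estimated": False}
            for row in raw_rows]

def load(records, out_path):
    # Load: serialize for upload; a real pipeline would push these records
    # into a datastore instance.
    with open(out_path, "w") as f:
        json.dump(records, f)

load(transform(extract("meter_export.csv")), "trace_records.json")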
API¶
…