PrettyPandas

PrettyPandas is an extension to the Pandas DataFrame class that helps you create report qualitiy tables with a simple API.

(
    df
    .pipe(PrettyPandas)
    .as_currency('GBP', subset='A')
    .as_percent(subset='B')
    .total()
    .average()
)
_images/API@2x.png

Features

  • Add summary rows and columns.
  • Number formatting for currency, scientific units, and percentages.
  • Chaining commands.
  • Works seamlessly with Pandas Style API.

Note

Version 0.0.4 removes the apply_pretty_globals function and other custom CSS properties because Pandas and Jupyter now defaults to providing great looking html tables. If you still want custom CSS you can use the Pandas Style API.

Installation

You can install PrettyPandas using pip with support for Python 2.7, 3.3, 3.4, and 3.5:

pip install prettypandas

You can also install from source:

git clone git@github.com:HHammond/PrettyPandas.git
cd PrettyPandas
python setup.py install

Contributing

The project is available on GitHub and anyone is welcome to contribute. You can use the issue tracker to report issues, bugs, or suggest improvements.

Contents

Quick Start

Adding Summaries

PrettyPandas supports many built in summary functions, as well as providing the ability to create your own summaries. Summary functions can be applied over a DataFrame’s rows or columns, or both.

The builtin summary methods are:

If you wanted to add a grand total to the bottom of your table the code is simple:

PrettyPandas(df).total()
_images/total@2x.png

Or additionally if you want to use Pandas fluent API:

df.pipe(PrettyPandas).total()

PrettyPandas follows a fluent API so you can chain multiple summaries easily:

df.pipe(PrettyPandas).total().average()
_images/average@2x.png

The axis parameter specifies which numpy style axis to apply a summary on — 0 for columns, 1 for rows, and None for both.

PrettyPandas(df).total(axis=1)
_images/alt_axis@2x.png

You can even mix and match summaries applied to different axis.

Creating a Custom Summary

The summary method creates a custom summary from a function which takes an array-like structure as a list.

def count_greater_than_zero(column):
    return (column > 0).sum()

PrettyPandas(df).summary(count_greater_than_zero, title="> 0")
_images/custom_fn@2x.png

Converting Back to Pandas DataFrame

.to_frame()

After adding summary rows or columns you can get a DataFrame with your changes applied by calling the ._to_frame.

For example the following code would add a total to your DataFrame and return it back to a Pandas native DataFrame.

(
    df
    .pipe(PrettyPandas)
    .total(axis=1)
    .to_frame()
)
.style

The .style property allows you to drop right into the Pandas Style API. This code would allow you to compute a summary, format the table using percentages, and apply a backgrouned gradient to a table:

(
    df.pipe(PrettyPandas)
    .as_percent(precision=0)
    .median()
    .style
    .background_gradient()
)

Formatting Numbers

Most reports use at least some units of measurement. PrettyPandas currently supports percentages, money, and a more general unit method.

The as_unit method takes a positional unit argument which indicates the string representing the unit to be used and a location argument to specify whether the unit should be a prefix or suffix to the value.

The as_currency and as_percent methods are localized to use whatever units your Python distribution thinks are best for you. If you aren’t getting the correct units use the set_locale method to specify your locale.

If you need to use a different currency, just pass it to currency='...' to change it.

The as_money method takes optional currency and location arguments which work just like the as_unit method. By default the currency is in dollars.

Note

Python 2 doesn’t support unicode literals by default. You can use unicode literals (e.g. u'€') or import the unicode literal behaviour from Python 3:

from __future__ import unicode_literals
Formatting Columns

By default the formatting methods apply to the entire dataframe. When you need to format just a few columns you can use the subset argument to specify a single column, or multiple columns.

PrettyPandas(df).as_percent(subset='A')  # Format just column A
_images/format_a@2x.png
PrettyPandas(df).as_percent(subset=['A', 'B'])  # Format columns A and B
_images/format_a_b@2x.png
Formatting Rows and Complex Formatting

Formatting rows is more complicated than formatting columns. The subset argument needs to take in a pandas.Index to specify the row.

# Format the row with row-index 3
PrettyPandas(df).as_percent(subset=pd.IndexSlice[3,:], precision=2)
_images/format_row@2x.png

For multi-index dataframes subsetting is more complicated. You will need to use multiple pandas.IndexSlice objects to get the correct rows.

The following example shows how to select rows in a multi-index:

first_row_idx = pd.IndexSlice[0, :]
second_row_idx = pd.IndexSlice[1, :]

(
    df.pipe(PrettyPandas)
    .as_currency(subset=first_row_idx)
    .as_percent(subset=second_row_idx)
    .total(axis=1)
)
_images/format_complex@2x.png

For more info on Pandas indexing, read Pandas Indexing and Pandas Advanced Indexing.

Testing

Tests use pytest for testing. After downloading the repository from GitHub run the following:

py.test test

prettypandas package

class prettypandas.PrettyPandas(data, summary_rows=None, summary_cols=None, formatters=None, *args, **kwargs)

Bases: object

Parameters:
  • data – DataFrame.
  • summary_rows – list of Aggregate objects to be appended as a summary.
  • summary_cols – list of Aggregate objects to be appended as a summary.
  • formatters – List of Formatter objects to format.
as_currency(currency=u'USD', locale=Locale('en_US'), *args, **kwargs)

Format subset as currency

Parameters:
  • currency – Currency
  • locale – Babel locale for currency formatting
  • subset – Pandas subset
as_percent(precision=2, *args, **kwargs)

Format subset as percentages

Parameters:
  • precision – Decimal precision
  • subset – Pandas subset
as_unit(unit, location=u'suffix', *args, **kwargs)

Format subset as with units

Parameters:
  • unit – string to use as unit
  • location – prefix or suffix
  • subset – Pandas subset
average(title=u'Average', **kwargs)

Add a mean summary to this table.

Parameters:title – Title to be displayed.
frame

Add summaries and convert back to DataFrame

max(title=u'Maximum', **kwargs)

Add a maximum summary to this table.

Parameters:title – Title to be displayed.
median(title=u'Median', **kwargs)

Add a median summary to this table.

Parameters:title – Title to be displayed.
min(title=u'Minimum', **kwargs)

Add a minimum summary to this table.

Parameters:title – Title to be displayed.
multi_summary(funcs, titles, axis=0, *args, **kwargs)
render()
style

Add summaries and convert to Pandas Styler

summary(func=<operator.methodcaller object>, title=u'Total', axis=0, subset=None, *args, **kwargs)

Add multiple summary rows or columns to the dataframe.

Parameters:
  • func – function to be used for a summary.
  • titles – Title for this summary column.
  • axis – Same as numpy and pandas axis argument. A value of None will cause the summary to be applied to both rows and columns.
  • args – Positional arguments passed to all the functions.
  • kwargs – Keyword arguments passed to all the functions.

The results of summary can be chained together.

to_frame()

Add summaries and convert back to DataFrame

total(title=u'Total', **kwargs)

Add a total summary to this table.

Parameters:title – Title to be displayed.
prettypandas.as_currency(currency='USD', locale=Locale('en_US'))
prettypandas.as_percent(precision=2, **kwargs)

Convert number to percentage string.

Parameters:
  • v – numerical value to be converted
  • precision – int decimal places to round to
prettypandas.as_unit(unit, precision=2, location='suffix')

Convert value to unit.

Parameters:
  • v – numerical value
  • unit – string of unit
  • precision – int decimal places to round to
  • location – ‘prefix’ or ‘suffix’ representing where the currency symbol falls relative to the value

Submodules

prettypandas.summarize module
class prettypandas.summarizer.Aggregate(title, func, subset=None, axis=0, *args, **kwargs)

Bases: object

Aggreagte

Wrapper to calculate aggregate row on datafame.

Parameters:
  • title – Aggregate row title
  • func – Function to be passed to DataFrame.agg
  • subset – Subset of DataFrame to compute aggregate on
  • axis – Pandas axis to compute over
  • args – Positionsal arguments to DataFrame.agg
  • kwargs – Keyword arguments to DataFrame.agg
apply(df)

Compute aggregate over DataFrame

class prettypandas.summarizer.Formatter(formatter, args, kwargs)

Bases: object

Wrapper to apply formatting to datafame.

Parameters:
  • formatter – Function to be passed to Pandas Styler.format
  • args – Positionsal arguments to Styler.format
  • kwargs – Keyword arguments to Styler.format
apply(styler)

Apply Summary over Pandas Styler

class prettypandas.summarizer.PrettyPandas(data, summary_rows=None, summary_cols=None, formatters=None, *args, **kwargs)

Bases: object

Parameters:
  • data – DataFrame.
  • summary_rows – list of Aggregate objects to be appended as a summary.
  • summary_cols – list of Aggregate objects to be appended as a summary.
  • formatters – List of Formatter objects to format.
as_currency(currency=u'USD', locale=Locale('en_US'), *args, **kwargs)

Format subset as currency

Parameters:
  • currency – Currency
  • locale – Babel locale for currency formatting
  • subset – Pandas subset
as_percent(precision=2, *args, **kwargs)

Format subset as percentages

Parameters:
  • precision – Decimal precision
  • subset – Pandas subset
as_unit(unit, location=u'suffix', *args, **kwargs)

Format subset as with units

Parameters:
  • unit – string to use as unit
  • location – prefix or suffix
  • subset – Pandas subset
average(title=u'Average', **kwargs)

Add a mean summary to this table.

Parameters:title – Title to be displayed.
frame

Add summaries and convert back to DataFrame

max(title=u'Maximum', **kwargs)

Add a maximum summary to this table.

Parameters:title – Title to be displayed.
median(title=u'Median', **kwargs)

Add a median summary to this table.

Parameters:title – Title to be displayed.
min(title=u'Minimum', **kwargs)

Add a minimum summary to this table.

Parameters:title – Title to be displayed.
multi_summary(funcs, titles, axis=0, *args, **kwargs)
render()
style

Add summaries and convert to Pandas Styler

summary(func=<operator.methodcaller object>, title=u'Total', axis=0, subset=None, *args, **kwargs)

Add multiple summary rows or columns to the dataframe.

Parameters:
  • func – function to be used for a summary.
  • titles – Title for this summary column.
  • axis – Same as numpy and pandas axis argument. A value of None will cause the summary to be applied to both rows and columns.
  • args – Positional arguments passed to all the functions.
  • kwargs – Keyword arguments passed to all the functions.

The results of summary can be chained together.

to_frame()

Add summaries and convert back to DataFrame

total(title=u'Total', **kwargs)

Add a total summary to this table.

Parameters:title – Title to be displayed.
prettypandas.formatters module
prettypandas.formatters.as_currency(currency='USD', locale=Locale('en_US'))
prettypandas.formatters.as_percent(precision=2, **kwargs)

Convert number to percentage string.

Parameters:
  • v – numerical value to be converted
  • precision – int decimal places to round to
prettypandas.formatters.as_unit(unit, precision=2, location='suffix')

Convert value to unit.

Parameters:
  • v – numerical value
  • unit – string of unit
  • precision – int decimal places to round to
  • location – ‘prefix’ or ‘suffix’ representing where the currency symbol falls relative to the value

Indices and tables